summaryrefslogtreecommitdiff
path: root/src/mem
AgeCommit message (Collapse)Author
2015-06-25Ruby: Remove assert in RubyPort retry list logicJason Power
Remove the assert when adding a port to the RubyPort retry list. Instead of asserting, just ignore the added port, since it's already on the list. Without this patch, Ruby+detailed fails for even the simplest tests
2015-06-09mem: Add check for express snoop in packet destructorAli Jafri
Snoop packets share the request pointer with the originating packets. We need to ensure that the snoop packet destruction does not delete the request. Snoops are used for reads, invalidations, HardPFReqs, Writebacks and CleansEvicts. Reads, invalidations, and HardPFReqs need a response so their snoops do not delete the request. For Writebacks and CleanEvicts we need to check explicitly for whethere the current packet is an express snoop, in whcih case do not delete the request.
2015-06-09mem: Fix snoop packet data allocation bugAndreas Hansson
This patch fixes an issue where the snoop packet did not properly forward the data pointer in case of static data.
2015-06-07ruby: Fix MESI consistency bugMarco Elver
Fixes missed forward eviction to CPU. With the O3CPU this can lead to load-load reordering, as the LQ is never notified of the invalidate. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-06-07mem: Add HMC Timing ParametersMatthias Jung
A single HMC-2500 x32 model based on: [1] DRAMSpec: a high-level DRAM bank modelling tool developed at the University of Kaiserslautern. This high level tool uses RC (resistance-capacitance) and CV (capacitance-voltage) models to estimate the DRAM bank latency and power numbers. [2] A Logic-base Interconnect for Supporting Near Memory Computation in the Hybrid Memory Cube (E. Azarkhish et. al) Assumed for the HMC model is a 30 nm technology node. The modelled HMC consists of a 4 Gbit part with 4 layers connected with TSVs. Each layer has 16 vaults and each vault consists of 2 banks per layer. In order to be able to use the same controller used for 2D DRAM generations for HMC, the following analogy is done: Channel (DDR) => Vault (HMC) device_size (DDR) => size of a single layer in a vault ranks per channel (DDR) => number of layers banks per rank (DDR) => banks per layer devices per rank (DDR) => devices per layer ( 1 for HMC). The parameters for which no input is available are inherited from the DDR3 configuration.
2015-05-30mem: addr_mapper: restore old address if request not sentChristoph Pfister
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-05-26ruby: Deprecation warning for RubyMemoryControlAndreas Hansson
A step towards removing RubyMemoryControl and shift users to DRAMCtrl. The latter is faster, more representative, very versatile, and is integrated with power models.
2015-05-19ruby: Fix RubySystem warm-up and cool-down scopeJoel Hestness
The processes of warming up and cooling down Ruby caches are simulation-wide processes, not just RubySystem instance-specific processes. Thus, the warm-up and cool-down variables should be globally visible to any Ruby components participating in either process. Make these variables static members and track the warm-up and cool-down processes as appropriate. This patch also has two side benefits: 1) It removes references to the RubySystem g_system_ptr, which are problematic for allowing multiple RubySystem instances in a single simulation. Warmup and cooldown variables being static (global) reduces the need for instance-specific dereferences through the RubySystem. 2) From the AbstractController, it removes local RubySystem pointers, which are used inconsistently with other uses of the RubySystem: 11 other uses reference the RubySystem with the g_system_ptr. Only sequencers have local pointers.
2015-03-17mem: Create a request copy for deferred snoopsStephan Diestelhorst
Sometimes, we need to defer an express snoop in an MSHR, but the original request might complete and deallocate the original pkt->req. In those cases, create a copy of the request so that someone who is inspecting the delayed snoop can also inspect the request still. All of this is rather hacky, but the allocation / linking and general life-time management of Packet and Request is rather tricky. Deleting the copy is another tricky area, testing so far has shown that the right copy is deleted at the right time.
2015-05-05mem, cpu: Add a separate flag for strictly ordered memoryAndreas Sandberg
The Request::UNCACHEABLE flag currently has two different functions. The first, and obvious, function is to prevent the memory system from caching data in the request. The second function is to prevent reordering and speculation in CPU models. This changeset gives the order/speculation requirement a separate flag (Request::STRICT_ORDER). This flag prevents CPU models from doing the following optimizations: * Speculation: CPU models are not allowed to issue speculative loads. * Write combining: CPU models and caches are not allowed to merge writes to the same cache line. Note: The memory system may still reorder accesses unless the UNCACHEABLE flag is set. It is therefore expected that the STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent this behavior.
2015-05-05mem, alpha: Move Alpha-specific request flagsAndreas Sandberg
Move Alpha-specific memory request flags to an architecture-specific header and map them to the architecture specific flag bit range.
2015-05-05mem: Snoop into caches on uncacheable accessesAndreas Hansson
This patch takes a last step in fixing issues related to uncacheable accesses. We do not separate uncacheable memory from uncacheable devices, and in cases where it is really memory, there are valid scenarios where we need to snoop since we do not support cache maintenance instructions (yet). On snooping an uncacheable access we thus provide data if possible. In essence this makes uncacheable accesses IO coherent. The snoop filter is also queried to steer the snoops, but not updated since the uncacheable accesses do not allocate a block.
2015-05-05mem: Pass shared downstream through cachesAndreas Hansson
This patch ensures that we pass on information about a packet being shared (rather than exclusive), when forwarding a packet downstream. Without this patch there is a risk that a downstream cache considers the line exclusive when it really isn't.
2015-05-05mem: Add forward snoop check for HardPFReqsAli Jafri
We should always check whether the cache is supposed to be forwarding snoops before generating snoops.
2015-05-05mem: Add missing stats update for uncacheable MSHRsAndreas Hansson
This patch adds a missing counter update for the uncacheable accesses. By updating this counter we also get a meaningful average latency for uncacheable accesses (previously inf).
2015-05-05mem: Tidy up BaseCache parametersAndreas Hansson
This patch simply tidies up the BaseCache parameters and removes the unused "two_queue" parameter.
2015-05-05mem: Remove templates in cache modelDavid Guillen
This patch changes the cache implementation to rely on virtual methods rather than using the replacement policy as a template argument. There is no impact on the simulation performance, and overall the changes make it easier to modify (and subclass) the cache and/or replacement policy.
2015-04-29mem: Simplify page close checks for adaptive policiesRizwana Begum
Both open_adaptive and close_adaptive page polices keep the page open if a row hit is found. If a row hit is not found, close_adaptive page policy precharges the row, and open_adaptive policy precharges the row only if there is a bank conflict request waiting in the queue. This patch makes the checks for above conditions simpler. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-29ruby: set: replace long by unsigned longNilay Vaish
UBSan complains about negative value being shifted
2015-04-13ruby: allow restoring from checkpoint when using DRAMCtrlLena Olson
Restoring from a checkpoint with ruby + the DRAMCtrl memory model was not working, because ruby and DRAMCtrl disagreed on the current tick during warmup. Since there is no reason to do timing requests during warmup, use functional requests instead. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-03-27mem: Support any number of master-IDs in stride prefetcherStephan Diestelhorst
The stride prefetcher had a hardcoded number of contexts (i.e. master-IDs) that it could handle. Since master IDs need to be unique per system, and every core, cache etc. requires a separate master port, a static limit on these does not make much sense. Instead, this patch adds a small hash map that will map all master IDs to the right prefetch state and dynamically allocates new state for new master IDs.
2015-03-27mem: Allocate cache writebacks before new MSHRsAndreas Hansson
This patch changes the order of writeback allocation such that any writebacks resulting from a tag lookup (e.g. for an uncacheable access), are added to the writebuffer before any new MSHR entries are allocated. This ensures that the writebacks logically precedes the new allocations. The patch also changes the uncacheable flush to use proper timed (or atomic) writebacks, as opposed to functional writes.
2015-03-27mem: Cleanup flow for uncacheable accessesAndreas Hansson
This patch simplifies the code dealing with uncacheable timing accesses, aiming to align it with the existing miss handling. Similar to what we do in atomic, a timing request now goes through Cache::access (where the block is also flushed), and then proceeds to ignore any existing MSHR for the block in question. This unifies the flow for cacheable and uncacheable accesses, and for atomic and timing.
2015-03-27mem: Ignore uncacheable MSHRs when finding matchesAndreas Hansson
This patch changes how we search for matching MSHRs, ignoring any MSHR that is allocated for an uncacheable access. By doing so, this patch fixes a corner case in the MSHRs where incorrect data ended up being copied into a (cacheable) read packet due to a first uncacheable MSHR target of size 4, followed by a cacheable target to the same MSHR of size 64. The latter target was filled with nonsense data.
2015-03-27mem: Remove redundant allocateUncachedReadBuffer in cacheAndreas Hansson
This patch removes the no-longer-needed allocateUncachedReadBuffer. Besides the checks it is exactly the same as allocateMissBuffer and thus provides no value.
2015-03-27mem: Modernise MSHR iterators to C++11Andreas Hansson
This patch updates the iterators in the MSHR and MSHR queues to use C++11 range-based for loops. It also does a bit of additional house keeping.
2015-03-27mem: Align all MSHR entries to block boundariesAndreas Hansson
This patch aligns all MSHR queue entries to block boundaries to simplify checks for matches. Previously there were corner cases that could lead to existing entries not being identified as matches. There are, rather alarmingly, a few regressions that change with this patch.
2015-03-27mem: Rename PREFETCH_SNOOP_SQUASH flag to BLOCK_CACHEDAli Jafri
This patch subsumes the PREFETCH_SNOOP_SQUASH flag with the more generic BLOCK_CACHED flag. Future patches implementing cache eviction messages can use the BLOCK_CACHED flag in almost the same manner as hardware prefetches use the PREFETCH_SNOOP_SQUASH flag. The PREFTECH_SNOOP_FLAG is set if the prefetch target is found in the tags or the MSHRs in any state, so we are simply replacing calls to setPrefetchSquashed() with setBlockCached(). The case of where the prefetch target is found in the writeback MSHRs of upper level caches continues to be covered by the MEM_INHIBIT flag.
2015-03-23mem: rename Locked/LOCKED to LockedRMW/LOCKED_RMWSteve Reinhardt
Makes x86-style locked operations even more distinct from LLSC operations. Using "locked" by itself should be obviously ambiguous now.
2015-03-23mem: Tidy up RequestAndreas Hansson
This patch does a bit of house keeping, fixing up typos, removing dead code etc.
2015-03-19mem: Use emplace front/back for deferred packetsAndreas Hansson
Embrace C++11 for the deferred packets as we actually store the objects in the data structure, and not just pointers.
2015-03-19mem: Enable CommMonitor to output traces in atomic modeGeoffrey Blake
The CommMonitor by default only allows memory traces to be gathered in timing mode. This patch allows memory traces to be gathered in atomic mode if all one needs is a functional trace of memory addresses used and timing information is of a secondary concern.
2015-02-11mem: remove redundant test in in Cache::recvTimingResp()Steve Reinhardt
For some reason we were checking mshr->hasTargets() even though we had already called mshr->getTarget() unconditionally earlier in the same function (which asserts if there are no targets). Get rid of this useless check, and while we're at it get rid of the redundant call to mshr->getTarget(), since we still have the value saved in a local var.
2015-02-11mem: add local var in Cache::recvTimingResp()Steve Reinhardt
The main loop in recvTimingResp() uses target->pkt all over the place. Create a local tgt_pkt to help keep lines under the line length limit.
2015-02-11mem: restructure Packet cmd initialization a bit moreSteve Reinhardt
Refactor the way that specific MemCmd values are generated for packets. The new approach is a little more elegant in that we assign the right value up front, and it's also more amenable to non-heap-allocated Packet objects. Also replaced the code in the Minor model that was still doing it the ad-hoc way. This is basically a refinement of http://repo.gem5.org/gem5/rev/711eb0e64249.
2015-03-14mem: clean up write buffer check in Cache::handleSnoop()Steve Reinhardt
The 'if (writebacks.size)' check was redundant, because writeBuffer.findMatches() would return false if the writebacks list was empty. Also renamed 'mshr' to 'wb_entry' in this context since we are pointing at a writebuffer entry and not an MSHR (even though it's the same C++ class).
2015-03-02mem: Unify all cache DPRINTF address formattingAndreas Hansson
This patch changes all the DPRINTF messages in the cache to use '%#llx' every time a packet address is printed. The inclusion of '#' ensures '0x' is prepended, and since the address type is a uint64_t %x really should be %llx.
2015-03-02mem: Fix cache MSHR conflict determinationAndreas Hansson
This patch fixes a rather subtle issue in the sending of MSHR requests in the cache, where the logic previously did not check for conflicts between the MSRH queue and the write queue when requests were not ready. The correct thing to do is to always check, since not having a ready MSHR does not guarantee that there is no conflict. The underlying problem seems to have slipped past due to the symmetric timings used for the write queue and MSHR queue. However, with the recent timing changes the bug caused regressions to fail.
2015-03-02mem: Add byte mask to Packet::checkFunctionalAndreas Hansson
This patch changes the valid-bytes start/end to a proper byte mask. With the changes in timing introduced in previous patches there are more packets waiting in queues, and there are regressions using the checker CPU failing due to non-contigous read data being found in the various cache queues. This patch also adds some more comments explaining what is going on, and adds the fourth and missing case to Packet::checkFunctional.
2015-03-02mem: Add option to force in-order insertion in PacketQueueStephan Diestelhorst
By default, the packet queue is ordered by the ticks of the to-be-sent packages. With the recent modifications of packages sinking their header time when their resposne leaves the caches, there could be cases of MSHR targets being allocated and ordered A, B, but their responses being sent out in the order B,A. This led to inconsistencies in bus traffic, in particular the snoop filter observing first a ReadExResp and later a ReadRespWithInv. Logically, these were ordered the other way around behind the MSHR, but due to the timing adjustments when inserting into the PacketQueue, they were sent out in the wrong order on the bus, confusing the snoop filter. This patch adds a flag (off by default) such that these special cases can request in-order insertion into the packet queue, which might offset timing slighty. This is expected to occur rarely and not affect timing results.
2015-03-02mem: Downstream components consumes new crossbar delaysMarco Balboni
This patch makes the caches and memory controllers consume the delay that is annotated to a packet by the crossbar. Previously many components simply threw these delays away. Note that the devices still do not pay for these delays.
2015-03-02mem: Move crossbar default latencies to subclassesAndreas Hansson
This patch introduces a few subclasses to the CoherentXBar and NoncoherentXBar to distinguish the different uses in the system. We use the crossbar in a wide range of places: interfacing cores to the L2, as a system interconnect, connecting I/O and peripherals, etc. Needless to say, these crossbars have very different performance, and the clock frequency alone is not enough to distinguish these scenarios. Instead of trying to capture every possible case, this patch introduces dedicated subclasses for the three primary use-cases: L2XBar, SystemXBar and IOXbar. More can be added if needed, and the defaults can be overridden.
2015-03-02mem: Add crossbar latenciesMarco Balboni
This patch introduces latencies in crossbar that were neglected before. In particular, it adds three parameters in crossbar model: front_end_latency, forward_latency, and response_latency. Along with these parameters, three corresponding members are added: frontEndLatency, forwardLatency, and responseLatency. The coherent crossbar has an additional snoop_response_latency. The latency of the request path through the xbar is set as --> frontEndLatency + forwardLatency In case the snoop filter is enabled, the request path latency is charged also by look-up latency of the snoop filter. --> frontEndLatency + SF(lookupLatency) + forwardLatency. The latency of the response path through the xbar is set instead as --> responseLatency. In case of snoop response, if the response is treated as a normal response the latency associated is again --> responseLatency; If instead it is forwarded as snoop response we add an additional variable + snoopResponseLatency and the latency associated is --> snoopResponseLatency; Furthermore, this patch lets the crossbar progress on the next clock edge after an unused retry, changing the time the crossbar considers itself busy after sending a retry that was not acted upon.
2015-03-02mem: Tidy up the cache debug messagesAndreas Hansson
Avoid redundant inclusion of the name in the DPRINTF string.
2015-03-02mem: Split port retry for all different packet classesAndreas Hansson
This patch fixes a long-standing isue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios. The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting. The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fixes the previously seen deadlocks.
2015-03-02mem: Fix prefetchSquash + memInhibitAsserted bugAli Jafri
This patch resolves a bug with hardware prefetches. Before a hardware prefetch is sent towards the memory, the system generates a snoop request to check all caches above the prefetch generating cache for the presence of the prefetth target. If the prefetch target is found in the tags or the MSHRs of the upper caches, the cache sets the prefetchSquashed flag in the snoop packet. When the snoop packet returns with the prefetchSquashed flag set, the prefetch generating cache deallocates the MSHR reserved for the prefetch. If the prefetch target is found in the writeback buffer of the upper cache, the cache sets the memInhibit flag, which signals the prefetch generating cache to expect the data from the writeback. When the snoop packet returns with the memInhibitAsserted flag set, it marks the allocated MSHR as inService and waits for the data from the writeback. If the prefetch target is found in multiple upper level caches, specifically in the tags or MSHRs of one upper level cache and the writeback buffer of another, the snoop packet will return with both prefetchSquashed and memInhibitAsserted set, while the current code is not written to handle such an outcome. Current code checks for the prefetchSquashed flag first, if it finds the flag, it deallocates the reserved MSHR. This leads to assert failure when the data from the writeback appears at cache. In this fix, we simply switch the order of checks. We first check for memInhibitAsserted and then for prefetch squashed.
2015-02-26Ruby: Update backing store option to propagate through to all RubyPortsJason Power
Previously, the user would have to manually set access_backing_store=True on all RubyPorts (Sequencers) in the config files. Now, instead there is one global option that each RubyPort checks on initialization. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-02-16mem: Fix initial value problem with MemCheckerStephan Diestelhorst
In highly loaded cases, reads might actually overlap with writes to the initial memory state. The mem checker needs to detect such cases and permit the read reading either from the writes (what it is doing now) or read from the initial, unknown value. This patch adds this logic.
2015-02-16mem: mmap the backing store with MAP_NORESERVEAndreas Hansson
This patch ensures we can run simulations with very large simulated memories (at least 64 TB based on some quick runs on a Linux workstation). In essence this allows us to efficiently deal with sparse address maps without having to implement a redirection layer in the backing store. This opens up for run-time errors if we eventually exhausts the hosts memory and swap space, but this should hopefully never happen.
2015-02-16mem: Use the range cache for lookup as well as accessAndreas Hansson
This patch changes the range cache used in the global physical memory to be an iterator so that we can use it not only as part of isMemAddr, but also access and functionalAccess. This matches use-cases where a core is using the atomic non-caching memory mode, and repeatedly calls isMemAddr and access. Linux boot on aarch32, with a single atomic CPU, is now more than 30% faster when using "--fastmem" compared to not using the direct memory access.