gem5 - gem5

Age	Commit message	Author
2015-03-17	mem: Create a request copy for deferred snoops	Stephan Diestelhorst
	Sometimes, we need to defer an express snoop in an MSHR, but the original request might complete and deallocate the original pkt->req. In those cases, create a copy of the request so that someone who is inspecting the delayed snoop can still inspect the request. All of this is rather hacky, but the allocation / linking and general life-time management of Packet and Request is rather tricky. Deleting the copy is another tricky area; testing so far has shown that the right copy is deleted at the right time.
2015-05-05	mem, cpu: Add a separate flag for strictly ordered memory	Andreas Sandberg
	The Request::UNCACHEABLE flag currently has two different functions. The first, and obvious, function is to prevent the memory system from caching data in the request. The second function is to prevent reordering and speculation in CPU models. This changeset gives the order/speculation requirement a separate flag (Request::STRICT_ORDER). This flag prevents CPU models from doing the following optimizations:
	* Speculation: CPU models are not allowed to issue speculative loads.
	* Write combining: CPU models and caches are not allowed to merge writes to the same cache line.
	Note: The memory system may still reorder accesses unless the UNCACHEABLE flag is set. It is therefore expected that the STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent this behavior.
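	A minimal sketch of how the two flags separate the two concerns. The class name ReqFlags, the bit values, and both helper functions are hypothetical stand-ins for illustration and are not taken from the gem5 sources.

```python
from enum import IntFlag


class ReqFlags(IntFlag):
    """Hypothetical stand-in for the Request flag bits; values are illustrative."""
    UNCACHEABLE = 0x1   # memory system must not cache the data
    STRICT_ORDER = 0x2  # CPU must not speculate or merge writes


def may_issue_speculatively(flags: ReqFlags) -> bool:
    """A CPU model may only speculate when STRICT_ORDER is clear."""
    return not (flags & ReqFlags.STRICT_ORDER)


def may_cache(flags: ReqFlags) -> bool:
    """The memory system may cache the data only when UNCACHEABLE is clear."""
    return not (flags & ReqFlags.UNCACHEABLE)


# Typical device access: both flags combined, as the commit message expects.
mmio = ReqFlags.UNCACHEABLE | ReqFlags.STRICT_ORDER
```

	With this split, a request carrying only UNCACHEABLE may still be issued speculatively, whereas one carrying both flags may be neither cached nor speculated on.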
2015-05-05	mem, alpha: Move Alpha-specific request flags	Andreas Sandberg
	Move Alpha-specific memory request flags to an architecture-specific header and map them to the architecture specific flag bit range.
2015-05-05	mem: Snoop into caches on uncacheable accesses	Andreas Hansson
	This patch takes a last step in fixing issues related to uncacheable accesses. We do not separate uncacheable memory from uncacheable devices, and in cases where it is really memory, there are valid scenarios where we need to snoop since we do not support cache maintenance instructions (yet). On snooping an uncacheable access we thus provide data if possible. In essence this makes uncacheable accesses IO coherent. The snoop filter is also queried to steer the snoops, but not updated since the uncacheable accesses do not allocate a block.
2015-05-05	mem: Pass shared downstream through caches	Andreas Hansson
	This patch ensures that we pass on information about a packet being shared (rather than exclusive), when forwarding a packet downstream. Without this patch there is a risk that a downstream cache considers the line exclusive when it really isn't.
2015-05-05	mem: Add forward snoop check for HardPFReqs	Ali Jafri
	We should always check whether the cache is supposed to be forwarding snoops before generating snoops.
2015-05-05	mem: Add missing stats update for uncacheable MSHRs	Andreas Hansson
	This patch adds a missing counter update for the uncacheable accesses. By updating this counter we also get a meaningful average latency for uncacheable accesses (previously inf).
2015-05-05	mem: Tidy up BaseCache parameters	Andreas Hansson
	This patch simply tidies up the BaseCache parameters and removes the unused "two_queue" parameter.
2015-05-05	mem: Remove templates in cache model	David Guillen
	This patch changes the cache implementation to rely on virtual methods rather than using the replacement policy as a template argument. There is no impact on the simulation performance, and overall the changes make it easier to modify (and subclass) the cache and/or replacement policy.
2015-04-29	mem: Simplify page close checks for adaptive policies	Rizwana Begum
	Both open_adaptive and close_adaptive page policies keep the page open if a row hit is found. If a row hit is not found, the close_adaptive page policy precharges the row, and the open_adaptive policy precharges the row only if there is a bank conflict request waiting in the queue. This patch makes the checks for the above conditions simpler. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
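	The decision described above can be sketched as a single function. This is only an illustration of the stated rules; the function name and its boolean inputs are assumptions and do not mirror the DRAM controller code.

```python
def should_precharge(policy: str, row_hit_pending: bool,
                     bank_conflict_pending: bool) -> bool:
    """Return True if the open row should be closed after the current access.

    policy is 'open_adaptive' or 'close_adaptive'; the two booleans describe
    what else is waiting in the queue for the same bank.
    """
    if row_hit_pending:
        # Both adaptive policies keep the page open for a pending row hit.
        return False
    if policy == "close_adaptive":
        # No row hit waiting: close the row straight away.
        return True
    if policy == "open_adaptive":
        # No row hit waiting: close only if another row in this bank is wanted.
        return bank_conflict_pending
    raise ValueError(f"unknown policy: {policy}")
```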
2015-04-29	ruby: set: replace long by unsigned long	Nilay Vaish
	UBSan complains about a negative value being shifted.
2015-04-13	ruby: allow restoring from checkpoint when using DRAMCtrl	Lena Olson
	Restoring from a checkpoint with ruby + the DRAMCtrl memory model was not working, because ruby and DRAMCtrl disagreed on the current tick during warmup. Since there is no reason to do timing requests during warmup, use functional requests instead. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-03-27	mem: Support any number of master-IDs in stride prefetcher	Stephan Diestelhorst
	The stride prefetcher had a hardcoded number of contexts (i.e. master-IDs) that it could handle. Since master IDs need to be unique per system, and every core, cache etc. requires a separate master port, a static limit on these does not make much sense. Instead, this patch adds a small hash map that will map all master IDs to the right prefetch state and dynamically allocates new state for new master IDs.
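	A minimal sketch of the idea, assuming a plain dictionary keyed by master ID. StrideState, StrideTable and observe() are hypothetical names, and the state kept per master is reduced to the bare minimum.

```python
from dataclasses import dataclass, field


@dataclass
class StrideState:
    """Hypothetical per-master prefetch state: last address and stride."""
    last_addr: int = 0
    stride: int = 0


@dataclass
class StrideTable:
    # Maps master ID -> state; entries are created on first use.
    contexts: dict = field(default_factory=dict)

    def observe(self, master_id: int, addr: int) -> int:
        """Record an access and return the stride seen for this master."""
        state = self.contexts.setdefault(master_id, StrideState(last_addr=addr))
        state.stride = addr - state.last_addr
        state.last_addr = addr
        return state.stride
```

	Because state is allocated on first use, the table never needs to know in advance how many masters exist in the system.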
2015-03-27	mem: Allocate cache writebacks before new MSHRs	Andreas Hansson
	This patch changes the order of writeback allocation such that any writebacks resulting from a tag lookup (e.g. for an uncacheable access) are added to the writebuffer before any new MSHR entries are allocated. This ensures that the writebacks logically precede the new allocations. The patch also changes the uncacheable flush to use proper timed (or atomic) writebacks, as opposed to functional writes.
2015-03-27	mem: Cleanup flow for uncacheable accesses	Andreas Hansson
	This patch simplifies the code dealing with uncacheable timing accesses, aiming to align it with the existing miss handling. Similar to what we do in atomic, a timing request now goes through Cache::access (where the block is also flushed), and then proceeds to ignore any existing MSHR for the block in question. This unifies the flow for cacheable and uncacheable accesses, and for atomic and timing.
2015-03-27	mem: Ignore uncacheable MSHRs when finding matches	Andreas Hansson
	This patch changes how we search for matching MSHRs, ignoring any MSHR that is allocated for an uncacheable access. By doing so, this patch fixes a corner case in the MSHRs where incorrect data ended up being copied into a (cacheable) read packet due to a first uncacheable MSHR target of size 4, followed by a cacheable target to the same MSHR of size 64. The latter target was filled with nonsense data.
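	The matching rule can be sketched as follows. Entry and find_match() are hypothetical, stripped-down stand-ins for the MSHR and its queue lookup, kept only to show that uncacheable entries are skipped.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Entry:
    """Hypothetical, stripped-down MSHR entry."""
    blk_addr: int
    uncacheable: bool = False


def find_match(entries: Iterable[Entry], blk_addr: int) -> Optional[Entry]:
    """Return the first cacheable entry for blk_addr, skipping uncacheable ones."""
    for entry in entries:
        if entry.blk_addr == blk_addr and not entry.uncacheable:
            return entry
    return None
```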
2015-03-27	mem: Remove redundant allocateUncachedReadBuffer in cache	Andreas Hansson
	This patch removes the no-longer-needed allocateUncachedReadBuffer. Besides the checks it is exactly the same as allocateMissBuffer and thus provides no value.
2015-03-27	mem: Modernise MSHR iterators to C++11	Andreas Hansson
	This patch updates the iterators in the MSHR and MSHR queues to use C++11 range-based for loops. It also does a bit of additional housekeeping.
2015-03-27	mem: Align all MSHR entries to block boundaries	Andreas Hansson
	This patch aligns all MSHR queue entries to block boundaries to simplify checks for matches. Previously there were corner cases that could lead to existing entries not being identified as matches. There are, rather alarmingly, a few regressions that change with this patch.
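	A short sketch of why alignment simplifies matching, assuming a power-of-two block size. The helper names and the 64-byte default are illustrative assumptions.

```python
def block_align(addr: int, blk_size: int) -> int:
    """Round addr down to the start of its cache block."""
    # The mask trick below only works for power-of-two block sizes.
    assert blk_size > 0 and (blk_size & (blk_size - 1)) == 0
    return addr & ~(blk_size - 1)


def same_block(a: int, b: int, blk_size: int = 64) -> bool:
    """Two addresses hit the same entry if their aligned addresses agree."""
    return block_align(a, blk_size) == block_align(b, blk_size)
```

	Once every queue entry stores the aligned address, a match is a plain equality test rather than a range-overlap check.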
2015-03-27	mem: Rename PREFETCH_SNOOP_SQUASH flag to BLOCK_CACHED	Ali Jafri
	This patch replaces the PREFETCH_SNOOP_SQUASH flag with the more generic BLOCK_CACHED flag. Future patches implementing cache eviction messages can use the BLOCK_CACHED flag in almost the same manner as hardware prefetches use the PREFETCH_SNOOP_SQUASH flag. The PREFETCH_SNOOP_SQUASH flag is set if the prefetch target is found in the tags or the MSHRs in any state, so we are simply replacing calls to setPrefetchSquashed() with setBlockCached(). The case where the prefetch target is found in the writeback MSHRs of upper level caches continues to be covered by the MEM_INHIBIT flag.
2015-03-23	mem: rename Locked/LOCKED to LockedRMW/LOCKED_RMW	Steve Reinhardt
	Makes x86-style locked operations even more distinct from LLSC operations. Using "locked" by itself should be obviously ambiguous now.
2015-03-23	mem: Tidy up Request	Andreas Hansson
	This patch does a bit of housekeeping, fixing up typos, removing dead code etc.
2015-03-19	mem: Use emplace front/back for deferred packets	Andreas Hansson
	Embrace C++11 for the deferred packets as we actually store the objects in the data structure, and not just pointers.
2015-03-19	mem: Enable CommMonitor to output traces in atomic mode	Geoffrey Blake
	The CommMonitor by default only allows memory traces to be gathered in timing mode. This patch allows memory traces to be gathered in atomic mode if all one needs is a functional trace of the memory addresses used and timing information is of secondary concern.
2015-02-11	mem: remove redundant test in Cache::recvTimingResp()	Steve Reinhardt
	For some reason we were checking mshr->hasTargets() even though we had already called mshr->getTarget() unconditionally earlier in the same function (which asserts if there are no targets). Get rid of this useless check, and while we're at it get rid of the redundant call to mshr->getTarget(), since we still have the value saved in a local var.
2015-02-11	mem: add local var in Cache::recvTimingResp()	Steve Reinhardt
	The main loop in recvTimingResp() uses target->pkt all over the place. Create a local tgt_pkt to help keep lines under the line length limit.
2015-02-11	mem: restructure Packet cmd initialization a bit more	Steve Reinhardt
	Refactor the way that specific MemCmd values are generated for packets. The new approach is a little more elegant in that we assign the right value up front, and it's also more amenable to non-heap-allocated Packet objects. Also replaced the code in the Minor model that was still doing it the ad-hoc way. This is basically a refinement of http://repo.gem5.org/gem5/rev/711eb0e64249.
2015-03-14	mem: clean up write buffer check in Cache::handleSnoop()	Steve Reinhardt
	The 'if (writebacks.size)' check was redundant, because writeBuffer.findMatches() would return false if the writebacks list was empty. Also renamed 'mshr' to 'wb_entry' in this context since we are pointing at a writebuffer entry and not an MSHR (even though it's the same C++ class).
2015-03-02	mem: Unify all cache DPRINTF address formatting	Andreas Hansson
	This patch changes all the DPRINTF messages in the cache to use '%#llx' every time a packet address is printed. The inclusion of '#' ensures '0x' is prepended, and since the address type is a uint64_t, %x really should be %llx.
2015-03-02	mem: Fix cache MSHR conflict determination	Andreas Hansson
	This patch fixes a rather subtle issue in the sending of MSHR requests in the cache, where the logic previously did not check for conflicts between the MSHR queue and the write queue when requests were not ready. The correct thing to do is to always check, since not having a ready MSHR does not guarantee that there is no conflict. The underlying problem seems to have slipped past due to the symmetric timings used for the write queue and MSHR queue. However, with the recent timing changes the bug caused regressions to fail.
2015-03-02	mem: Add byte mask to Packet::checkFunctional	Andreas Hansson
	This patch changes the valid-bytes start/end to a proper byte mask. With the changes in timing introduced in previous patches there are more packets waiting in queues, and there are regressions using the checker CPU failing due to non-contiguous read data being found in the various cache queues. This patch also adds some more comments explaining what is going on, and adds the fourth and missing case to Packet::checkFunctional.
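	A minimal sketch of why a per-byte mask handles non-contiguous data where a start/end pair cannot. merge_valid_bytes() is a hypothetical helper and is not the logic of Packet::checkFunctional itself.

```python
def merge_valid_bytes(mask, data, offset, src):
    """Copy src into data at offset and mark those bytes valid in mask.

    mask is a list of bools, one per byte of the read; data is a bytearray of
    the same length. Returns True once every byte of the read is valid.
    """
    for i, byte in enumerate(src):
        pos = offset + i
        if 0 <= pos < len(data):
            data[pos] = byte
            mask[pos] = True
    return all(mask)
```

	Two fragments covering bytes 0-1 and 6-7 of an 8-byte read leave a hole in the middle. A single valid range cannot describe that state, while the mask records it exactly and the read completes once the hole is filled.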
2015-03-02	mem: Add option to force in-order insertion in PacketQueue	Stephan Diestelhorst
	By default, the packet queue is ordered by the ticks of the to-be-sent packets. With the recent modifications of packets sinking their header time when their response leaves the caches, there could be cases of MSHR targets being allocated and ordered A, B, but their responses being sent out in the order B, A. This led to inconsistencies in bus traffic, in particular the snoop filter observing first a ReadExResp and later a ReadRespWithInv. Logically, these were ordered the other way around behind the MSHR, but due to the timing adjustments when inserting into the PacketQueue, they were sent out in the wrong order on the bus, confusing the snoop filter. This patch adds a flag (off by default) such that these special cases can request in-order insertion into the packet queue, which might offset timing slightly. This is expected to occur rarely and not affect timing results.
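	A sketch of the two insertion modes, assuming the queue is a list of (tick, packet) pairs. schedule() and its force_order argument are hypothetical, and delaying the forced packet to the tick of the current tail is an assumption about how the order is kept consistent, not a statement about the PacketQueue code.

```python
def schedule(queue, when, pkt, force_order=False):
    """Insert (when, pkt) into queue, a list kept sorted by send tick.

    With force_order set, the packet is appended behind everything already
    queued, and its tick is pushed back if needed so the list stays sorted.
    """
    if force_order:
        if queue and queue[-1][0] > when:
            when = queue[-1][0]  # assumed adjustment; may offset timing slightly
        queue.append((when, pkt))
        return
    # Default: insert by tick, after any entries with the same tick.
    idx = len(queue)
    while idx > 0 and queue[idx - 1][0] > when:
        idx -= 1
    queue.insert(idx, (when, pkt))
```

	With the default mode, a later-allocated response carrying an earlier tick overtakes the one ahead of it; with force_order it stays behind, at the cost of a slightly later send time.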
2015-03-02	mem: Downstream components consume new crossbar delays	Marco Balboni
	This patch makes the caches and memory controllers consume the delay that is annotated to a packet by the crossbar. Previously many components simply threw these delays away. Note that the devices still do not pay for these delays.
2015-03-02	mem: Move crossbar default latencies to subclasses	Andreas Hansson
	This patch introduces a few subclasses to the CoherentXBar and NoncoherentXBar to distinguish the different uses in the system. We use the crossbar in a wide range of places: interfacing cores to the L2, as a system interconnect, connecting I/O and peripherals, etc. Needless to say, these crossbars have very different performance, and the clock frequency alone is not enough to distinguish these scenarios. Instead of trying to capture every possible case, this patch introduces dedicated subclasses for the three primary use-cases: L2XBar, SystemXBar and IOXBar. More can be added if needed, and the defaults can be overridden.
2015-03-02	mem: Add crossbar latencies	Marco Balboni
	This patch introduces latencies in crossbar that were neglected before. In particular, it adds three parameters in crossbar model: front_end_latency, forward_latency, and response_latency. Along with these parameters, three corresponding members are added: frontEndLatency, forwardLatency, and responseLatency. The coherent crossbar has an additional snoop_response_latency.
	The latency of the request path through the xbar is set as
	--> frontEndLatency + forwardLatency
	In case the snoop filter is enabled, the request path latency is charged also by look-up latency of the snoop filter.
	--> frontEndLatency + SF(lookupLatency) + forwardLatency
	The latency of the response path through the xbar is set instead as
	--> responseLatency
	In case of snoop response, if the response is treated as a normal response the latency associated is again
	--> responseLatency
	If instead it is forwarded as snoop response we add an additional variable + snoopResponseLatency and the latency associated is
	--> snoopResponseLatency
	Furthermore, this patch lets the crossbar progress on the next clock edge after an unused retry, changing the time the crossbar considers itself busy after sending a retry that was not acted upon.
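	The latency sums listed above can be written out as a small worked example. The function names and the cycle counts used with them are illustrative assumptions; only the formulas come from the commit message.

```python
from typing import Optional


def request_path_latency(front_end: int, forward: int,
                         sf_lookup: Optional[int] = None) -> int:
    """frontEndLatency [+ snoop filter lookupLatency] + forwardLatency."""
    latency = front_end + forward
    if sf_lookup is not None:
        # Only charged when a snoop filter is present.
        latency += sf_lookup
    return latency


def response_path_latency(response: int, snoop_response: int,
                          forwarded_as_snoop: bool = False) -> int:
    """responseLatency, or snoopResponseLatency for a forwarded snoop response."""
    return snoop_response if forwarded_as_snoop else response
```

	For example, with a front-end latency of 3 cycles and a forward latency of 4 cycles, the request path costs 7 cycles, or 8 cycles when a snoop filter with a 1-cycle lookup is enabled.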
2015-03-02	mem: Tidy up the cache debug messages	Andreas Hansson
	Avoid redundant inclusion of the name in the DPRINTF string.
2015-03-02	mem: Split port retry for all different packet classes	Andreas Hansson
	This patch fixes a long-standing issue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios. The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting. The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fix the previously seen deadlocks.
2015-03-02	mem: Fix prefetchSquash + memInhibitAsserted bug	Ali Jafri
	This patch resolves a bug with hardware prefetches. Before a hardware prefetch is sent towards the memory, the system generates a snoop request to check all caches above the prefetch generating cache for the presence of the prefetch target. If the prefetch target is found in the tags or the MSHRs of the upper caches, the cache sets the prefetchSquashed flag in the snoop packet. When the snoop packet returns with the prefetchSquashed flag set, the prefetch generating cache deallocates the MSHR reserved for the prefetch. If the prefetch target is found in the writeback buffer of the upper cache, the cache sets the memInhibit flag, which signals the prefetch generating cache to expect the data from the writeback. When the snoop packet returns with the memInhibitAsserted flag set, the prefetch generating cache marks the allocated MSHR as inService and waits for the data from the writeback. If the prefetch target is found in multiple upper level caches, specifically in the tags or MSHRs of one upper level cache and the writeback buffer of another, the snoop packet will return with both prefetchSquashed and memInhibitAsserted set, an outcome the current code is not written to handle. The current code checks for the prefetchSquashed flag first and, if it finds the flag, deallocates the reserved MSHR. This leads to an assert failure when the data from the writeback appears at the cache. In this fix, we simply switch the order of checks: we first check for memInhibitAsserted and then for prefetchSquashed.
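	The corrected order of checks can be sketched as below. The function and the action strings it returns are hypothetical; they only illustrate which flag takes priority when both are set.

```python
def handle_prefetch_snoop_result(mem_inhibit_asserted: bool,
                                 prefetch_squashed: bool) -> str:
    """Decide what to do with the MSHR reserved for a hardware prefetch."""
    if mem_inhibit_asserted:
        # A writeback is on its way: keep the MSHR and wait for the data,
        # even if another cache also asked for the prefetch to be squashed.
        return "mark_in_service"
    if prefetch_squashed:
        # No data is coming and the block is already cached above.
        return "deallocate"
    return "send_prefetch"
```

	Checking memInhibitAsserted first means the MSHR is still allocated when the writeback data arrives, which is the case that previously triggered the assert.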
2015-02-26	Ruby: Update backing store option to propagate through to all RubyPorts	Jason Power
	Previously, the user would have to manually set access_backing_store=True on all RubyPorts (Sequencers) in the config files. Now, instead there is one global option that each RubyPort checks on initialization. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-02-16	mem: Fix initial value problem with MemChecker	Stephan Diestelhorst
	In highly loaded cases, reads might actually overlap with writes to the initial memory state. The mem checker needs to detect such cases and permit the read to return either a value from the overlapping writes (what it is doing now) or the initial, unknown value. This patch adds this logic.
2015-02-16	mem: mmap the backing store with MAP_NORESERVE	Andreas Hansson
	This patch ensures we can run simulations with very large simulated memories (at least 64 TB based on some quick runs on a Linux workstation). In essence this allows us to efficiently deal with sparse address maps without having to implement a redirection layer in the backing store. This opens up for run-time errors if we eventually exhaust the host's memory and swap space, but this should hopefully never happen.
2015-02-16	mem: Use the range cache for lookup as well as access	Andreas Hansson
	This patch changes the range cache used in the global physical memory to be an iterator so that we can use it not only as part of isMemAddr, but also access and functionalAccess. This matches use-cases where a core is using the atomic non-caching memory mode, and repeatedly calls isMemAddr and access. Linux boot on aarch32, with a single atomic CPU, is now more than 30% faster when using "--fastmem" compared to not using the direct memory access.
2015-02-11	mem: Clarification of packet crossbar timings	Marco Balboni
	This patch clarifies the packet timings annotated when going through a crossbar. The old 'firstWordDelay' is replaced by 'headerDelay' that represents the delay associated with the delivery of the header of the packet. The old 'lastWordDelay' is replaced by 'payloadDelay' that represents the delay needed to process the payload of the packet. For now the uses and values remain identical. However, going forward the payloadDelay will be additive, and not include the headerDelay. Follow-on patches will make the headerDelay capture the pipeline latency incurred in the crossbar, whereas the payloadDelay will capture the additional serialisation delay.
2015-02-11	mem: Clarify usage of latency in the cache	Marco Balboni
	This patch adds some much-needed clarity in the specification of the cache timing. For now, hit_latency and response_latency are kept as top-level parameters, but the cache itself has a number of local variables to better map the individual timing variables to different behaviours (and sub-components). The introduced variables are:
	- lookupLatency: latency of tag lookup, occurring on any access
	- forwardLatency: latency that occurs in case of outbound miss
	- fillLatency: latency to fill a cache block
	We keep the existing responseLatency. The forwardLatency is used by allocateInternalBuffer() for:
	- MSHR allocateWriteBuffer (uncached write forwarded to WriteBuffer);
	- MSHR allocateMissBuffer (cacheable miss in MSHR queue);
	- MSHR allocateUncachedReadBuffer (uncached read allocated in MSHR queue)
	It is our assumption that the time for the above three buffers is the same. Similarly, for snoop responses passing through the cache we use forwardLatency.
2015-02-03	mem: Clarify express snoop behaviour	Andreas Hansson
	This patch adds a bit of documentation with insights around how express snoops really work.
2015-02-03	mem: Clarify cache behaviour for pending dirty responses	Andreas Hansson
	This patch adds a bit of clarification around the assumptions made in the cache when packets are sent out, and dirty responses are pending. As part of the change, the marking of an MSHR as in service is simplified slightly, and comments are added to explain what assumptions are made.
2015-02-03	config: Adjust DRAM channel interleaving defaults	Andreas Hansson
	This patch changes the DRAM channel interleaving default behaviour to be more representative. The default address mapping (RoRaBaCoCh) moves the channel bits towards the least significant bits, and uses 128 bytes as the default channel interleaving granularity. These defaults can be overridden if desired, but should serve as a sensible starting point for most use-cases.
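	A simplified sketch of what channel bits near the least significant end mean for address decoding. channel_of() is a hypothetical helper that assumes a plain modulo mapping with no address hashing; only the 128-byte granularity is taken from the commit message.

```python
def channel_of(addr: int, num_channels: int, granularity: int = 128) -> int:
    """Pick the channel from the bits just above the interleaving granularity."""
    return (addr // granularity) % num_channels
```

	Under this assumption, consecutive 128-byte chunks rotate across the channels, so a streaming access pattern spreads its traffic evenly.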
2015-01-22	mem: Remove unused Packet src and dest fields	Andreas Hansson
	This patch takes the final step in removing the src and dest fields in the packet. These fields were rather confusing in that they only remembered a single multiplexing component, and pushed the responsibility to the bridge and caches to store the fields in a senderstate, thus effectively creating a stack. With the recent changes to the crossbar response routing, the crossbar is now responsible for routing responses without relying on the packet fields. Thus, these variables are now unused and can be removed.
2015-01-22	mem: Remove Packet source from ForwardResponseRecord	Andreas Hansson
	This patch removes the source field from the ForwardResponseRecord, but keeps the class as it is part of how the cache identifies responses to hardware prefetches that are snooped upwards.
2015-01-22	mem: Remove unused RequestState in the bridge	Andreas Hansson
	This patch removes the bridge sender state as the Crossbar now takes care of remembering its own routing decisions.