summaryrefslogtreecommitdiff
path: root/src/mem/coherent_xbar.cc
AgeCommit message (Collapse)Author
2015-12-31mem: Make cache terminology easier to understandAndreas Hansson
This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions. The following name changes are made: * the packet memInhibit flag is renamed to cacheResponding * the packet sharedAsserted flag is renamed to hasSharers * the packet NeedsExclusive attribute is renamed to NeedsWritable * the packet isSupplyExclusive is renamed responderHadWritable * the MSHR pendingDirty is renamed to pendingModified The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand.
2015-11-06mem: Add an option to perform clean writebacks from cachesAndreas Hansson
This patch adds the necessary commands and cache functionality to allow clean writebacks. This functionality is crucial, especially when having exclusive (victim) caches. For example, if read-only L1 instruction caches are not sending clean writebacks, there will never be any spills from the L1 to the L2. At the moment the cache model defaults to not sending clean writebacks, and this should possibly be re-evaluated. The implementation of clean writebacks relies on a new packet command WritebackClean, which acts much like a Writeback (renamed WritebackDirty), and also much like a CleanEvict. On eviction of a clean block the cache either sends a clean evict, or a clean writeback, and if any copies are still cached upstream the clean evict/writeback is dropped. Similarly, if a clean evict/writeback reaches a cache where there are outstanding MSHRs for the block, the packet is dropped. In the typical case though, the clean writeback allocates a block in the downstream cache, and marks it writable if the evicted block was writable. The patch changes the O3_ARM_v7a L1 cache configuration and the default L1 caches in config/common/Caches.py
2015-11-06mem: Avoid unnecessary snoops on writebacks and clean evictionsAli Jafri
This patch optimises the handling of writebacks and clean evictions when using a snoop filter. Instead of snooping into the caches to determine if the block is cached or not, simply set the status based on the snoop-filter result.
2015-11-06mem: Unify delayed packet deletionAndreas Hansson
This patch unifies how we deal with delayed packet deletion, where the receiving slave is responsible for deleting the packet, but the sending agent (e.g. a cache) is still relying on the pointer until the call to sendTimingReq completes. Previously we used a mix of a deletion vector and a construct using unique_ptr. With this patch we ensure all slaves use the latter approach.
2015-11-06mem: Check the XBar's port queues on functional snoopsAndreas Sandberg
The CoherentXBar currently doesn't check its queued slave ports when receiving a functional snoop. This caused data corruption in cases when a modified cache lines is forwarded between two caches. Add the required functional calls into the queued slave ports.
2015-09-25mem: Only track snooping ports in the snoop filterAndreas Hansson
This patch changes the tracking of ports in the snoop filter to use local dense port IDs so that we can have 64 snooping ports (rather than crossbar slave ports). This is achieved by adding a simple remapping vector that translates the actal port IDs into the local slave IDs used in the SnoopMask. Ultimately this patch allows us to scale to much larger systems without introducing a hierarchy of crossbars.
2015-09-25mem: Store snoop filter lookup result to avoid second lookupAndreas Hansson
This patch introduces a private member storing the iterator from the lookupRequest call, such that it can be re-used when the request eventually finishes. The method previously called updateRequest is renamed finishRequest to make it more clear that the two functions must be called together.
2015-09-25mem: Add snoops for CleanEvicts and Writebacks in atomic modeAli Jafri
This patch mirrors the logic in timing mode which sends up snoops to check for cached copies before sending CleanEvicts and Writebacks down the memory hierarchy. In case there is a copy in a cache above, discard CleanEvicts and set the BLOCK_CACHED flag in Writebacks so that writebacks do not reset the cache residency bit in the snoop filter below.
2015-09-25mem: Add CleanEvict and Writeback support to snoop filtersAli Jafri
This patch adds the functionality to properly track CleanEvicts and Writebacks in the snoop filter. Previously there were no CleanEvicts, and Writebacks did not send up snoops to ensure there were no copies in caches above. Hence a writeback could never erase an entry from the snoop filter. When a CleanEvict message reaches a snoop filter, it confirms that the BLOCK_CACHED flag is not set and resets the bits corresponding to the CleanEvict address and port it arrived on. If none of the other peer caches have (or have requested) the block, the snoop filter forwards the CleanEvict to lower levels of memory. In case of a Writeback message, the snoop filter checks if the BLOCK_CACHED flag is not set and only then resets the bits corresponding to the Writeback address. If any of the other peer caches have (or has requested) the same block, the snoop filter sets the BLOCK_CACHED flag in the Writeback before forwarding it to lower levels of memory heirarachy.
2015-09-25mem: Make the coherent crossbar account for timing snoopsAndreas Hansson
This patch introduces the concept of a snoop latency. Given the requirement to snoop and forward packets in zero time (due to the coherency mechanism), the latency is accounted for later. On a snoop, we establish the latency, and later add it to the header delay of the packet. To allow multiple caches to contribute to the snoop latency, we use a separate variable in the packet, and then take the maximum before adding it to the header delay.
2015-09-25mem: Do not include snoop-filter latency in crossbar occupancyAndreas Hansson
This patch ensures that the snoop-filter latency only contributes to the packet latency, and not to the crossbar throughput/occupancy. In essence we treat the snoop-filter lookup as pipelined.
2015-07-07sim: Decouple draining from the SimObject hierarchyAndreas Sandberg
Draining is currently done by traversing the SimObject graph and calling drain()/drainResume() on the SimObjects. This is not ideal when non-SimObjects (e.g., ports) need draining since this means that SimObjects owning those objects need to be aware of this. This changeset moves the responsibility for finding objects that need draining from SimObjects and the Python-side of the simulator to the DrainManager. The DrainManager now maintains a set of all objects that need draining. To reduce the overhead in classes owning non-SimObjects that need draining, objects inheriting from Drainable now automatically register with the DrainManager. If such an object is destroyed, it is automatically unregistered. This means that drain() and drainResume() should never be called directly on a Drainable object. While implementing the new functionality, the DrainManager has now been made thread safe. In practice, this means that it takes a lock whenever it manipulates the set of Drainable objects since SimObjects in different threads may create Drainable objects dynamically. Similarly, the drain counter is now an atomic_uint, which ensures that it is manipulated correctly when objects signal that they are done draining. A nice side effect of these changes is that it makes the drain state changes stricter, which the simulation scripts can exploit to avoid redundant drains.
2015-07-03mem: Delay responses in the crossbar before forwardingAndreas Hansson
This patch changes how the crossbar classes deal with responses. Instead of forwarding responses directly and burdening the neighbouring modules in paying for the latency (through the pkt->headerDelay), we now queue them before sending them. The coherency protocol is not affected as requests and any snoop requests/responses are still passed on in zero time. Thus, the responses end up paying for any header delay accumulated when passing through the crossbar. Any latency incurred on the request path will be paid for on the response side, if no other module has dealt with it. As a result of this patch, responses are returned at a later point. This affects the number of outstanding transactions, and quite a few regressions see an impact in blocking due to no MSHRs, increased cache-miss latencies, etc. Going forward we should be able to use the same concept also for snoop responses, and any request that is not an express snoop.
2015-07-03mem: Add clean evicts to improve snoop filter trackingAli Jafri
This patch adds eviction notices to the caches, to provide accurate tracking of cache blocks in snoop filters. We add the CleanEvict message to the memory heirarchy and use both CleanEvicts and Writebacks with BLOCK_CACHED flags to propagate notice of clean and dirty evictions respectively, down the memory hierarchy. Note that the BLOCK_CACHED flag indicates whether there exist any copies of the evicted block in the caches above the evicting cache. The purpose of the CleanEvict message is to notify snoop filters of silent evictions in the relevant caches. The CleanEvict message behaves much like a Writeback. CleanEvict is a write and a request but unlike a Writeback, CleanEvict does not have data and does not need exclusive access to the block. The cache generates the CleanEvict message on a fill resulting in eviction of a clean block. Before travelling downwards CleanEvict requests generate zero-time snoop requests to check if the same block is cached in upper levels of the memory heirarchy. If the block exists, the cache discards the CleanEvict message. The snoops check the tags, writeback queue and the MSHRs of upper level caches in a manner similar to snoops generated from HardPFReqs. Currently CleanEvicts keep travelling towards main memory unless they encounter the block corresponding to their address or reach main memory (since we have no well defined point of serialisation). Main memory simply discards CleanEvict messages. We have modified the behavior of Writebacks, such that they generate snoops to check for the presence of blocks in upper level caches. It is possible in our current implmentation for a lower level cache to be writing back a block while a shared copy of the same block exists in the upper level cache. If the snoops find the same block in upper level caches, we set the BLOCK_CACHED flag in the Writeback message. We have also added logic to account for interaction of other message types with CleanEvicts waiting in the writeback queue. A simple example is of a response arriving at a cache removing any CleanEvicts to the same address from the cache's writeback queue.
2015-05-05mem: Snoop into caches on uncacheable accessesAndreas Hansson
This patch takes a last step in fixing issues related to uncacheable accesses. We do not separate uncacheable memory from uncacheable devices, and in cases where it is really memory, there are valid scenarios where we need to snoop since we do not support cache maintenance instructions (yet). On snooping an uncacheable access we thus provide data if possible. In essence this makes uncacheable accesses IO coherent. The snoop filter is also queried to steer the snoops, but not updated since the uncacheable accesses do not allocate a block.
2015-03-02mem: Add crossbar latenciesMarco Balboni
This patch introduces latencies in crossbar that were neglected before. In particular, it adds three parameters in crossbar model: front_end_latency, forward_latency, and response_latency. Along with these parameters, three corresponding members are added: frontEndLatency, forwardLatency, and responseLatency. The coherent crossbar has an additional snoop_response_latency. The latency of the request path through the xbar is set as --> frontEndLatency + forwardLatency In case the snoop filter is enabled, the request path latency is charged also by look-up latency of the snoop filter. --> frontEndLatency + SF(lookupLatency) + forwardLatency. The latency of the response path through the xbar is set instead as --> responseLatency. In case of snoop response, if the response is treated as a normal response the latency associated is again --> responseLatency; If instead it is forwarded as snoop response we add an additional variable + snoopResponseLatency and the latency associated is --> snoopResponseLatency; Furthermore, this patch lets the crossbar progress on the next clock edge after an unused retry, changing the time the crossbar considers itself busy after sending a retry that was not acted upon.
2015-03-02mem: Split port retry for all different packet classesAndreas Hansson
This patch fixes a long-standing isue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios. The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting. The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fixes the previously seen deadlocks.
2015-02-11mem: Clarification of packet crossbar timingsMarco Balboni
This patch clarifies the packet timings annotated when going through a crossbar. The old 'firstWordDelay' is replaced by 'headerDelay' that represents the delay associated to the delivery of the header of the packet. The old 'lastWordDelay' is replaced by 'payloadDelay' that represents the delay needed to processing the payload of the packet. For now the uses and values remain identical. However, going forward the payloadDelay will be additive, and not include the headerDelay. Follow-on patches will make the headerDelay capture the pipeline latency incurred in the crossbar, whereas the payloadDelay will capture the additional serialisation delay.
2015-01-22mem: Make the XBar responsible for tracking response routingAndreas Hansson
This patch removes the need for a source and destination field in the packet by shifting the onus of the tracking to the crossbar, much like a real implementation. This change in behaviour also means we no longer need a SenderState to remember the source/dest when ever we have multiple crossbars in the system. Thus, the stack that was created by the SenderState is not needed, and each crossbar locally tracks the response routing. The fields in the packet are still left behind as the RubyPort (which also acts as a crossbar) does routing based on them. In the succeeding patches the uses of the src and dest field will be removed. Combined, these patches improve the simulation performance by roughly 2%.
2014-12-02mem: Relax packet src/dest check and shift onus to crossbarAndreas Hansson
This patch allows objects to get the src/dest of a packet even if it is not set to a valid port id. This simplifies (ab)using the bridge as a buffer and latency adapter in situations where the neighbouring MemObjects are not crossbars. The checks that were done in the packet are now shifted to the crossbar where the fields are used to index into the port arrays. Thus, the carrier of the information is not burdened with checking, and the crossbar can check not only that the destination is set, but also that the port index is within limits.
2014-09-20mem: Rename Bus to XBar to better reflect its behaviourAndreas Hansson
This patch changes the name of the Bus classes to XBar to better reflect the actual timing behaviour. The actual instances in the config scripts are not renamed, and remain as e.g. iobus or membus. As part of this renaming, the code has also been clean up slightly, making use of range-based for loops and tidying up some comments. The only changes outside the bus/crossbar code is due to the delay variables in the packet. --HG-- rename : src/mem/Bus.py => src/mem/XBar.py rename : src/mem/coherent_bus.cc => src/mem/coherent_xbar.cc rename : src/mem/coherent_bus.hh => src/mem/coherent_xbar.hh rename : src/mem/noncoherent_bus.cc => src/mem/noncoherent_xbar.cc rename : src/mem/noncoherent_bus.hh => src/mem/noncoherent_xbar.hh rename : src/mem/bus.cc => src/mem/xbar.cc rename : src/mem/bus.hh => src/mem/xbar.hh