gem5 - gem5

Age	Commit message	Author
2016-02-06	style: fix missing spaces in control statements	Steve Reinhardt
	Result of running 'hg m5style --skip-all --fix-control -a'.
2016-02-06	style: remove trailing whitespace	Steve Reinhardt
	Result of running 'hg m5style --skip-all --fix-white -a'.
2016-01-22	ruby: removed Write_Only AccessPermission	Brad Beckmann

2015-07-20	ruby: split CPU and GPU latency stats	David Hashe

2016-01-19	gpu-compute: AMD's baseline GPU model	Tony Gutierrez

2016-01-19	mem: write combining for ruby protocols	Tony Gutierrez
	This patch adds support for write-combining in ruby.
2016-01-19	mem: support for gpu-style RMWs in ruby	Tony Gutierrez
	This patch adds support for GPU-style read-modify-write (RMW) operations in ruby. Such atomic operations are traditionally executed at the memory controller (instead of through an L1 cache using cache-line locking). Currently, this patch works by propagating operation functors through the memory system.
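	A minimal sketch of the functor idea described above, not the gem5 implementation. The names AtomicAdd, MemoryController and handle_rmw are hypothetical, and returning the pre-operation value to the requester is an assumption borrowed from typical atomic semantics.

```python
class AtomicAdd:
    """Hypothetical functor: adds a fixed operand to the value it is applied to."""

    def __init__(self, operand):
        self.operand = operand

    def __call__(self, old_value):
        return old_value + self.operand


class MemoryController:
    """Toy memory that executes RMW functors itself rather than in an L1 cache."""

    def __init__(self):
        self.mem = {}

    def handle_rmw(self, addr, functor):
        old = self.mem.get(addr, 0)
        # The functor travelled with the request; apply it at the memory side.
        self.mem[addr] = functor(old)
        # Assumption: the requester receives the value seen before the update.
        return old


ctrl = MemoryController()
first = ctrl.handle_rmw(0x100, AtomicAdd(5))
second = ctrl.handle_rmw(0x100, AtomicAdd(3))
```

	Because the operation is carried as a callable, the controller in this sketch needs no knowledge of which atomic it is executing.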
2015-07-20	mem: misc flags for AMD gpu model	Blake Hechtman
	This patch adds support for marking memory requests/packets with attributes defined in HSA, such as memory order and scope.
2016-01-11	mem: fix bug in packet access endianness changes	Steve Reinhardt
	The new Packet::setRaw() method incorrectly still contained an htog() conversion. As a result, calls to the old set() method (now defined as setRaw(htog(v))) underwent two htog conversions, which breaks things when htog() is not a no-op. Interestingly the only test that caught this was a SPARC boot test, where an IsaFake device with a non-zero return value was getting swapped twice resulting in a register getting loaded with 0x100000000000000 instead of 1. (Good reason for keeping SPARC around, perhaps?)
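	A small illustration of why an extra byte-order conversion is visible whenever the conversion is not a no-op. The helper swap64 is hypothetical and only stands in for a 64-bit byte swap; it is not gem5's htog().

```python
def swap64(value):
    """Reverse the byte order of a 64-bit unsigned integer."""
    return int.from_bytes(value.to_bytes(8, "little"), "big")


once = swap64(1)      # the single set byte moves to the most significant position
twice = swap64(once)  # a second swap undoes the first
```

	When exactly one swap is wanted, an unintended second one cancels it, so the consumer ends up seeing the value in the wrong byte order.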
2016-01-11	scons: Enable -Wextra by default	Andreas Hansson
	Make best use of the compiler, and enable -Wextra as well as -Wall. There are a few issues that had to be resolved, but they are all trivial.
2015-12-31	mem: add CacheVerbose debug flag, filter noisy DPRINTFs	Steve Reinhardt
	Some of the DPRINTFs added to the classic cache in cset 45df88079f04, while useful to those unfamiliar with the cache code, end up being noise when you're familiar with the code but are trying to debug tricky protocol issues. (Particularly getting two messages from each cache as it receives a snoop request then declares that there was no match.) This patch introduces a CacheVerbose debug flag, and moves a subset of the added DPRINTFs into that category, so that Cache by itself returns to being a more succinct summary of cache activity. Also added a CacheAll compound flag to turn on all the cache-related debug flags (other than CacheTags, which you really have to want badly to turn it on, IMO).
2015-12-31	mem: Do not rely on the NeedsWritable flag for responses	Andreas Hansson
	This patch removes the NeedsWritable flag for all responses, as it is really only the request that needs a writable response. The response, on the other hand, should in these cases always provide the line in a writable state, as indicated by the hasSharers flag not being set. When we send requests that have NeedsWritable set, the response will always have the hasSharers flag not set. Additionally, there are cases where the request did not have NeedsWritable set, and we still get a writable response with the hasSharers flag not set. This never happens on snoops, but is used by downstream caches to pass ownership upstream. As part of this patch, the affected response types are updated, and the snoop filter is similarly modified to check only the hasSharers flag (as it should). A sanity check is also added to the packet class, asserting that we never look at the NeedsWritable flag for responses. No regressions are affected.
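	The rule above can be sketched as follows. This is a toy model with hypothetical names (Packet, needs_writable, response_is_writable), not the gem5 Packet class.

```python
class Packet:
    """Toy packet: requests carry NeedsWritable, responses carry hasSharers."""

    def __init__(self, is_request, needs_writable=False, has_sharers=False):
        self.is_request = is_request
        self._needs_writable = needs_writable
        self.has_sharers = has_sharers

    def needs_writable(self):
        # Sanity check: this attribute is only meaningful on requests.
        assert self.is_request, "NeedsWritable must not be checked on responses"
        return self._needs_writable

    def response_is_writable(self):
        # A response hands over a writable line exactly when no sharers remain.
        assert not self.is_request
        return not self.has_sharers
```

	A response built with has_sharers=False reports itself as writable regardless of what the originating request asked for, which mirrors the ownership-passing case described above.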
2015-12-31	mem: Do not allocate space for packet data if not needed	Andreas Hansson
	This patch looks at the request and response command to determine if either actually has any data payload, and if not, we do not allocate any space for packet data. The only tricky case is where the command type is changed as part of the MSHR functionality. In these cases where the original packet had no data, but the new packet does, we need to explicitly call allocate().
2015-12-31	mem: Do not alter cache block state on uncacheable snoops	Andreas Hansson
	This patch ensures we do not respond with a Modified (dirty and writable) line if the request is uncacheable, and that the cache responding retains the line without modifying the state (even if responding).
2015-12-31	mem: Make cache terminology easier to understand	Andreas Hansson
	This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions. The following name changes are made:
	* the packet memInhibit flag is renamed to cacheResponding
	* the packet sharedAsserted flag is renamed to hasSharers
	* the packet NeedsExclusive attribute is renamed to NeedsWritable
	* the packet isSupplyExclusive is renamed responderHadWritable
	* the MSHR pendingDirty is renamed to pendingModified
	The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand.
2015-07-20	ruby: slicc: have a static MachineType	Tony Gutierrez
	This patch is imported from reviewboard patch 2551 by Nilay. This patch moves from a dynamically defined MachineType to a statically defined one. This is needed because a dynamically defined type prevents us from having types for which no machine definition may exist. The following changes have been made: i. each machine definition now uses a type from the MachineType enumeration instead of an arbitrary identifier. This required changing the grammar and the .sm files. ii. the MachineType enumeration is defined statically in RubySlicc_Exports.sm. Also included: normal protocol fixes for Nilay's parser machine type fix.
2015-07-20	ruby: slicc: remove support for single machine, multiple types	Tony Gutierrez
	This patch is imported from reviewboard patch 2550 by Nilay. It was possible to specify multiple machine types with a single state machine. This seems unnecessary and is being removed.
2015-12-28	mem: Explicitly check MSHR snoops for cases not dealt with	Andreas Hansson
	Add a sanity check to make it explicit that we currently do not allow an I/O coherent agent to directly issue writes into the coherent part of the memory system (it has to go via a cache, and get transformed into a read ex, upgrade or invalidation).
2015-12-28	mem: Remove unused cache squash functionality	Andreas Hansson
	This patch removes the unused squash function from the MSHR queue, and the associated (and also unused) threadNum member from the MSHR.
2015-12-28	mem: Avoid unnecessary checks when creating HardPFReq in cache	Andreas Hansson
	The checks made before sending out a HardPFReq were unnecessarily complex, and checked for cases that never occur. This patch tidies it up.
2015-12-28	mem: Do not use sender state to track forwarded snoops in cache	Andreas Hansson
	This patch changes how the cache tracks which snoops are forwarded, and which ones are created locally. Previously the identification was based on an empty sender state of a specific class, but this method fails to distinguish which cache actually attached the sender state. Instead we use the same mechanism as the crossbar, and keep track of the requests that have outstanding snoops.
2015-12-28	mem: Fix cache sender state handling and add clarification	Andreas Hansson
	This patch addresses a bug in how the cache attached the MSHR as a sender state. Rather than overwriting any existing sender state it now pushes a new one. The handling of upward snoops is also clarified.
2015-12-17	mem: Fix memory allocation bug in deferred snoop handling	Andreas Hansson
	This patch fixes a corner case in the deferred snoop handling, where requests ended up being used by multiple packets with different lifetimes, and inadvertently got deleted while they were still in use.
2015-07-20	mem: add request types for acquire and release	David Hashe
	Add support for acquire and release requests. These synchronization operations are commonly supported by several modern instruction sets.
2015-07-20	ruby: more flexible ruby tester support	Brad Beckmann
	This patch allows the ruby random tester to use ruby ports that may only support instr or data requests. This patch is similar to a previous changeset (8932:1b2c17565ac8) that was unfortunately broken by subsequent changesets. This current patch implements the support in a more straight-forward way. Since retries are now tested when running the ruby random tester, this patch splits up the retry and drain check behavior so that RubyPort children, such as the GPUCoalescer, can perform those operations correctly without having to duplicate code. Finally, the patch also includes better DPRINTFs for debugging the tester.
2015-12-09	mem: remove acq/rel cmds from packet and add mem fence req	Tony Gutierrez

2015-12-07	cpu: Support virtual addr in elastic traces	Radhika Jagtap
	This patch adds support to optionally capture the virtual address and asid for load/store instructions in the elastic traces. If they are present in the traces, Trace CPU will set those fields of the request during replay.
2015-12-07	mem: Add instruction sequence number to request	Radhika Jagtap
	This patch adds the instruction sequence number to the request and provides a request constructor that accepts a sequence number for initialization.
2015-11-25	mem: Fix search-replace issues in DRAMPower wrapper license	Andreas Hansson
	Fix a number of unintentional insertions of 'const'.
2015-11-15	arm: Add missing explicit overrides for classic caches	Andreas Sandberg
	Make clang happy when compiling on OSX.
2015-07-20	ruby: added stl vector of ints to be used by SLICC	Brad Beckmann

2015-11-13	slicc: fixes for the Address to Addr changeset (11025)	Tony Gutierrez
	Miscellaneous changes now that Address has become Addr, including an int-to-address utility function.
2015-11-13	ruby: add BoolVec	Joe Gross
	The BoolVec typedef and insertion operator overload function simplify usage of vectors of type bool.
2015-07-20	mem: add boolean to disable PacketQueue's size sanity check	Brad Beckmann
	The sanity check, while generally useful for exposing memory system bugs, may be spurious with respect to GPU workloads, which may generate many more requests than typical CPU workloads. The large number of requests generated by the GPU may cause the req/resp queues to back up, thus queueing more than 100 packets.
2015-11-06	mem: Add an option to perform clean writebacks from caches	Andreas Hansson
	This patch adds the necessary commands and cache functionality to allow clean writebacks. This functionality is crucial, especially when having exclusive (victim) caches. For example, if read-only L1 instruction caches are not sending clean writebacks, there will never be any spills from the L1 to the L2. At the moment the cache model defaults to not sending clean writebacks, and this should possibly be re-evaluated. The implementation of clean writebacks relies on a new packet command WritebackClean, which acts much like a Writeback (renamed WritebackDirty), and also much like a CleanEvict. On eviction of a clean block the cache either sends a clean evict, or a clean writeback, and if any copies are still cached upstream the clean evict/writeback is dropped. Similarly, if a clean evict/writeback reaches a cache where there are outstanding MSHRs for the block, the packet is dropped. In the typical case though, the clean writeback allocates a block in the downstream cache, and marks it writable if the evicted block was writable. The patch changes the O3_ARM_v7a L1 cache configuration and the default L1 caches in config/common/Caches.py
2015-11-06	mem: Add cache clusivity	Andreas Hansson
	This patch adds a parameter to control the cache clusivity, that is if the cache is mostly inclusive or exclusive. At the moment there is no intention to support strict policies, and thus the options are: 1) mostly inclusive, or 2) mostly exclusive. The choice of policy guides the behaviour on a cache fill, and a new helper function, allocOnFill, is created to encapsulate the decision making process. For the timing mode, the decision is annotated on the MSHR on sending out the downstream packet, and in atomic we directly pass the decision to handleFill. We (ab)use the tempBlock in cases where we are not allocating on fill, leaving the rest of the cache unaffected. Simple and effective. This patch also makes it more explicit that multiple caches are allowed to consider a block writable (this is the case also before this patch). That is, for a mostly inclusive cache, multiple caches upstream may also consider the block exclusive. The caches considering the block writable/exclusive all appear along the same path to memory, and from a coherency protocol point of view it works due to the fact that we always snoop upwards in zero time before querying any downstream cache. Note that this patch does not introduce clean writebacks. Thus, for clean lines we are essentially removing a cache level if it is made mostly exclusive. For example, lines from the read-only L1 instruction cache or table-walker cache are always clean, and simply get dropped rather than being passed to the L2. If the L2 is mostly exclusive and does not allocate on fill it will thus never hold the line. A follow-on patch adds the clean writebacks. The patch changes the L2 of the O3_ARM_v7a CPU configuration to be mostly exclusive (and stats are affected accordingly).
2015-11-06	mem: Avoid unnecessary snoops on writebacks and clean evictions	Ali Jafri
	This patch optimises the handling of writebacks and clean evictions when using a snoop filter. Instead of snooping into the caches to determine if the block is cached or not, simply set the status based on the snoop-filter result.
2015-11-06	mem: Order packet queue only on matching addresses	Andreas Hansson
	Instead of conservatively enforcing order for all packets, which may negatively impact the simulated-system performance, this patch updates the packet queue such that it only applies the restriction if there are already packets with the same address in the queue. The basic need for the order enforcement is due to coherency interactions where requests/responses to the same cache line must not over-take each other. We rely on the fact that any packet that needs order enforcement will have a block-aligned address. Thus, there is no need for the queue to know about the cacheline size.
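	A rough sketch of the ordering rule described above, assuming a queue that is otherwise sorted by send time. Deferred and schedule are hypothetical names, and the real packet queue differs in detail.

```python
from dataclasses import dataclass


@dataclass
class Deferred:
    when: int  # tick at which the packet may be sent
    addr: int  # block-aligned address


def schedule(queue, when, addr):
    """Insert in time order, but never ahead of a queued packet with the same address."""
    i = len(queue)
    while i > 0:
        ahead = queue[i - 1]
        # Stop at a same-address packet (order must hold) or at the time-sorted slot.
        if ahead.addr == addr or ahead.when <= when:
            break
        i -= 1
    queue.insert(i, Deferred(when, addr))
    return i


queue = [Deferred(10, 0x40), Deferred(20, 0x80), Deferred(30, 0x40)]
pos_other = schedule(queue, 15, 0xC0)  # no matching address: may overtake later packets
pos_same = schedule(queue, 15, 0x40)   # matching address queued: must stay behind it
```

	Only a plain address comparison is needed here, which is consistent with the note above that the queue does not have to know the cache line size when ordered packets are block aligned.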
2015-11-06	mem: Enforce insertion order on the cache response path	Ali Jafri
	This patch enforces insertion order transmission of packets on the response path in the cache. Note that the logic to enforce order is already present in the packet queue, this patch simply turns it on for queues in the response path. Without this patch, there are corner cases where a request-response is faster than a response-response forwarded through the cache. This violation of queuing order causes problems in the snoop filter leaving it with inaccurate information. This causes assert failures in the snoop filter later on. A follow on patch relaxes the order enforcement in the packet queue to limit the performance impact.
2015-11-06	mem: Use the packet delays and do not just zero them out	Andreas Hansson
	This patch updates the I/O devices, bridge and simple memory to take the packet header and payload delay into account in their latency calculations. In all cases we add the header delay, i.e. the accumulated pipeline delay of any crossbars, and the payload delay needed for deserialisation of any payload. Due to the additional unknown latency contribution, the packet queue of the simple memory is changed to use insertion sorting based on the time stamp. Moreover, since the memory hands out exclusive (non shared) responses, we also need to ensure ordering for reads to the same address.
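	A simplified sketch of the latency calculation and timestamp-sorted queue described above. The names response_time and ResponseQueue and the tick values are hypothetical, and the same-address read ordering is not modelled.

```python
import bisect


def response_time(now, access_latency, header_delay, payload_delay):
    """Tick at which a response is ready, including the delays carried by the packet."""
    return now + access_latency + header_delay + payload_delay


class ResponseQueue:
    """Keeps responses sorted by ready time, since arrival order no longer implies it."""

    def __init__(self):
        self._entries = []  # (when, sequence number, label) tuples
        self._seq = 0

    def push(self, when, label):
        # The sequence number keeps insertion order stable for equal timestamps.
        bisect.insort(self._entries, (when, self._seq, label))
        self._seq += 1

    def labels(self):
        return [label for _, _, label in self._entries]


q = ResponseQueue()
t_a = response_time(now=100, access_latency=30, header_delay=20, payload_delay=8)
t_b = response_time(now=110, access_latency=30, header_delay=0, payload_delay=0)
q.push(t_a, "A")
q.push(t_b, "B")
```

	Packet B arrives later than A but carries no accumulated delay, so it becomes ready first. This is why a plain FIFO would no longer be sufficient in this sketch.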
2015-11-06	mem: Align rules for sinking inhibited packets at the slave	Andreas Hansson
	This patch aligns how the memory-system slaves, i.e. the various memory controllers and the bridge, identify and deal with sinking of inhibited packets that are only useful within the coherent part of the memory system. In the future we could shift the onus to the crossbar, and add a parameter "is_point_of_coherence" that would allow it to sink the aforementioned packets.
2015-11-06	mem: Do not treat CleanEvict as a write operation	Andreas Hansson
	This patch changes the CleanEvict command type to not be considered a write. Initially it was made a zero-sized write to match the writeback command, but as things developed it became clear that it causes more problems than it solves. For example, the memory modules (and bridge) should not consider the CleanEvict as a write, but instead discard it. With this patch it will be neither a read, nor write, and as it does not need a response the slave will simply sink it.
2015-11-06	mem: Unify delayed packet deletion	Andreas Hansson
	This patch unifies how we deal with delayed packet deletion, where the receiving slave is responsible for deleting the packet, but the sending agent (e.g. a cache) is still relying on the pointer until the call to sendTimingReq completes. Previously we used a mix of a deletion vector and a construct using unique_ptr. With this patch we ensure all slaves use the latter approach.
2015-11-06	misc: Appease clang static analyzer	Andreas Hansson
	A few minor fixes to issues identified by the clang static analyzer.
2015-11-06	mem: Check the XBar's port queues on functional snoops	Andreas Sandberg
	The CoherentXBar currently doesn't check its queued slave ports when receiving a functional snoop. This caused data corruption in cases when a modified cache line is forwarded between two caches. Add the required functional calls into the queued slave ports.
2015-11-03	mem: hmc: minor fixes	Erfan Azarkhish
	This patch performs two minor fixes to DRAMCtrl.py and xbar.hh in favor of the HMC patch series. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-11-03	mem: hmc: serial link model	Erfan Azarkhish
	This changeset adds a serial link model for the Hybrid Memory Cube (HMC). SerialLink is a simple variation of the Bridge class, with the ability to account for the latency of packet serialization. Also trySendTiming has been modified to correctly model bandwidth. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-11-03	mem: hmc: adds controller	Erfan Azarkhish
	This patch models a simple HMC Controller. It simply schedules the incoming packets to HMC Serial Links using a round robin mechanism. This patch should be applied in series with other patches modeling a complete HMC device. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-10-29	mem: Clarify cache MSHR handling on fill	Andreas Hansson
	This patch addresses the upgrading of deferred targets in the MSHR, and makes it clearer by explicitly calling out what is happening (deferred targets are promoted if we get exclusivity without asking for it).
2015-10-23	x86: Add missing explicit overrides for X86 devices	Andreas Hansson
	Make clang >= 3.5 happy when compiling build/X86/gem5.opt on OSX.