gem5 - gem5

Age	Commit message	Author
2015-05-05	mem, cpu: Add a separate flag for strictly ordered memory	Andreas Sandberg
	The Request::UNCACHEABLE flag currently has two different functions. The first, and obvious, function is to prevent the memory system from caching data in the request. The second function is to prevent reordering and speculation in CPU models. This changeset gives the order/speculation requirement a separate flag (Request::STRICT_ORDER). This flag prevents CPU models from doing the following optimizations:
	* Speculation: CPU models are not allowed to issue speculative loads.
	* Write combining: CPU models and caches are not allowed to merge writes to the same cache line.
	Note: The memory system may still reorder accesses unless the UNCACHEABLE flag is set. It is therefore expected that the STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent this behavior.
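A minimal sketch of how the two concerns separate once each has its own flag. The bit positions and the helper names (`maySpeculate`, `mayCache`, `strictlyOrderedFlags`) are hypothetical and chosen only for illustration; gem5's Request class defines its own values and accessors.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments; not the values used by gem5's Request class.
enum RequestFlag : std::uint32_t {
    UNCACHEABLE  = 1u << 0,  // memory system must not cache the data
    STRICT_ORDER = 1u << 1,  // CPU must not speculate or merge writes
};

// A CPU model may issue the access speculatively only if STRICT_ORDER is clear.
inline bool maySpeculate(std::uint32_t flags) {
    return (flags & STRICT_ORDER) == 0;
}

// The memory system may cache the data only if UNCACHEABLE is clear.
inline bool mayCache(std::uint32_t flags) {
    return (flags & UNCACHEABLE) == 0;
}

// The combination the commit message expects for strictly ordered accesses.
inline std::uint32_t strictlyOrderedFlags() {
    const std::uint32_t flags = UNCACHEABLE | STRICT_ORDER;
    assert(!maySpeculate(flags) && !mayCache(flags));
    return flags;
}
```

With this split, an access that carries only UNCACHEABLE bypasses the caches but can still be issued speculatively, which the single overloaded flag could not express.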
2015-05-05	mem: Snoop into caches on uncacheable accesses	Andreas Hansson
	This patch takes a last step in fixing issues related to uncacheable accesses. We do not separate uncacheable memory from uncacheable devices, and in cases where it is really memory, there are valid scenarios where we need to snoop since we do not support cache maintenance instructions (yet). On snooping an uncacheable access we thus provide data if possible. In essence this makes uncacheable accesses IO coherent. The snoop filter is also queried to steer the snoops, but not updated since the uncacheable accesses do not allocate a block.
2015-05-05	cpu: Work around gcc 4.9 issues with Num_OpClasses	Andreas Hansson
	This patch fixes a recent issue with gcc 4.9 (and possibly more) being convinced that indices outside the array bounds are used when initialising the FUPool members.
2015-04-29	cpu: o3: replace issueLatency with bool pipelined	Nilay Vaish
	Currently, each op class has a parameter issueLat that denotes the cycles after which another op of the same class can be issued. As of now, this latency can either be one cycle (fully pipelined) or same as execution latency of the op (not at all pipelined). The fact that issueLat is a parameter of type Cycles makes one believe that it can be set to any value. To avoid the confusion, the parameter is being renamed as 'pipelined' with type boolean. If set to true, the op would execute in a fully pipelined fashion. Otherwise, it would execute in an unpipelined fashion.
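The two cases described above can be sketched as follows. `OpDesc` and `issueInterval` are hypothetical names, and `Cycles` is a plain integer stand-in for gem5's own type.

```cpp
#include <cassert>
#include <cstdint>

using Cycles = std::uint64_t;  // stand-in for gem5's Cycles type

// Hypothetical description of one op class.
struct OpDesc {
    Cycles opLat;    // execution latency in cycles
    bool pipelined;  // true: another op of this class can issue every cycle
};

// Cycles that must pass before another op of the same class can issue.
inline Cycles issueInterval(const OpDesc &op) {
    assert(op.opLat >= 1);
    return op.pipelined ? Cycles(1) : op.opLat;
}
```

A boolean makes the only two supported behaviours explicit: an interval of one cycle, or an interval equal to the execution latency.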
2015-04-29	cpu: o3: single cycle default div microop latency on x86	Nilay Vaish
	This patch sets the default latency of the division microop to a single cycle on x86. This is because the division instructions DIV and IDIV have been implemented as loops of div microops, where each microop computes a single bit of the quotient.
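To illustrate why a one-cycle step is a natural default when each step yields one quotient bit, here is a sketch of textbook restoring division. It is not the x86 microcode from the tree; `DivState`, `divStep` and `divide` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct DivState {
    std::uint32_t quotient = 0;
    std::uint32_t remainder = 0;
};

// One step: shift in the next dividend bit and produce one quotient bit.
inline void divStep(DivState &s, std::uint32_t dividend,
                    std::uint32_t divisor, int bit) {
    s.remainder = (s.remainder << 1) | ((dividend >> bit) & 1u);
    s.quotient <<= 1;
    if (s.remainder >= divisor) {
        s.remainder -= divisor;
        s.quotient |= 1u;
    }
}

// A full 32-bit unsigned division is a loop of 32 single-bit steps.
inline DivState divide(std::uint32_t dividend, std::uint32_t divisor) {
    assert(divisor != 0);
    DivState s;
    for (int bit = 31; bit >= 0; --bit)
        divStep(s, dividend, divisor, bit);
    return s;
}
```

For example, `divide(100, 7)` produces a quotient of 14 and a remainder of 2 after 32 steps, so the total latency of the instruction comes from the loop count rather than from the latency of any single step.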
2015-04-22	cpu: remove conditional check (count > 0) on o3 IQ squashes	Brandon Potter
	The o3 cpu instruction queue model uses the count variable to track the number of unissued instructions in the queue. Previously, the squash method used this variable to avoid executing the doSquash method when there were no unissued instructions in the pipeline. A corner case problem exists when only issued instructions exist in the pipeline and a squash occurs; the doSquash code is not invoked and subsequently does not clean up state properly.
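The corner case can be reproduced with a heavily simplified, hypothetical queue model. `InstQueue`, `squashGuarded` and the `squashed` flag are illustrative only and do not mirror the o3 implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction record.
struct Inst {
    bool issued = false;
    bool squashed = false;
};

struct InstQueue {
    std::vector<Inst> insts;
    int count = 0;  // tracks unissued instructions only

    void insert() {
        insts.push_back(Inst{});
        ++count;
    }

    void issue(std::size_t i) {
        assert(i < insts.size());
        if (!insts[i].issued) {
            insts[i].issued = true;
            --count;
        }
    }

    // Cleans up every instruction, issued or not.
    void doSquash() {
        for (Inst &inst : insts)
            inst.squashed = true;
        count = 0;
    }

    // Old behaviour: skips the cleanup when nothing is unissued.
    void squashGuarded() { if (count > 0) doSquash(); }

    // Behaviour after this changeset: always clean up.
    void squash() { doSquash(); }
};
```

With one instruction inserted and then issued, `count` is zero, so `squashGuarded()` leaves the issued instruction untouched while `squash()` marks it as squashed.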
2015-04-20	cpu: Remove the InOrderCPU from the tree	Andreas Hansson
	This patch takes the final step in removing the InOrderCPU from the tree. Rest in peace. The MinorCPU is now used to model an in-order microarchitecture, and long term the MinorCPU will eventually be renamed InOrderCPU.
2015-04-14	config, cpu: fix progress interval for switched CPUs	Malek Musleh
	This patch ensures that the CPU progress Event is triggered for the new set of switched_cpus that get scheduled (e.g. during fast-forwarding). It also avoids printing the interval state if the CPU is currently switched out. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-13	cpu: re-organizes the branch predictor structure.	Dibakar Gope
	Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-04-03	cpu: fix system total instructions accounting	Nikos Nikoleris
	The totalInstructions counter is only incremented when the whole instruction is committed and not on every microop. It was incorrectly reset in atomic and timing cpus. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
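A sketch of the counting rule described above, assuming each committed op carries two flags. `CommittedOp` and `InstCounters` are hypothetical types written for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of a committed op.
struct CommittedOp {
    bool isMicroop;      // part of a microcoded instruction
    bool isLastMicroop;  // final microop of that instruction
};

struct InstCounters {
    std::uint64_t numOps = 0;             // every committed op
    std::uint64_t totalInstructions = 0;  // whole instructions only

    void commit(const CommittedOp &op) {
        // The last-microop flag is only meaningful for microops.
        assert(!op.isLastMicroop || op.isMicroop);
        ++numOps;
        if (!op.isMicroop || op.isLastMicroop)
            ++totalInstructions;
    }
};
```

Committing a three-microop instruction followed by one non-microcoded instruction gives four ops but only two instructions.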
2015-03-26	cpu: Fix InstPBTrace inheritance	Andreas Hansson
	This patch fixes an issue that prevented gem5 from being built with C++ config and without Python.
2015-03-23	mem: rename Locked/LOCKED to LockedRMW/LOCKED_RMW	Steve Reinhardt
	Makes x86-style locked operations even more distinct from LLSC operations. Using "locked" by itself should be obviously ambiguous now.
2015-03-19	cpu: Fix TrafficGen message format	Wendy Elsasser
	Fix erroneous message format for fatal error. Previously, the format string lacked a type indicator (% instead of %d). Also removed a redundant fatal check. Ran modified sweep.py with in-range and out-of-range values to test.
2015-02-11	mem: restructure Packet cmd initialization a bit more	Steve Reinhardt
	Refactor the way that specific MemCmd values are generated for packets. The new approach is a little more elegant in that we assign the right value up front, and it's also more amenable to non-heap-allocated Packet objects. Also replaced the code in the Minor model that was still doing it the ad-hoc way. This is basically a refinement of http://repo.gem5.org/gem5/rev/711eb0e64249.
2015-03-09	cpu: o3: another assert instead of check	Nilay Vaish

2015-03-09	cpu: o3: Remove unused code in iew, add assert instead.	Nilay Vaish

2015-03-09	cpu: o3: commit: mark pipeline delay variable as consts	Nilay Vaish

2015-03-09	cpu: o3: remove unused stat variables.	Nilay Vaish

2015-03-09	cpu: o3: combine if with same condition	Nilay Vaish

2015-03-09	cpu: o3: remove member variable squashCounter	Nilay Vaish
	The variable is used in only one place and a whole new function setNextStatus() has been defined just to compute the value of the variable. Instead of calling the function, the value is now computed in the loop that preceded the function call.
2015-03-09	cpu: o3: remove unused function annotateMemoryUnits()	Nilay Vaish

2015-03-02	mem: Move crossbar default latencies to subclasses	Andreas Hansson
	This patch introduces a few subclasses to the CoherentXBar and NoncoherentXBar to distinguish the different uses in the system. We use the crossbar in a wide range of places: interfacing cores to the L2, as a system interconnect, connecting I/O and peripherals, etc. Needless to say, these crossbars have very different performance, and the clock frequency alone is not enough to distinguish these scenarios. Instead of trying to capture every possible case, this patch introduces dedicated subclasses for the three primary use-cases: L2XBar, SystemXBar and IOXBar. More can be added if needed, and the defaults can be overridden.
2015-03-02	arm: Share a port for the two table walker objects	Andreas Hansson
	This patch changes how the MMU and table walkers are created such that a single port is used to connect the MMU and the TLBs to the memory system. Previously two ports were needed as there are two table walker objects (stage one and stage two), and they both had a port. Now the port itself is moved to the Stage2MMU, and each TableWalker is simply using the port from the parent. By using the same port we also remove the need for having an additional crossbar joining the two ports before the walker cache or the L2. This simplifies the creation of the CPU cache topology in BaseCPU.py considerably. Moreover, for naming and symmetry reasons, the TLB walker port is connected through the stage-one table walker thus making the naming identical to x86. Along the same line, we use the stage-one table walker to generate the master id that is used by all TLB-related requests.
2015-03-02	cpu: o3 register renaming request handling improved	Rekai
	Now, prior to the renaming, the instruction requests the exact amount of registers it will need, and the rename_map decides whether the instruction is allowed to proceed or not.
2015-03-02	mem: Split port retry for all different packet classes	Andreas Hansson
	This patch fixes a long-standing issue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios. The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting. The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fix the previously seen deadlocks.
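The idea of one retry state per message class can be sketched as below. `PacketClass` and `RetryTracker` are hypothetical and far simpler than the actual port classes; they only show that a blocked request no longer holds back a snoop response.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical message classes, following the ones named in the message.
enum class PacketClass : std::size_t {
    Request = 0,
    Response,
    SnoopResponse,
    NumClasses
};

class RetryTracker {
    // One "waiting for retry" flag per class instead of a single shared flag.
    std::array<bool, static_cast<std::size_t>(PacketClass::NumClasses)>
        waiting{};

    static std::size_t idx(PacketClass c) {
        assert(c != PacketClass::NumClasses);
        return static_cast<std::size_t>(c);
    }

  public:
    // A send of this class was rejected; only this class now waits.
    void sendFailed(PacketClass c) { waiting[idx(c)] = true; }

    bool isWaiting(PacketClass c) const { return waiting[idx(c)]; }

    // The peer signalled a retry for this class only.
    void recvRetry(PacketClass c) { waiting[idx(c)] = false; }
};
```

After `sendFailed(PacketClass::Request)`, `isWaiting(PacketClass::SnoopResponse)` is still false, which is the property that removes the message-dependent deadlock in this simplified model.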
2015-03-02	cpu: Add a PC-value to the traffic generator requests	Stephan Diestelhorst
	Have the traffic generator add its masterID as the PC address to the requests. That way, prefetchers (and other components) that use a PC for request classification will see per-tester streams of requests. This enables us to test strided prefetchers with the memchecker, too.
2015-02-16	cpu: TrafficGen sinks snoops without complaining	Andreas Hansson
	To be able to use the TrafficGen in a system with caches we need to allow it to sink incoming snoop requests. By default the master port panics, so silently ignore any snoops.
2015-02-16	arch: Make readMiscRegNoEffect const throughout	Andreas Hansson
	Finally took the plunge and made this apply to all ISAs, not just ARM.
2015-02-16	cpu: add support for outputing a protobuf formatted CPU trace	Ali Saidi
	Doesn't support x86 due to static instruction representation. --HG-- rename : src/cpu/CPUTracers.py => src/cpu/InstPBTrace.py
2015-02-11	cpu: Tidy up the MemTest and make false sharing more obvious	Andreas Hansson
	The MemTest class really only tests false sharing, and as such there was a lot of old cruft that could be removed. This patch cleans up the tester, and also makes it more clear what the assumptions are. As part of this simplification the reference functional memory is also removed. The regression configs using MemTest are updated to reflect the changes, and the stats will be bumped in a separate patch. The example config will be updated in a separate patch due to more extensive re-work. In a follow-on patch a new tester will be introduced that uses the MemChecker to implement true sharing.
2015-02-11	sim: Move the BaseTLB to src/arch/generic/	Andreas Sandberg
	The TLB-related code is generally architecture dependent and should live in the arch directory to signify that. --HG-- rename : src/sim/BaseTLB.py => src/arch/generic/BaseTLB.py rename : src/sim/tlb.cc => src/arch/generic/tlb.cc rename : src/sim/tlb.hh => src/arch/generic/tlb.hh
2015-02-06	cpu: Idle CPU status logic revised	Alexandru Dutu
	This patch sets the CPU status to idle when the last active thread gets suspended.
2015-02-03	cpu: Ensure timing CPU sinks response before sending new request	Andreas Hansson
	This patch changes how the timing CPU deals with processing responses, always scheduling an event, even if it is for the current tick. This helps to avoid situations where a new request shows up before a response is finished in the crossbar, and also is more in line with any realistic behaviour.
2015-01-25	arm: always set the IsFirstMicroop flag	Ali Saidi
	While the IsFirstMicroop flag exists, it was only occasionally used in the ARM instructions that gem5 splits into microops, and therefore couldn't be relied on to be correct.
2015-01-25	sim: Clean up InstRecord	Ali Saidi
	Track memory size and flags as well as add some comments and consts.
2015-01-25	cpu: Remove all notion that we know when the cpu is misspeculating.	Ali Saidi
	We have no way of knowing if a CPU model is on the wrong path with our execute-in-execute CPU models. Don't pretend that we do.
2015-01-25	cpu: Put all CPU instruction tracers in a single file	Ali Saidi

2015-01-25	cpu: remove legion tracer	Ali Saidi
	If someone wants to debug with legion again they can restore the code from the repository, but there is no need to have it hang around indefinitely.
2015-01-22	mem: Clean up Request initialisation	Andreas Hansson
	This patch tidies up how we create and set the fields of a Request. In essence it tries to use the constructor where possible (as opposed to setPhys and setVirt), thus avoiding spreading the information across a number of locations. In fact, setPhys is made private as part of this patch, and a number of places where we called setVirt instead use the appropriate constructor.
2015-01-20	cpu: commit probe notification on every microop or macroop	Nikos Nikoleris
	The ppCommit should notify the attached listener every time the cpu commits a microop or non-microcoded instruction. The listener can then decide whether it will process only the last microop (e.g. SimPoint probe). Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-01-20	cpu: Fix retry bug in MinorCPU LSQ	Andreas Hansson

2015-01-10	cpu: fix RetiredStores probe point	Nikos Nikoleris
	Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-01-03	minor: fixed LSQ MasterPortID	Andrew Lukefahr
	Minor was reporting the data cache access as ".inst" accesses. This just switches the MasterPortID to dataMasterPortId. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-12-09	Let other objects set up memory like regions in a KVM VM.	Gabe Black

2014-12-05	cpu: Only check for PC events on instruction boundaries.	Gabe Black
	Only the instruction address is actually checked, so there's no need to check repeatedly while we're working through the microops of a macroop and that's not changing.
2014-12-02	cpu: Fix retries on barrier/store in Minor's store buffer	Andrew Bardsley
	This patch fixes a case where a store in Minor's store buffer never leaves the store buffer as it is prematurely counted as having been issued, leading to the store buffer idling. LSQ::StoreBuffer::numUnissuedAccesses should count the number of accesses either in memory, or still in the store buffer after being completed. For stores which are also barriers, the store will stay in the store buffer for a cycle after it is completed and will be cleaned up by the barrier clearing code (to ensure that barriers are completed in-order). To achieve this, numUnissuedAccesses is not decremented when a store-barrier is issued to memory, but when its barrier effect is cleared. Without this patch, the correct behaviour happens when a memory transaction is immediately accepted, but not if it needs a retry.
2014-12-02	cpu: Fix memoryIssueLimit checking in Minor	Andrew Bardsley
	This patch fixes the checking of the number of memory instructions issued per cycle in the Minor CPU.
2014-12-02	cpu, o3: Ignored invalidate causing same-address load reordering	Marco Elver
	In case the memory subsystem sends a combined response with invalidate (e.g. ReadRespWithInvalidate), we cannot ignore the invalidate part of the response. If we were to ignore the invalidate part, under certain circumstances this effectively leads to reordering of loads to the same address which is not permitted under any memory consistency model implemented in gem5.
	Consider the case where a later load's address is computed before an earlier load in program order, and is therefore sent to the memory subsystem first. At some point the earlier load's address is computed and in doing so correctly marks the later load as a possibleLoadViolation. In the meantime some other node writes and sends invalidations to all other nodes. The invalidation races with the later load's ReadResp, and arrives before ReadResp and is deferred. Upon receipt of the ReadResp, the response is changed to ReadRespWithInvalidate, and sent to the CPU. If we ignore the invalidate part of the packet, we let the later load read the old value of the address. Eventually the earlier load's ReadResp arrives, but with new data. As there was no invalidate snoop (sunk into the ReadRespWithInvalidate), and if we did not process the invalidate of the ReadRespWithInvalidate, we obtain a load reordering.
	A similar scenario can be constructed where the earlier load's address is computed after ReadRespWithInvalidate arrives for the younger load. In this case hitExternalSnoop needs to be set to true on the ReadRespWithInvalidate, so that upon knowing the address of the earlier load, checkViolations will cause the later load to be squashed.
	Finally we must account for the case where both loads are sent to the memory subsystem (reordered), a snoop invalidate arrives and correctly sets the later load's fault to ReExec. However, before the CPU processes the fault, the later load's ReadResp arrives and the writeback discards the outstanding fault. We must add a check to ensure that we do not skip any unprocessed faults.
2014-12-02	cpu: Move packet deallocation to recvTimingResp in the O3 CPU	Stephan Diestelhorst
	Move the packet deallocations in the O3 CPU so that the completeDataAccess deals only with the LSQ specific parts and the generic recvTimingResp frees the packet in all other cases.
2014-12-02	mem: Assume all dynamic packet data is array allocated	Andreas Hansson
	This patch simplifies how we deal with dynamically allocated data in the packet, always assuming that it is array allocated, and hence should be array deallocated (delete[] as opposed to delete). The only uses of dataDynamic were in the Ruby testers. The ARRAY_DATA flag in the packet is removed accordingly. No defragmentation of the flags is done at this point, leaving a gap in the bit masks. As the last part of the patch, it renames dataDynamicArray to dataDynamic.
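A simplified stand-in for the ownership rule described above: any data handed over as dynamic is assumed to come from `new[]` and is released with `delete[]`. The `PacketData` class is hypothetical; it borrows the `dataStatic` and `dataDynamic` method names for readability but is not gem5's Packet.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder, not gem5's Packet class.
class PacketData {
    std::uint8_t *data = nullptr;
    bool dynamic = false;  // true: we own `data` and it came from new[]

    void release() {
        if (dynamic)
            delete[] data;  // always array-deallocate owned data
        data = nullptr;
        dynamic = false;
    }

  public:
    PacketData() = default;
    PacketData(const PacketData &) = delete;
    PacketData &operator=(const PacketData &) = delete;
    ~PacketData() { release(); }

    // Point at caller-owned memory; nothing is freed on destruction.
    void dataStatic(std::uint8_t *p) {
        release();
        data = p;
    }

    // Take ownership of memory that must have been allocated with new[].
    void dataDynamic(std::uint8_t *p) {
        assert(p != nullptr);
        release();
        data = p;
        dynamic = true;
    }

    bool ownsData() const { return dynamic; }
};
```

With a single ownership flag there is no need for a second bit that distinguishes `delete` from `delete[]`, which is the simplification the commit message describes.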