gem5 - gem5

Age	Commit message (Collapse)	Author
2015-11-16	x86: Invalidating TLB entry on page fault	Swapnil Haria
	As per the x86 architecture specification, matching TLB entries need to be invalidated on a page fault. For instance, after a page fault due to inadequate protection bits on a TLB hit, the TLB entry needs to be invalidated. This behavior is clearly specified in the x86 architecture manuals from both AMD and Intel. This invalidation is missing currently in gem5, due to which linux kernel versions 3.8 and up cannot be simulated efficiently. This is exposed by a linux optimisation in commit e4a1cc56e4d728eb87072c71c07581524e5160b1, which removes a tlb flush on updating page table entries in x86. Testing: Linux kernel versions 3.8 onwards were booting very slowly in FS mode, due to repeated page faults (~300000 before the first print statement in a bash file). Ensured that page fault rate drops drastically and observed reduction in boot time from order of hours to minutes for linux kernel v3.8 and v3.11
2015-11-16	x86: cpuid: add family to warn() message	Bjoern A. Zeeb
	doCpuid() has to identical warn messages about unimplemented functions. Add the family to the log message to make them distinguishable. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-11-16	x86: pagetable walker: fix typo in comment	Bjoern A. Zeeb

2015-11-16	sparc: Make remote debugging with gdb work	Palle Lyckegaard
	Remove sparc V8 TBR register from list of registers since it is not part of sparc V9. This brings the number of registers in sync with what gdb expects Without this patch gdb complains about receoved packet too long. with this patch gdb is able to work properly with gem5 for remote debugging. Note: gdb is version 7.8 Note: gdb is configured with --target=sparc64-sun-solaris2.8 Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-11-16	o3: drop unused statistic wbPenalized and wbPenalizedRate	Nilay Vaish

2015-07-20	ruby: added stl vector of ints to be used by SLICC	Brad Beckmann

2015-11-13	slicc: fixes for the Address to Addr changeset (11025)	Tony Gutierrez
	misc changes now that Address has become Addr including int to address util function
2015-11-13	ruby: add BoolVec	Joe Gross
	The BoolVec typedef and insertion operator overload function simplify usage of vectors of type bool
2015-07-20	mem: add boolean to disable PacketQueue's size sanity check	Brad Beckmann
	the sanity check, while generally useful for exposing memory system bugs, may be spurious with respect to GPU workloads, which may generate many more requests than typical CPU workloads. the large number of requests generated by the GPU may cause the req/resp queues to back up, thus queueing more than 100 packets.
2015-11-11	dev, arm: Initialized the iccrpr register in the GIC	Andreas Sandberg
	The IICRPR register in the GIC is currently not being initialized when the GIC is instantiated. Initialize to the value mandated by the architecture specification.
2015-11-05	dev: Add basic checkpoint support to VirtIO9PProxy device	Sascha Bischoff
	This patch adds very basic checkpoint support for the VirtIO9PProxy device. Previously, attempts to checkpoint gem5 with a present 9P device caused gem5 to fatal as none of the state is tracked. We still do not track any state, but we replace the fatal with a warning which is triggered if the device has been used by the guest system. In the event that it has not been used, we assume that no state is lost during checkpointing. The warning is triggered on both a serialize and an unserialize to ensure maximum visibility for the user.
2015-11-09	dev: Remove unused header includes	Andreas Sandberg
	Devices should never need to include dev/pciconfall.hh. --HG-- extra : amend_source : 3a6e56485d432b49e2af22407982fa785c0ccb68
2015-11-09	dev: Don't access the platform directly in PCI devices	Andreas Sandberg
	Cleanup PCI devices to avoid using the PciDevice::platform pointer directly. The PCI-specific functionality provided by the Platform should be accessed through the wrappers in PciDevice.
2015-11-06	mem: Add an option to perform clean writebacks from caches	Andreas Hansson
	This patch adds the necessary commands and cache functionality to allow clean writebacks. This functionality is crucial, especially when having exclusive (victim) caches. For example, if read-only L1 instruction caches are not sending clean writebacks, there will never be any spills from the L1 to the L2. At the moment the cache model defaults to not sending clean writebacks, and this should possibly be re-evaluated. The implementation of clean writebacks relies on a new packet command WritebackClean, which acts much like a Writeback (renamed WritebackDirty), and also much like a CleanEvict. On eviction of a clean block the cache either sends a clean evict, or a clean writeback, and if any copies are still cached upstream the clean evict/writeback is dropped. Similarly, if a clean evict/writeback reaches a cache where there are outstanding MSHRs for the block, the packet is dropped. In the typical case though, the clean writeback allocates a block in the downstream cache, and marks it writable if the evicted block was writable. The patch changes the O3_ARM_v7a L1 cache configuration and the default L1 caches in config/common/Caches.py
2015-11-06	mem: Add cache clusivity	Andreas Hansson
	This patch adds a parameter to control the cache clusivity, that is if the cache is mostly inclusive or exclusive. At the moment there is no intention to support strict policies, and thus the options are: 1) mostly inclusive, or 2) mostly exclusive. The choice of policy guides the behaviuor on a cache fill, and a new helper function, allocOnFill, is created to encapsulate the decision making process. For the timing mode, the decision is annotated on the MSHR on sending out the downstream packet, and in atomic we directly pass the decision to handleFill. We (ab)use the tempBlock in cases where we are not allocating on fill, leaving the rest of the cache unaffected. Simple and effective. This patch also makes it more explicit that multiple caches are allowed to consider a block writable (this is the case also before this patch). That is, for a mostly inclusive cache, multiple caches upstream may also consider the block exclusive. The caches considering the block writable/exclusive all appear along the same path to memory, and from a coherency protocol point of view it works due to the fact that we always snoop upwards in zero time before querying any downstream cache. Note that this patch does not introduce clean writebacks. Thus, for clean lines we are essentially removing a cache level if it is made mostly exclusive. For example, lines from the read-only L1 instruction cache or table-walker cache are always clean, and simply get dropped rather than being passed to the L2. If the L2 is mostly exclusive and does not allocate on fill it will thus never hold the line. A follow on patch adds the clean writebacks. The patch changes the L2 of the O3_ARM_v7a CPU configuration to be mostly exclusive (and stats are affected accordingly).
2015-11-06	mem: Avoid unnecessary snoops on writebacks and clean evictions	Ali Jafri
	This patch optimises the handling of writebacks and clean evictions when using a snoop filter. Instead of snooping into the caches to determine if the block is cached or not, simply set the status based on the snoop-filter result.
2015-11-06	mem: Order packet queue only on matching addresses	Andreas Hansson
	Instead of conservatively enforcing order for all packets, which may negatively impact the simulated-system performance, this patch updates the packet queue such that it only applies the restriction if there are already packets with the same address in the queue. The basic need for the order enforcement is due to coherency interactions where requests/responses to the same cache line must not over-take each other. We rely on the fact that any packet that needs order enforcement will have a block-aligned address. Thus, there is no need for the queue to know about the cacheline size.
2015-11-06	mem: Enforce insertion order on the cache response path	Ali Jafri
	This patch enforces insertion order transmission of packets on the response path in the cache. Note that the logic to enforce order is already present in the packet queue, this patch simply turns it on for queues in the response path. Without this patch, there are corner cases where a request-response is faster than a response-response forwarded through the cache. This violation of queuing order causes problems in the snoop filter leaving it with inaccurate information. This causes assert failures in the snoop filter later on. A follow on patch relaxes the order enforcement in the packet queue to limit the performance impact.
2015-11-06	mem: Use the packet delays and do not just zero them out	Andreas Hansson
	This patch updates the I/O devices, bridge and simple memory to take the packet header and payload delay into account in their latency calculations. In all cases we add the header delay, i.e. the accumulated pipeline delay of any crossbars, and the payload delay needed for deserialisation of any payload. Due to the additional unknown latency contribution, the packet queue of the simple memory is changed to use insertion sorting based on the time stamp. Moreover, since the memory hands out exclusive (non shared) responses, we also need to ensure ordering for reads to the same address.
2015-11-06	mem: Align rules for sinking inhibited packets at the slave	Andreas Hansson
	This patch aligns how the memory-system slaves, i.e. the various memory controllers and the bridge, identify and deal with sinking of inhibited packets that are only useful within the coherent part of the memory system. In the future we could shift the onus to the crossbar, and add a parameter "is_point_of_coherence" that would allow it to sink the aforementioned packets.
2015-11-06	mem: Do not treat CleanEvict as a write operation	Andreas Hansson
	This patch changes the CleanEvict command type to not be considered a write. Initially it was made a zero-sized write to match the writeback command, but as things developed it became clear that it causes more problems than it solves. For example, the memory modules (and bridge) should not consider the CleanEvict as a write, but instead discard it. With this patch it will be neither a read, nor write, and as it does not need a response the slave will simply sink it.
2015-11-06	mem: Unify delayed packet deletion	Andreas Hansson
	This patch unifies how we deal with delayed packet deletion, where the receiving slave is responsible for deleting the packet, but the sending agent (e.g. a cache) is still relying on the pointer until the call to sendTimingReq completes. Previously we used a mix of a deletion vector and a construct using unique_ptr. With this patch we ensure all slaves use the latter approach.
2015-11-06	misc: Appease clang static analyzer	Andreas Hansson
	A few minor fixes to issues identified by the clang static analyzer.
2015-11-06	mem: Check the XBar's port queues on functional snoops	Andreas Sandberg
	The CoherentXBar currently doesn't check its queued slave ports when receiving a functional snoop. This caused data corruption in cases when a modified cache lines is forwarded between two caches. Add the required functional calls into the queued slave ports.
2015-11-03	mem: hmc: minor fixes	Erfan Azarkhish
	This patch performs two minor fixes to DRAMCtrl.py and xbar.hh in favor of the HMC patch series. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-11-03	mem: hmc: serial link model	Erfan Azarkhish
	This changeset adds a serial link model for the Hybrid Memory Cube (HMC). SerialLink is a simple variation of the Bridge class, with the ability to account for the latency of packet serialization. Also trySendTiming has been modified to correctly model bandwidth. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-11-03	mem: hmc: adds controller	Erfan Azarkhish
	This patch models a simple HMC Controller. It simply schedules the incoming packets to HMC Serial Links using a round robin mechanism. This patch should be applied in series with other patches modeling a complete HMC device. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-10-29	arm: Add secure flag to TableWalker request when needed	Nathanael Premillieu

2015-10-29	dev: Fix segfault in flash device	Sascha Bischoff
	Fix a bug in which the flash device would write out of bounds and could either trigger a segfault and corrupt the memory of other objects. This was caused by using pageSize in the place of pagesPerBlock when running the garbage collector. Also, added an assert to flag this condition in the future.
2015-10-29	dev: Fix draining for UFSHostDevice and FlashDevice	Sascha Bischoff
	This patch fixes the drain logic for the UFSHostDevice and the FlashDevice. In the case of the FlashDevice, the logic for CheckDrain needed to be reversed, whilst in the case of the UFSHostDevice check drain was never being called. In both cases the system would never complete draining if the initial attampt to drain failed.
2015-10-29	kvm, arm: Fix compilation errors due to API changes	Victor Garcia
	The checkpoint changes, along with the SMT patches have changed a number of APIs. Adapt the ArmKvmCPU accordingly.
2015-10-29	mem: Clarify cache MSHR handling on fill	Andreas Hansson
	This patch addresses the upgrading of deferred targets in the MSHR, and makes it clearer by explicitly calling out what is happening (deferred targets are promoted if we get exclusivity without asking for it).
2015-10-25	power: Implement Remote GDB	Boris Shingarov

2015-10-23	x86: Add missing explicit overrides for X86 devices	Andreas Hansson
	Make clang >= 3.5 happy when compiling build/X86/gem5.opt on OSX.
2015-10-23	arm: Add missing explicit overrides for ARM devices	Andreas Hansson
	Make clang >= 3.5 happy when compiling build/ARM/gem5.opt on OSX.
2015-10-14	mem: Pass snoop retries through the CommMonitor	Andreas Hansson
	Allow the monitor to be placed after a snooping port, and do not fail on snoop retries, but instead pass them on to the slave port.
2015-10-14	ruby: profiler: provide the number of vnets through ruby system	Nilay Vaish
	The aim is to ultimately do away with the static function Network::getNumberOfVirtualNetworks().
2015-10-14	ruby: remove unused functionalRead() function.	Nilay Vaish
	Not required since functional reads cannot rely on messages that are inflight.
2015-10-14	ruby: garnet: flexible: refactor flit	Nilay Vaish

2015-10-12	misc: Add explicit overrides and fix other clang >= 3.5 issues	Andreas Hansson
	This patch adds explicit overrides as this is now required when using "-Wall" with clang >= 3.5, the latter now part of the most recent XCode. The patch consequently removes "virtual" for those methods where "override" is added. The latter should be enough of an indication. As part of this patch, a few minor issues that clang >= 3.5 complains about are also resolved (unused methods and variables).
2015-10-12	misc: Remove redundant compiler-specific defines	Andreas Hansson
	This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap (and similar) abstractions, as these are no longer needed with gcc 4.7 and clang 3.1 as minimum compiler versions.
2015-10-10	sim: Don't quiesce UDelayEvents with 0 latency	Joel Hestness
	ARM uses UDelayEvents to emulate kernel __udelay functions and speed up simulation. UDelayEvents call Pseudoinst::quiesceNs to quiesce the system for a specified delay. Changeset 10341:0b4d10f53c2d introduced the requirement that any quiesce process that is started must also be completed by scheduling an EndQuiesceEvent. This change causes the CPU to hang if an IsQuiesce instruction is executed, but the corresponding EndQuiesceEvent is not scheduled. Changeset 11058:d0934b57735a introduces a fix for uses of PseudoInst::quiesce that would conditionally execute the EndQuiesceEvent. ARM UDelayEvents specify quiesce period of 0 ns (src/arch/arm/linux/system.cc), so changeset 11058 causes these events to now execute full quiesce processes, greatly increasing the total instructions executed in kernel delay loops and slowing simulation. This patch updates the UDelayEvent to conditionally execute PseudoInst::quiesceNs (a quiesce operation) only if the specified delay is >0 ns. The result is ARM delay loops no longer execute instructions for quiesce handling, and regression time returns to normal.
2015-10-09	isa: Add parameter to pick different decoder inside ISA	Rekai Gonzalez Alberquilla
	The decoder is responsible for splitting instructions in micro operations (uops). Given that different micro architectures may split operations differently, this patch allows to specify which micro architecture each isa implements, so different cores in the system can split instructions differently, also decoupling uop splitting (microArch) from ISA (Arch). This is done making the decodification calls templates that receive a type 'DecoderFlavour' that maps the name of the operation to the class that implements it. This way there is only one selection point (converting the command line enum to the appropriate DecodeFeatures object). In addition, there is no explicit code replication: template instantiation hides that, and the compiler should be able to resolve a number of things at compile-time.
2015-10-09	sim: Add relative break scheduling	Dylan Johnson
	Add schedRelBreak() function, executable within a debugger, that sets a breakpoint by relative rather than absolute tick.
2015-10-06	arch: clean up isa_parser error handling	Steve Reinhardt
	Although some decent error messages were getting generated inside isa_parser.py, they weren't always getting printed because of the screwy way we were handling exceptions. (Basically an inner exception would get hidden by an outer exception, and the more informative inner error message would not get printed.) Also line numbers were messed up, since they were taken from the lexer, which is typically a token (or more) ahead of the grammar rule that's being matched. Using the 'lineno' attribute that PLY associates with the grammar production is more accurate. The new LineTracker class extends lineno to track filenames as well as line numbers.
2015-10-06	sim: add ExecMacro to Exec* compound debug flags	Steve Reinhardt
	Really should have been there in the first place, IMO. Makes debugging x86 execution a lot easier.
2015-10-06	sim: print pid in output header	Steve Reinhardt
	This information is useful if you have a bunch of simulations running and want to know which one to kill, but you've neglected to take advantage of the previous patch and embed the pid in your output path.
2015-10-06	x86: implement rcpps and rcpss SSE insts	Steve Reinhardt
	These are packed single-precision approximate reciprocal operations, vector and scalar versions, respectively. This code was basically developed by copying the code for sqrtps and sqrtss. The mrcp micro-op was simplified relative to msqrt since there are no double-precision versions of this operation.
2015-10-06	x86: implement fild, fucomi, and fucomip x87 insts	Steve Reinhardt
	fild loads an integer value into the x87 top of stack register. fucomi/fucomip compare two x87 register values (the latter also doing a stack pop). These instructions are used by some versions of GNU libstdc++.
2015-09-02	sim: Add ability to break at specific kernel function	Dylan Johnson
	Adds a GDB callable function that sets a breakpoint at the beginning of a kernel function.