gem5 - gem5

Age	Commit message (Collapse)	Author
2014-05-13	mem: Refactor assignment of Packet types	Curtis Dunham
	Put the packet type swizzling (that is currently done in a lot of places) into a refineCommand() member function.
2014-09-03	cpu: Fix o3 drain bug	Mitch Hayenga
	For X86, the o3 CPU would get stuck with the commit stage not being drained if an interrupt arrived while drain was pending. isDrained() makes sure that pcState.microPC() == 0, thus ensuring that we are at an instruction boundary. However, when we take an interrupt we execute: pcState.upc(romMicroPC(entry)); pcState.nupc(romMicroPC(entry) + 1); tc->pcState(pcState); As a result, the MicroPC is no longer zero. This patch ensures the drain is delayed until no interrupts are present. Once draining, non-synchronous interrupts are deffered until after the switch.
2014-04-29	arm: use condition code registers for ARM ISA	Curtis Dunham
	Analogous to ee049bf (for x86). Requires a bump of the checkpoint version and corresponding upgrader code to move the condition code register values to the new register file.
2014-09-03	cpu: fix bimodal predictor to use correct global history reg	Dam Sunwoo
	A small bug in the bimodal predictor caused significant degradation in performance on some benchmarks. This was caused by using the wrong globalHistoryReg during the update phase. This patches fixes the bug and brings the performance to normal level.
2014-09-03	cpu: Fix cache blocked load behavior in o3 cpu	Mitch Hayenga
	This patch fixes the load blocked/replay mechanism in the o3 cpu. Rather than flushing the entire pipeline, this patch replays loads once the cache becomes unblocked. Additionally, deferred memory instructions (loads which had conflicting stores), when replayed would not respect the number of functional units (only respected issue width). This patch also corrects that. Improvements over 20% have been observed on a microbenchmark designed to exercise this behavior.
2014-09-03	cpu: Fix o3 quiesce fetch bug	Mitch Hayenga
	O3 is supposed to stop fetching instructions once a quiesce is encountered. However due to a bug, it would continue fetching instructions from the current fetch buffer. This is because of a break statment that only broke out of the first of 2 nested loops. It should have broken out of both.
2014-09-03	cpu: Fix SMT scheduling issue with the O3 cpu	Mitch Hayenga
	The o3 cpu could attempt to schedule inactive threads under round-robin SMT mode. This is because it maintained an independent priority list of threads from the active thread list. This priority list could be come stale once threads were inactive, leading to the cpu trying to fetch/commit from inactive threads. Additionally the fetch queue is now forcibly flushed of instrctuctions from the de-scheduled thread. Relevant output: 24557000: system.cpu: [tid:1]: Calling deactivate thread. 24557000: system.cpu: [tid:1]: Removing from active threads list 24557500: system.cpu: FullO3CPU: Ticking main, FullO3CPU. 24557500: system.cpu.fetch: Running stage. 24557500: system.cpu.fetch: Attempting to fetch from [tid:1]
2014-09-03	cpu: Fix incorrect speculative branch predictor behavior	Mitch Hayenga
	When a branch mispredicted gem5 would squash all history after and including the mispredicted branch. However, the mispredicted branch is still speculative and its history is required to rollback state if another, older, branch mispredicts. This leads to things like RAS corruption.
2014-09-03	cpu: Add a fetch queue to the o3 cpu	Mitch Hayenga
	This patch adds a fetch queue that sits between fetch and decode to the o3 cpu. This effectively decouples fetch from decode stalls allowing it to be more aggressive, running futher ahead in the instruction stream.
2014-09-03	cpu: Fix o3 front-end pipeline interlock behavior	Mitch Hayenga
	The o3 pipeline interlock/stall logic is incorrect. o3 unnecessicarily stalled fetch and decode due to later stages in the pipeline. In general, a stage should usually only consider if it is stalled by the adjacent, downstream stage. Forcing stalls due to later stages creates and results in bubbles in the pipeline. Additionally, o3 stalled the entire frontend (fetch, decode, rename) on a branch mispredict while the ROB is being serially walked to update the RAT (robSquashing). Only should have stalled at rename.
2014-09-03	cpu: Change writeback modeling for outstanding instructions	Mitch Hayenga
	As highlighed on the mailing list gem5's writeback modeling can impact performance. This patch removes the limitation on maximum outstanding issued instructions, however the number that can writeback in a single cycle is still respected in instToCommit().
2014-09-03	arch, cpu: Factor out the ExecContext into a proper base class	Andreas Sandberg
	We currently generate and compile one version of the ISA code per CPU model. This is obviously wasting a lot of resources at compile time. This changeset factors out the interface into a separate ExecContext class, which also serves as documentation for the interface between CPUs and the ISA code. While doing so, this changeset also fixes up interface inconsistencies between the different CPU models. The main argument for using one set of ISA code per CPU model has always been performance as this avoid indirect branches in the generated code. However, this argument does not hold water. Booting Linux on a simulated ARM system running in atomic mode (opt/10.linux-boot/realview-simple-atomic) is actually 2% faster (compiled using clang 3.4) after applying this patch. Additionally, compilation time is decreased by 35%.
2014-09-01	mem: change the namespace Message to ProtoMessage	Nilay Vaish
	The namespace Message conflicts with the Message data type used extensively in Ruby. Since Ruby is being moved to the same Master/Slave ports based configuration style as the rest of gem5, this conflict needs to be resolved. Hence, the namespace is being renamed to ProtoMessage.
2014-09-01	ruby: eliminate type Time	Nilay Vaish
	There is another type Time in src/base class which results in a conflict.
2014-08-13	scons: Build the branch predictor for all CPUs	Andreas Sandberg
	The branch predictor is normally only built when a CPU that uses a branch predictor is built. The list of CPUs is currently incomplete as the simple CPUs support branch predictors (for warming, branch stats, etc). In practice, all CPU models now use branch predictors, so this changeset removes the CPU model check and replaces it with a check for the NULL ISA.
2014-08-13	cpu: Don't forward declare RefCountingPtr	Andreas Sandberg
	RefCountingPtr is sometimes forward declared to avoid having to include refcnt.hh. This does not work since we typically return instances of RefCountingPtr rather than references to instances. The only reason this currently works is that we include refcnt.hh in cprintf.hh, which "leaks" the header to most other source files. This changeset replaces such forward declarations with an include of refcnt.hh.
2014-08-13	cpu: Modernise the branch predictor (STL and C++11)	Andreas Hansson
	This patch does some minor house keeping of the branch predictor by adopting STL containers, and shifting some iterator to use range-based for loops. The predictor history is also changed from a list to a deque as we never to insertion/deletion other than at the front and back.
2014-08-10	cpu: Ensure the traffic generator suppresses non-memory packets	Andreas Hansson
	This patch adds a check to ensure that packets which are not going to a memory range are suppressed in the traffic generator. Thus, if a trace is collected in full-system, the packets destined for devices are not played back.
2014-07-23	cpu: `Minor' in-order CPU model	Andrew Bardsley
	This patch contains a new CPU model named `Minor'. Minor models a four stage in-order execution pipeline (fetch lines, decompose into macroops, decompose macroops into microops, execute). The model was developed to support the ARM ISA but should be fixable to support all the remaining gem5 ISAs. It currently also works for Alpha, and regressions are included for ARM and Alpha (including Linux boot). Documentation for the model can be found in src/doc/inside-minor.doxygen and its internal operations can be visualised using the Minorview tool utils/minorview.py. Minor was designed to be fairly simple and not to engage in a lot of instruction annotation. As such, it currently has very few gathered stats and may lack other gem5 features. Minor is faster than the o3 model. Sample results: Benchmark \| Stat host_seconds (s) ---------------+--------v--------v-------- (on ARM, opt) \| simple \| o3 \| minor \| timing \| timing \| timing ---------------+--------+--------+-------- 10.linux-boot \| 169 \| 1883 \| 1075 10.mcf \| 117 \| 967 \| 491 20.parser \| 668 \| 6315 \| 3146 30.eon \| 542 \| 3413 \| 2414 40.perlbmk \| 2339 \| 20905 \| 11532 50.vortex \| 122 \| 1094 \| 588 60.bzip2 \| 2045 \| 18061 \| 9662 70.twolf \| 207 \| 2736 \| 1036
2014-06-30	cpu: implement a bi-mode branch predictor	Anthony Gutierrez

2014-06-21	o3: make dispatch LSQ full check more selective	Binh Pham
	Dispatch should not check LSQ size/LSQ stall for non load/store instructions. This work was done while Binh was an intern at AMD Research.
2014-06-21	o3: split load & store queue full cases in rename	Binh Pham
	Check for free entries in Load Queue and Store Queue separately to avoid cases when load cannot be renamed due to full Store Queue and vice versa. This work was done while Binh was an intern at AMD Research.
2014-05-31	style: eliminate equality tests with true and false	Steve Reinhardt
	Using '== true' in a boolean expression is totally redundant, and using '== false' is pretty verbose (and arguably less readable in most cases) compared to '!'. It's somewhat of a pet peeve, perhaps, but I had some time waiting for some tests to run and decided to clean these up. Unfortunately, SLICC appears not to have the '!' operator, so I had to leave the '== false' tests in the SLICC code.
2014-05-23	cpu: o3: remove stat totalCommittedInsts	Nilay Vaish
	This patch removes the stat totalCommittedInsts. This variable was used for recording the total number of instructions committed across all the threads of a core. The instructions committed by each thread are recorded invidually. The total would now be generated by summing these individual counts.
2014-05-09	cpu: Useful getters for ActivityRecorder	Andrew Bardsley
	Add some useful getters to ActivityRecorder
2014-05-09	cpu: Add flag name printing to StaticInst	Andrew Bardsley
	This patch adds a the member function StaticInst::printFlags to allow all of an instruction's flags to be printed without using the individual is... member functions or resorting to exposing the 'flags' vector It also replaces the enum definition StaticInst::Flags with a Python-generated enumeration and adds to the enum generation mechanism in src/python/m5/params.py to allow Enums to be placed in namespaces other than Enums or, alternatively, in wrapper structs allowing them to be inherited by other classes (so populating that class's name-space with the enumeration element names).
2014-05-09	cpu: Timebuf const accessors	Andrew Bardsley
	Add const accessors for timebuf elements.
2014-05-09	arch, arm: Preserve TLB bootUncacheability when switching CPUs	Geoffrey Blake
	The ARM TLBs have a bootUncacheability flag used to make some loads and stores become uncacheable when booting in FS mode. Later the flag is cleared to let those loads and stores operate as normal. When doing a takeOverFrom(), this flag's state is not preserved and is momentarily reset until the CPSR is touched. On single core runs this is a non-issue. On multi-core runs this can lead to crashes on the O3 CPU model from the following series of events: 1) takeOverFrom executed to switch from Atomic -> O3 2) All bootUncacheability flags are reset to true 3) Core2 tries to execute a load covered by bootUncacheability, it is flagged as uncacheable 4) Core2's load needs to replay due to a pipeline flush 3) Core1 core does an action on CPSR 4) The handling code for CPSR then checks all other cores to determine if bootUncacheability can be set to false 5) Asynchronously set bootUncacheability on all cores to false 6) Core2 replays load previously set as uncacheable and notices it is now flagged as cacheable, leads to a panic. This patch implements takeOverFrom() functionality for the ARM TLBs to preserve flag values when switching from atomic -> detailed.
2014-05-09	cpu: add more instruction mix statistics	Curtis Dunham
	For the o3, add instruction mix (OpClass) histogram at commit (stats also already collected at issue). For the simple CPUs we add a histogram of executed instructions
2014-05-09	cpu, arm: Allow the specification of a socket field	Akash Bagdia
	Allow the specification of a socket ID for every core that is reflected in the MPIDR field in ARM systems. This allows studying multi-socket / cluster systems with ARM CPUs.
2014-04-23	cpu: Fix setTranslateLatency() bug for squashed instructions	Mitchell Hayenga
	setTranslateLatency could sometimes improperly access a deleted request packet after an instruction was squashed.
2014-04-01	cpu: Fix case where o3 lsq could print out uninitialized data	Mitch Hayenga
	In the O3 LSQ, data read/written is printed out in DPRINTFs. However, the data field is treated as a character string with a null terminated. However the data field is not encoded this way. This patch removes that possibility by removing the data part of the print.
2014-04-23	cpu: Add O3 CPU width checks	Dam Sunwoo
	O3CPU has a compile-time maximum width set in o3/impl.hh, but checking the configuration against this limit was not implemented anywhere except for fetch. Configuring a wider pipe than the limit can silently cause various issues during the simulation. This patch adds the proper checking in the constructor of the various pipeline stages.
2014-04-19	o3: Fix occupancy checks for SMT	Faissal Sleiman
	A number of calls to isEmpty() and numFreeEntries() should be thread-specific. In cpu.cc, the fact that tid is /commented/ out is a bug. Say the rob has instructions from thread 0 (isEmpty() returns false), and none from thread 1. If we are trying to squash all of thread 1, then readTailInst(thread 1) will be called because rob->isEmpty() returns false. The result is end_it is not in the list and the while statement loops indefinitely back over the cpu's instList. In iew_impl.hh, all threads are told they have the entire remaining IQ, when each thread actually has a certain allocation. The result is extra stalls at the iew dispatch stage which the rename stage usually takes care of. In commit_impl.hh, rob->readHeadInst(thread 1) can be called if the rob only contains instructions from thread 0. This returns a dummyInst (which may work since we are trying to squash all instructions, but hardly seems like the right way to do it). In rob_impl.hh this fix skips the rest of the function more frequently and is more efficient. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-04-09	kvm, x86: Add initial support for multicore simulation	Andreas Sandberg
	Simulating a SMP or multicore requires devices to be shared between multiple KVM vCPUs. This means that locking is required when accessing devices. This changeset adds the necessary locking to allow devices to execute correctly. It is implemented by temporarily migrating the KVM CPU to the VM's (and devices) event queue when handling MMIO. Similarly, the VM migrates to the interrupt controller's event queue when delivering an interrupt. The support for fast-forwarding of multicore simulations added by this changeset assumes that all devices in a system are simulated in the same thread and each vCPU has its own thread. Special care must be taken to ensure that devices living under the CPU in the object hierarchy (e.g., the interrupt controller) do not inherit the parent CPUs thread and are assigned to device thread. The KvmVM object is assumed to live in the same thread as the other devices in the system.
2014-03-25	cpu: o3: lsq: Fix TSO implementation	Marco Elver
	This patch fixes violation of TSO in the O3CPU, as all loads must be ordered with all other loads. In the LQ, if a snoop is observed, all subsequent loads need to be squashed if the system is TSO. Prior to this patch, the following case could be violated: P0 \| P1 ; MOV [x],mail=/usr/spool/mail/nilay \| MOV EAX,[y] ; MOV [y],mail=/usr/spool/mail/nilay \| MOV EBX,[x] ; exists (1:EAX=1 /\ 1:EBX=0) [is a violation] The problem was found using litmus [http://diy.inria.fr]. Committed by: Nilay Vaish <nilay@cs.wisc.edu
2014-03-23	cpu: DRAM Traffic Generator	Neha Agarwal
	This patch enables a new 'DRAM' mode to the existing traffic generator, catered to generate specific requests to DRAM based on required hit length (stride size) and bank utilization. It is an add on to the Random mode. The basic idea is to control how many successive packets target the same page, and how many banks are being used in parallel. This gives a two-dimensional space that stresses different aspects of the DRAM timing. The configuration file needed to use this patch has to be changed as follow: (reference to Random Mode, LPDDR3 memory type) 'STATE 0 10000000000 RANDOM 50 0 134217728 64 3004 5002 0' -> 'STATE 0 10000000000 DRAM 50 0 134217728 32 3004 5002 0 96 1024 8 6 1' The last 4 parameters to be added are: <stride size (bytes), page size(bytes), number of banks available in DRAM, number of banks to be utilized, address mapping scheme> The address mapping information is used to get the stride address stream of the specified size and to know where to find the bank bits. The configuration file has a parameter where '0'-> RoCoRaBaCh, '1'-> RoRaBaCoCh/RoRaBaChCo address-mapping schemes. Note that the generator currently assumes a single channel and a single rank. This is to avoid overwhelming the traffic generator with information about the memory organisation.
2014-03-23	cpu: Add basic check to TrafficGen initial state	Stan Czerniawski
	Prevent incomplete configuration of TrafficGen class from causing segmentation faults. If an 'INIT' line is not present in the configuration file then the currState variable will remain uninitialized which may result in a crash.
2014-03-16	kvm: Clean up signal handling	Andreas Sandberg
	KVM used to use two signals, one for instruction count exits and one for timer exits. There is really no need to distinguish between the two since they only trigger exits from KVM. This changeset unifies and renames the signals and adds a method, kick(), that can be used to raise the control signal in the vCPU thread. It also removes the early timer warning since we do not normally see if the signal was delivered. --HG-- extra : rebase_source : cd0e45ca90894c3d6f6aa115b9b06a1d8f0fda4d
2014-03-16	kvm: x86: Adjust PC to remove the CS segment base address	Andreas Sandberg
	gem5 seems to store the PC as RIP+CS_BASE. This is not what KVM expects, so we need to subtract CS_BASE prior to transferring the PC into KVM. This changeset adds the necessary PC manipulation and refactors thread context updates slightly to avoid reading registers multiple times from KVM. --HG-- extra : rebase_source : 3f0569dca06a1fcd8694925f75c8918d954ada44
2014-03-16	kvm: x86: Add support for x86 INIT and STARTUP handling	Andreas Sandberg
	This changeset adds support for INIT and STARTUP IPI handling. We currently handle both of these interrupts in gem5 and transfer the state to KVM. Since we do not have a BIOS loaded, we pretend that the INIT interrupt suspends the CPU after reset. --HG-- extra : rebase_source : 7f3b25f3801d68f668b6cd91eaf50d6f48ee2a6a
2014-03-12	alpha: Small removal of dead comments/code from alpha ISA	Paul Rosenfeld
	Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-03-07	cpu: Make CPU and ThreadContext getters const	Andreas Hansson
	This patch merely tidies up the CPU and ThreadContext getters by making them const where appropriate.
2014-03-07	scons: Fixes uninitialized warnings issued by clang	Mitch Hayenga
	Small fixes to appease recent clang versions.
2014-03-03	kvm: x86: Always assume segments to be usable	Andreas Sandberg
	When transferring segment registers into kvm, we need to find the value of the unusable bit. We used to assume that this could be inferred from the selector since segments are generally unusable if their selector is 0. This assumption breaks in some weird corner cases. Instead, we just assume that segments are always usable. This is what qemu does so it should work.
2014-03-03	kvm: Initialize signal handlers from startupThread()	Andreas Sandberg
	Signal handlers in KVM are controlled per thread and should be initialized from the thread that is going to execute the CPU. This changeset moves the initialization call from startup() to startupThread().
2014-03-01	cpu: Enable fast-forwarding for MIPS InOrderCPU and O3CPU	Christopher Torng
	A copyRegs() function is added to MIPS utilities to copy architectural state from the old CPU to the new CPU during fast-forwarding. This addition alone enables fast-forwarding for the o3 cpu model running MIPS. The patch also adds takeOverFrom() and drainResume() functions to the InOrderCPU to enable it to take over from another CPU. This change enables fast-forwarding for the inorder cpu model running MIPS, but not for Alpha. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-02-20	kvm: Add support for multi-system simulation	Andreas Sandberg
	The introduction of parallel event queues added most of the support needed to run multiple VMs (systems) within the same gem5 instance. This changeset fixes up signal delivery so that KVM's control signals are delivered to the thread that executes the CPU's event queue. Specifically: * Timers and counters are now initialized from a separate method (startupThread) that is scheduled as the first event in the thread-specific event queue. This ensures that they are initialized from the thread that is going to execute the CPUs event queue and enables signal delivery to the right thread when exiting from KVM. * The POSIX-timer-based KVM timer (used to force exits from KVM) has been updated to deliver signals to the thread that's executing KVM instead of the process (thread is undefined in that case). This assumes that the timer is instantiated from the thread that is going to execute the KVM vCPU. * Signal masking is now done using pthread_sigmask instead of sigprocmask. The behavior of the latter is undefined in threaded applications. * Since signal masks can be inherited, make sure to actively unmask the control signals when setting up the KVM signal mask. There are currently no facilities to multiplex between multiple KVM CPUs in the same event queue, we are therefore limited to configurations where there is only one KVM CPU per event queue. In practice, this means that multi-system configurations can be simulated, but not multiple CPUs in a shared-memory configuration.
2014-02-09	cpu: simple: Add support for using branch predictors	Andreas Sandberg
	This changesets adds branch predictor support to the BaseSimpleCPU. The simple CPUs normally don't need a branch predictor, however, there are at least two cases where it can be desirable: 1) A simple CPU can be used to warm the branch predictor of an O3 CPU before switching to the slower O3 model. 2) The simple CPU can be used as a quick way of evaluating/debugging new branch predictors since it exposes branch predictor statistics. Limitations: * Since the simple CPU doesn't speculate, only one instruction will be active in the branch predictor at a time (i.e., the branch predictor will never see speculative branches). * The outcome of a branch prediction does not affect the performance of the simple CPU.
2014-01-29	cpu: fix bug when TrafficGen deschedules event	Xiangyu Dong
	Committed by: Nilay Vaish <nilay@cs.wisc.edu>