gem5 - gem5

Age	Commit message (Collapse)	Author
2014-09-20	alpha,arm,mips,power,x86,cpu,sim: Cleanup activate/deactivate	Mitch Hayenga
	activate(), suspend(), and halt() used on thread contexts had an optional delay parameter. However this parameter was often ignored. Also, when used, the delay was seemily arbitrarily set to 0 or 1 cycle (no other delays were ever specified). This patch removes the delay parameter and 'Events' associated with them across all ISAs and cores. Unused activate logic is also removed.
2014-09-20	cpu: use probes infrastructure to do simpoint profiling	Dam Sunwoo
	Instead of having code embedded in cpu model to do simpoint profiling use the probes infrastructure to do it.
2014-09-19	arch: Pass faults by const reference where possible	Andreas Hansson
	This patch changes how faults are passed between methods in an attempt to copy as few reference-counting pointer instances as possible. This should avoid unecessary copies being created, contributing to the increment/decrement of the reference counters.
2014-05-13	mem: Refactor assignment of Packet types	Curtis Dunham
	Put the packet type swizzling (that is currently done in a lot of places) into a refineCommand() member function.
2014-09-03	arch, cpu: Factor out the ExecContext into a proper base class	Andreas Sandberg
	We currently generate and compile one version of the ISA code per CPU model. This is obviously wasting a lot of resources at compile time. This changeset factors out the interface into a separate ExecContext class, which also serves as documentation for the interface between CPUs and the ISA code. While doing so, this changeset also fixes up interface inconsistencies between the different CPU models. The main argument for using one set of ISA code per CPU model has always been performance as this avoid indirect branches in the generated code. However, this argument does not hold water. Booting Linux on a simulated ARM system running in atomic mode (opt/10.linux-boot/realview-simple-atomic) is actually 2% faster (compiled using clang 3.4) after applying this patch. Additionally, compilation time is decreased by 35%.
2014-05-09	cpu: add more instruction mix statistics	Curtis Dunham
	For the o3, add instruction mix (OpClass) histogram at commit (stats also already collected at issue). For the simple CPUs we add a histogram of executed instructions
2014-02-09	cpu: simple: Add support for using branch predictors	Andreas Sandberg
	This changesets adds branch predictor support to the BaseSimpleCPU. The simple CPUs normally don't need a branch predictor, however, there are at least two cases where it can be desirable: 1) A simple CPU can be used to warm the branch predictor of an O3 CPU before switching to the slower O3 model. 2) The simple CPU can be used as a quick way of evaluating/debugging new branch predictors since it exposes branch predictor statistics. Limitations: * Since the simple CPU doesn't speculate, only one instruction will be active in the branch predictor at a time (i.e., the branch predictor will never see speculative branches). * The outcome of a branch prediction does not affect the performance of the simple CPU.
2014-01-24	cpu: Add support for instructions that zero cache lines.	Ali Saidi

2014-01-24	cpu: Add CPU support for generatig wake up events when LLSC adresses are ↵	Ali Saidi
	snooped. This patch add support for generating wake-up events in the CPU when an address that is currently in the exclusive state is hit by a snoop. This mechanism is required for ARMv8 multi-processor support.
2014-01-24	mem: per-thread cache occupancy and per-block ages	Dam Sunwoo
	This patch enables tracking of cache occupancy per thread along with ages (in buckets) per cache blocks. Cache occupancy stats are recalculated on each stat dump.
2014-01-24	mem: track per-request latencies and access depths in the cache hierarchy	Matt Horsnell
	Add some values and methods to the request object to track the translation and access latency for a request and which level of the cache hierarchy responded to the request.
2014-01-24	cpu: remove faulty simpoint basic block inst count assertion	Dam Sunwoo
	This patch removes an assertion in the simpoint profiling code that asserts that a previously-seen basic block has the exact same number of instructions executed as before. This can be false if the basic block generates aborts or takes interrupts at different locations within the basic block. The basic block profiling are not affected significantly as these events are rare in general.
2013-10-15	cpu: add a condition-code register class	Yasuko Eckert
	Add a third register class for condition codes, in parallel with the integer and FP classes. No ISAs use the CC class at this point though.
2013-10-15	cpu: rename _DepTag constants to _Reg_Base	Steve Reinhardt
	Make these names more meaningful. Specifically, made these substitutions: s/FP_Base_DepTag/FP_Reg_Base/g; s/Ctrl_Base_DepTag/Misc_Reg_Base/g; s/Max_DepTag/Max_Reg_Index/g;
2013-08-20	cpu: Fix timing CPU isDrained comment formatting	Andreas Hansson
	This patch fixes up the comment formatting for isDrained in the timing CPU.
2013-08-19	cpu: Accurately count idle cycles for simple cpu	Lena Olson
	Added a couple missing updates to the notIdleFraction stat. Without these, it sometimes gives a (not) idle fraction that is greater than 1 or less than 0.
2013-08-19	cpu: Fix timing CPU drain check	Andreas Hansson
	This patch modifies the SimpleTimingCPU drain check to also consider the fetch event. Previously, there was an assumption that there is never a fetch event scheduled if the CPU is not executing microcode. However, when a context is activated, a fetch even is scheduled, and microPC() is zero.
2013-07-18	mem: Set the cache line size on a system level	Andreas Hansson
	This patch removes the notion of a peer block size and instead sets the cache line size on the system level. Previously the size was set per cache, and communicated through the interconnect. There were plenty checks to ensure that everyone had the same size specified, and these checks are now removed. Another benefit that is not yet harnessed is that the cache line size is now known at construction time, rather than after the port binding. Hence, the block size can be locally stored and does not have to be queried every time it is used. A follow-on patch updates the configuration scripts accordingly.
2013-05-30	cpu: Make hash struct instead of class to please clang	Andreas Hansson
	This patch changes the type of the hash function for BasicBlockRanges to match the original definition of the templatized type. Without this, clang raises a warning and combined with the "-Werror" flag this causes compilation to fail.
2013-04-22	sim: separate nextCycle() and clockEdge() in clockedObjects	Dam Sunwoo
	Previously, nextCycle() could return the current cycle if the current tick was already aligned with the clock edge. This behavior is not only confusing (not quite what the function name implies), but also caused problems in the drainResume() function. When exiting/re-entering the sim loop (e.g., to take checkpoints), the CPUs will drain and resume. Due to the previous behavior of nextCycle(), the CPU tick events were being rescheduled in the same ticks that were already processed before draining. This caused divergence from runs that did not exit/re-entered the sim loop. (Initially a cycle difference, but a significant impact later on.) This patch separates out the two behaviors (nextCycle() and clockEdge()), uses nextCycle() in drainResume, and uses clockEdge() everywhere else. Nothing (other than name) should change except for the drainResume timing.
2013-04-22	cpu: generate SimPoint basic block vector profiles	Dam Sunwoo
	This patch is based on http://reviews.m5sim.org/r/1474/ originally written by Mitch Hayenga. Basic block vectors are generated (simpoint.bb.gz in simout folder) based on start and end addresses of basic blocks. Some comments to the original patch are addressed and hooks are added to create and resume from checkpoints based on instruction counts dictated by external SimPoint analysis tools. SimPoint creation/resuming options will be implemented as a separate patch.
2013-03-26	cpu: Remove CpuPort and use MasterPort in the CPU classes	Andreas Hansson
	This patch changes the port in the CPU classes to use MasterPort instead of the derived CpuPort. The functions of the CpuPort are now distributed across the relevant subclasses. The port accessor functions (getInstPort and getDataPort) now return a MasterPort instead of a CpuPort. This simplifies creating derivative CPUs that do not use the CpuPort.
2013-02-15	sim: Add a system-global option to bypass caches	Andreas Sandberg
	Virtualized CPUs and the fastmem mode of the atomic CPU require direct access to physical memory. We currently require caches to be disabled when using them to prevent chaos. This is not ideal when switching between hardware virutalized CPUs and other CPU models as it would require a configuration change on each switch. This changeset introduces a new version of the atomic memory mode, 'atomic_noncaching', where memory accesses are inserted into the memory system as atomic accesses, but bypass caches. To make memory mode tests cleaner, the following methods are added to the System class: * isAtomicMode() -- True if the memory mode is 'atomic' or 'direct'. * isTimingMode() -- True if the memory mode is 'timing'. * bypassCaches() -- True if caches should be bypassed. The old getMemoryMode() and setMemoryMode() methods should never be used from the C++ world anymore.
2013-02-15	cpu: Refactor memory system checks	Andreas Sandberg
	CPUs need to test that the memory system is in the right mode in two places, when the CPU is initialized (unless it's switched out) and on a drainResume(). This led to some code duplication in the CPU models. This changeset introduces the verifyMemoryMode() method which is called by BaseCPU::init() if the CPU isn't switched out. The individual CPU models are responsible for calling this method when resuming from a drain as this code is CPU model specific.
2013-02-15	cpu: Add CPU metadata om the Python classes	Andreas Sandberg
	The configuration scripts currently hard-code the requirements of each CPU. This is clearly not optimal as it makes writing new configuration scripts painful and adding new CPU models requires existing scripts to be updated. This patch adds the following class methods to the base CPU and all relevant CPUs: * memory_mode -- Return a string describing the current memory mode (invalid/atomic/timing). * require_caches -- Does the CPU model require caches? * support_take_over -- Does the CPU support CPU handover?
2013-01-12	base simple cpu: removes commented out code about cache ops	Nilay Vaish

2013-01-12	x86: Changes to decoder, corrects 9376	Nilay Vaish
	The changes made by the changeset 9376 were not quite correct. The patch made changes to the code which resulted in decoder not getting initialized correctly when the state was restored from a checkpoint. This patch adds a startup function to each ISA object. For x86, this function sets the required state in the decoder. For other ISAs, the function is empty right now.
2013-01-07	cpu: Unify the serialization code for all of the CPU models	Andreas Sandberg
	Cleanup the serialization code for the simple CPUs and the O3 CPU. The CPU-specific code has been replaced with a (un)serializeThread that serializes the thread state / context of a specific thread. Assuming that the thread state class uses the CPU-specific thread state uses the base thread state serialization code, this allows us to restore a checkpoint with any of the CPU models.
2013-01-07	cpu: Make sure that a drained atomic CPU isn't executing ucode	Andreas Sandberg
	Currently, the atomic CPU can be in the middle of a microcode sequence when it is drained. This leads to two problems: * When switching to a hardware virtualized CPU, we obviously can't execute gem5 microcode. * Since curMacroStaticInst is populated when executing microcode, repeated switching between CPUs executing microcode leads to incorrect execution. After applying this patch, the CPU will be on a proper instruction boundary, which means that it is safe to switch to any CPU model (including hardware virtualized ones). This changeset fixes a bug where the multiple switches to the same atomic CPU sometimes corrupts the target state because of dangling pointers to the currently executing microinstruction. Note: This changeset moves tick event descheduling from switchOut() to drain(), which makes timing consistent between just draining a system and draining /and/ switching between two atomic CPUs. This makes debugging quite a lot easier (execution traces get the same timing), but the latency of the last instruction before a drain will not be accounted for correctly (it will always be 1 cycle). Note 2: This changeset removes so_state variable, the locked variable, and the tickEvent from checkpoints since none of them contain state that needs to be preserved across checkpoints. The so_state is made redundant because we don't use the drain state variable anymore, the lock variable should never be set when the system is drained, and the tick event isn't scheduled.
2013-01-07	cpu: Make sure that a drained timing CPU isn't executing ucode	Andreas Sandberg
	Currently, the timing CPU can be in the middle of a microcode sequence or multicycle (stayAtPC is true) instruction when it is drained. This leads to two problems: * When switching to a hardware virtualized CPU, we obviously can't execute gem5 microcode. * If stayAtPC is true we might execute half of an instruction twice when restoring a checkpoint or switching CPUs, which leads to an incorrect execution. After applying this patch, the CPU will be on a proper instruction boundary, which means that it is safe to switch to any CPU model (including hardware virtualized ones). This changeset also fixes a bug where the timing CPU sometimes switches out with while stayAtPC is true, which corrupts the target state after a CPU switch or checkpoint. Note: This changeset removes the so_state variable from checkpoints since the drain state isn't used anymore.
2013-01-07	cpu: Rename defer_registration->switched_out	Andreas Sandberg
	The defer_registration parameter is used to prevent a CPU from initializing at startup, leaving it in the "switched out" mode. The name of this parameter (and the help string) is confusing. This patch renames it to switched_out, which should be more descriptive.
2013-01-07	cpu: Correctly call parent on switchOut() and takeOverFrom()	Andreas Sandberg
	This patch cleans up the CPU switching functionality by making sure that CPU models consistently call the parent on switchOut() and takeOverFrom(). This has the following implications that might alter current functionality: * The call to BaseCPU::switchout() in the O3 CPU is moved from signalDrained() (!) to switchOut(). * A call to BaseSimpleCPU::switchOut() is introduced in the simple CPUs.
2013-01-07	cpu: Check that the memory system is in the correct mode	Andreas Sandberg
	This patch adds checks to all CPU models to make sure that the memory system is in the correct mode at startup and when resuming after a drain. Previously, we only checked that the memory system was in the right mode when resuming. This is inadequate since this is a configuration error that should be detected at startup as well as when resuming. Additionally, since the check was done using an assert, it wasn't performed when NDEBUG was set (e.g., the fast target).
2013-01-07	arch: Make the ISA class inherit from SimObject	Andreas Sandberg
	The ISA class on stores the contents of ID registers on many architectures. In order to make reset values of such registers configurable, we make the class inherit from SimObject, which allows us to use the normal generated parameter headers. This patch introduces a Python helper method, BaseCPU.createThreads(), which creates a set of ISAs for each of the threads in an SMT system. Although it is currently only needed when creating multi-threaded CPUs, it should always be called before instantiating the system as this is an obvious place to configure ID registers identifying a thread/CPU.
2013-01-04	Decoder: Remove the thread context get/set from the decoder.	Gabe Black
	This interface is no longer used, and getting rid of it simplifies the decoders and code that sets up the decoders. The thread context had been used to read architectural state which was used to contextualize the instruction memory as it came in. That was changed so that the state is now sent to the decoders to keep locally if/when it changes. That's significantly more efficient. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2012-11-02	sim: Move the draining interface into a separate base class	Andreas Sandberg
	This patch moves the draining interface from SimObject to a separate class that can be used by any object needing draining. However, objects not visible to the Python code (i.e., objects not deriving from SimObject) still depend on their parents informing them when to drain. This patch also gets rid of the CountedDrainEvent (which isn't really an event) and replaces it with a DrainManager.
2012-11-02	sim: Include object header files in SWIG interfaces	Andreas Sandberg
	When casting objects in the generated SWIG interfaces, SWIG uses classical C-style casts ( (Foo *)bar; ). In some cases, this can degenerate into the equivalent of a reinterpret_cast (mainly if only a forward declaration of the type is available). This usually works for most compilers, but it is known to break if multiple inheritance is used anywhere in the object hierarchy. This patch introduces the cxx_header attribute to Python SimObject definitions, which should be used to specify a header to include in the SWIG interface. The header should include the declaration of the wrapped object. We currently don't enforce header the use of the header attribute, but a warning will be generated for objects that do not use it.
2012-09-25	ARM: Squash outstanding walks when instructions are squashed.	Ali Saidi

2012-09-19	AddrRange: Transition from Range<T> to AddrRange	Andreas Hansson
	This patch takes the final plunge and transitions from the templated Range class to the more specific AddrRange. In doing so it changes the obvious Range<Addr> to AddrRange, and also bumps the range_map to be AddrRangeMap. In addition to the obvious changes, including the removal of redundant includes, this patch also does some house keeping in preparing for the introduction of address interleaving support in the ranges. The Range class is also stripped of all the functionality that is never used. --HG-- rename : src/base/range.hh => src/base/addr_range.hh rename : src/base/range_map.hh => src/base/addr_range_map.hh
2012-08-28	Clock: Add a Cycles wrapper class and use where applicable	Andreas Hansson
	This patch addresses the comments and feedback on the preceding patch that reworks the clocks and now more clearly shows where cycles (relative cycle counts) are used to express time. Instead of bumping the existing patch I chose to make this a separate patch, merely to try and focus the discussion around a smaller set of changes. The two patches will be pushed together though. This changes done as part of this patch are mostly following directly from the introduction of the wrapper class, and change enough code to make things compile and run again. There are definitely more places where int/uint/Tick is still used to represent cycles, and it will take some time to chase them all down. Similarly, a lot of parameters should be changed from Param.Tick and Param.Unsigned to Param.Cycles. In addition, the use of curTick is questionable as there should not be an absolute cycle. Potential solutions can be built on top of this patch. There is a similar situation in the o3 CPU where lastRunningCycle is currently counting in Cycles, and is still an absolute time. More discussion to be had in other words. An additional change that would be appropriate in the future is to perform a similar wrapping of Tick and probably also introduce a Ticks class along with suitable operators for all these classes.
2012-08-28	Clock: Rework clocks to avoid tick-to-cycle transformations	Andreas Hansson
	This patch introduces the notion of a clock update function that aims to avoid costly divisions when turning the current tick into a cycle. Each clocked object advances a private (hidden) cycle member and a tick member and uses these to implement functions for getting the tick of the next cycle, or the tick of a cycle some time in the future. In the different modules using the clocks, changes are made to avoid counting in ticks only to later translate to cycles. There are a few oddities in how the O3 and inorder CPU count idle cycles, as seen by a few locations where a cycle is subtracted in the calculation. This is done such that the regression does not change any stats, but should be revisited in a future patch. Another, much needed, change that is not done as part of this patch is to introduce a new typedef uint64_t Cycle to be able to at least hint at the unit of the variables counting Ticks vs Cycles. This will be done as a follow-up patch. As an additional follow up, the thread context still uses ticks for the book keeping of last activate and last suspend and this should probably also be changed into cycles as well.
2012-08-22	Packet: Remove NACKs from packet and its use in endpoints	Andreas Hansson
	This patch removes the NACK frrom the packet as there is no longer any module in the system that issues them (the bridge was the only one and the previous patch removes that). The handling of NACKs was mostly avoided throughout the code base, by using e.g. panic or assert false, but in a few locations the NACKs were actually dealt with (although NACKs never occured in any of the regressions). Most notably, the DMA port will now never receive a NACK and the backoff time is thus never changed. As a consequence, the entire backoff mechanism (similar to a PCI bus) is now removed and the DMA port entirely relies on the bus performing the arbitration and issuing a retry when appropriate. This is more in line with e.g. PCIe. Surprisingly, this patch has no impact on any of the regressions. As mentioned in the patch that removes the NACK from the bridge, a follow-up patch should change the request and response buffer size for at least one regression to also verify that the system behaves as expected when the bridge fills up.
2012-08-15	O3,ARM: fix some problems with drain/switchout functionality and add Drain ↵	Anthony Gutierrez
	DPRINTFs This patch fixes some problems with the drain/switchout functionality for the O3 cpu and for the ARM ISA and adds some useful debug print statements. This is an incremental fix as there are still a few bugs/mem leaks with the switchout code. Particularly when switching from an O3CPU to a TimingSimpleCPU. However, when switching from O3 to O3 cores with the ARM ISA I haven't encountered any more assertion failures; now the kernel will typically panic inside of simulation.
2012-07-09	Port: Align port names in C++ and Python	Andreas Hansson
	This patch is a first step to align the port names used in the Python world and the C++ world. Ultimately it serves to make the use of config.json together with output from the simulation easier, including post-processing of statistics. Most notably, the CPU, cache, and bus is addressed in this patch, and there might be other ports that should be updated accordingly. The dash name separator has also been replaced with a "." which is what is used to concatenate the names in python, and a separation is made between the master and slave port in the bus.
2012-07-09	Port: Move retry from port base class to Master/SlavePort	Andreas Hansson
	This patch is the last part of moving all protocol-related functionality out of the Port base class. All the send/recv functions are already moved, and the retry (which still governs all the timing transport functions) is the only part that remained in the base class. The only point where this currently causes a bit of inconvenience is in the bus where the retry list is global and holds Port pointers (not Master/SlavePort). This is about to change with the split into a request/response bus and will soon be removed anyway. The patch has no impact on any regressions.
2012-06-08	Timing CPU: Remove a redundant port pointer	Andreas Hansson
	This patch is trivial and merely prunes a pointer that was never set or used.
2012-06-05	cpu: Don't init simple and inorder CPUs if they are defered.	Anthony Gutierrez
	initCPU() will be called to initialize switched out CPUs for the simple and inorder CPU models. this patch prevents those CPUs from being initialized because they should get their state from the active CPU when it is switched out.
2012-05-26	CPU: Merge the predecoder and decoder.	Gabe Black
	These classes are always used together, and merging them will give the ISAs more flexibility in how they cache things and manage the process. --HG-- rename : src/arch/x86/predecoder_tables.cc => src/arch/x86/decoder_tables.cc
2012-05-25	Decode: Make the Decoder class defined per ISA.	Gabe Black
	--HG-- rename : src/cpu/decode.cc => src/arch/generic/decoder.cc rename : src/cpu/decode.hh => src/arch/generic/decoder.hh
2012-05-01	MEM: Separate requests and responses for timing accesses	Andreas Hansson
	This patch moves send/recvTiming and send/recvTimingSnoop from the Port base class to the MasterPort and SlavePort, and also splits them into separate member functions for requests and responses: send/recvTimingReq, send/recvTimingResp, and send/recvTimingSnoopReq, send/recvTimingSnoopResp. A master port sends requests and receives responses, and also receives snoop requests and sends snoop responses. A slave port has the reciprocal behaviour as it receives requests and sends responses, and sends snoop requests and receives snoop responses. For all MemObjects that have only master ports or slave ports (but not both), e.g. a CPU, or a PIO device, this patch merely adds more clarity to what kind of access is taking place. For example, a CPU port used to call sendTiming, and will now call sendTimingReq. Similarly, a response previously came back through recvTiming, which is now recvTimingResp. For the modules that have both master and slave ports, e.g. the bus, the behaviour was previously relying on branches based on pkt->isRequest(), and this is now replaced with a direct call to the apprioriate member function depending on the type of access. Please note that send/recvRetry is still shared by all the timing accessors and remains in the Port base class for now (to maintain the current bus functionality and avoid changing the statistics of all regressions). The packet queue is split into a MasterPort and SlavePort version to facilitate the use of the new timing accessors. All uses of the PacketQueue are updated accordingly. With this patch, the type of packet (request or response) is now well defined for each type of access, and asserts on pkt->isRequest() and pkt->isResponse() are now moved to the appropriate send member functions. It is also worth noting that sendTimingSnoopReq no longer returns a boolean, as the semantics do not alow snoop requests to be rejected or stalled. All these assumptions are now excplicitly part of the port interface itself.