summaryrefslogtreecommitdiff
path: root/src/cpu
AgeCommit message (Collapse)Author
2016-09-13sim: Refactor quiesce and remove FS assertsMichael LeBeane
The quiesce family of magic ops can be simplified by the inclusion of quiesceTick() and quiesce() functions on ThreadContext. This patch also gets rid of the FS guards, since suspending a CPU is also a valid operation for SE mode.
2016-08-22cpu, mem, sim: Change how KVM maps memoryDavid Hashe
Only map memories into the KVM guest address space that are marked as usable by KVM. Create BackingStoreEntry class containing flags for is_conf_reported, in_addr_map, and kvm_map.
2016-08-15cpu: Add missing override in Minor's exec contextAndreas Sandberg
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-08-15cpu: Fixed clang errors. Added 'override' keyword for virtual functions.Reiley Jeapaul
Change-Id: Ic37311443ca11ee6d95bceffea599e054e7aa110 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-08-15cpu, arch: fix the type used for the request flagsNikos Nikoleris
Change-Id: I183b9942929c873c3272ce6d1abd4ebc472c7132 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-07-21cpu: Fix Minor SMT WFI/drain interaction issuesMitch Hayenga
The behavior of WFI is to cause minor to cease evaluating pipeline logic until an interrupt is observed, however a user may wish to drain the system while a core is sleeping due to a WFI. This patch makes WFI drain. If an actual drain occurs during a WFI, the CPU is already drained and will immediately be ready for swapping, checkpointing, etc. This should not negatively impact performance as WFI instructions are 'stream-changing' (treated like unpredicted branches), so all remaining instructions are wrong-path and will be squashed rapidly. Change-Id: I63833d5acb53d8dde78f9f0c9611de0ece385e45
2016-07-21cpu: Add SMT support to MinorCPUMitch Hayenga
This patch adds SMT support to the MinorCPU. Currently RoundRobin or Random thread scheduling are supported. Change-Id: I91faf39ff881af5918cca05051829fc6261f20e3
2016-06-20mem: Resolve TrafficGen trace relative to the configAndreas Sandberg
The traffic generator currently resolves relative trace paths relative to gem5's current working directory. This can lead to surprising results for relative paths where the expectation would normally be that they are resolved relative to the configuration file. This changeset implements config-relative trace file lookups. The old behavior is kept as a fallback for configs that expect that behavior. Change-Id: I1bda4e16725842666ffc37dcb6838c23a6ff138c Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
2016-06-06pwr: Low-power idle power state for idle CPUsDavid Guillen Fandos
Add functionality to the BaseCPU that will put the entire CPU into a low-power idle state whenever all threads in it are idle. Change-Id: I984d1656eb0a4863c87ceacd773d2d10de5cfd2b
2016-06-06stats: Fixing regStats function for some SimObjectsDavid Guillen Fandos
Fixing an issue with regStats not calling the parent class method for most SimObjects in Gem5. This causes issues if one adds new stats in the base class (since they are never initialized properly!). Change-Id: Iebc5aa66f58816ef4295dc8e48a357558d76a77c Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-06-06sim: Call regStats of base-class as wellStephan Diestelhorst
We want to extend the stats of objects hierarchically and thus it is necessary to register the statistics of the base-class(es), as well. For now, these are empty, but generic stats will be added there. Patch originally provided by Akash Bagdia at ARM Ltd.
2016-05-27cpu: fix lastStopped unserialisationIlias Vougioukas
MinorCPU fix for corrupt numCycles when resuming from a previous simulation. --- src/cpu/minor/cpu.cc | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
2016-05-26cpu: Add a basic progress check to the TrafficGenAndreas Hansson
This patch adds a progress check to the TrafficGen so that it is easier to detect deadlock scenarios where the generator gets stuck waiting for a retry, and makes no further progress. Change-Id: Ifb8779ad0939f52c0518d0e867bac73f99b82e2b Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-04-07mem: Remove threadId from memory request classMitch Hayenga
In general, the ThreadID parameter is unnecessary in the memory system as the ContextID is what is used for the purposes of locks/wakeups. Since we allocate sequential ContextIDs for each thread on MT-enabled CPUs, ThreadID is unnecessary as the CPUs can identify the requesting thread through sideband info (SenderState / LSQ entries) or ContextID offset from the base ContextID for a cpu. This is a re-spin of 20264eb after the revert (bd1c6789) and includes some fixes of that commit.
2016-04-05cpu: Implement per-thread GHRsMitch Hayenga
Branch predictors that use GHRs should index them on a per-thread basis. This makes that so. This is a re-spin of fb51231 after the revert (bd1c6789).
2016-04-05cpu: Add an indirect branch target predictorMitch Hayenga
This patch adds a configurable indirect branch predictor that can be indexed by a combination of GHR and path history hashes. Implements the functionality described in: "Target prediction for indirect jumps" by Chang, Hao, and Patt http://dl.acm.org/citation.cfm?id=264209 This is a re-spin of fb9d142 after the revert (bd1c6789).
2016-04-05cpu: Fix BTB threading oversightMitch Hayenga
The extant BTB code doesn't hash on the thread id but does check the thread id for 'btb hits'. This results in 1-thread of a multi-threaded workload taking a BTB entry, and all other threads missing for the same branch missing.
2016-04-07Revert to 74c1e6513bd0 (sim: Thermal support for Linux)Andreas Sandberg
2016-04-06Revert power patch sets with unexpected interactionsAndreas Sandberg
The following patches had unexpected interactions with the current upstream code and have been reverted for now: e07fd01651f3: power: Add support for power models 831c7f2f9e39: power: Low-power idle power state for idle CPUs 4f749e00b667: power: Add power states to ClockedObject Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> --HG-- extra : amend_source : 0b6fb073c6bbc24be533ec431eb51fbf1b269508
2016-04-05mem: Remove threadId from memory request classMitch Hayenga
In general, the ThreadID parameter is unnecessary in the memory system as the ContextID is what is used for the purposes of locks/wakeups. Since we allocate sequential ContextIDs for each thread on MT-enabled CPUs, ThreadID is unnecessary as the CPUs can identify the requesting thread through sideband info (SenderState / LSQ entries) or ContextID offset from the base ContextID for a cpu.
2016-04-05cpu: Implement per-thread GHRsCurtis Dunham
Branch predictors that use GHRs should index them on a per-thread basis. This makes that so.
2016-04-05cpu: Add an indirect branch target predictorMitch Hayenga
This patch adds a configurable indirect branch predictor that can be indexed by a combination of GHR and path history hashes. Implements the functionality described in: "Target prediction for indirect jumps" by Chang, Hao, and Patt http://dl.acm.org/citation.cfm?id=264209
2016-04-05cpu: Fix BTB threading oversightMitch Hayenga
The extant BTB code doesn't hash on the thread id but does check the thread id for 'btb hits'. This results in 1-thread of a multi-threaded workload taking a BTB entry, and all other threads missing for the same branch missing.
2014-12-09power: Low-power idle power state for idle CPUsAkash Bagdia
Add functionality to the BaseCPU that will put the entire CPU into a low-power idle state whenever all threads in it are idle.
2014-11-18power: Add power states to ClockedObjectAkash Bagdia
Add 4 power states to the ClockedObject, provides necessary access functions to check and update the power state. Default power state is UNDEFINED, it is responsibility of the respective simulation model to provide the startup state and any other logic for state change. Add number of transition stat. Add distribution of time spent in clock gated state. Add power state residency stat. Add dump call back function to allow stats update of distribution and residency stats.
2016-04-05cpu: Add instruction opclass histogram to minorMitch Hayenga
2016-04-05cpu: Query CPU for inst executed from PythonGeoffrey Blake
This patch adds the ability for the simulator to query the number of instructions a CPU has executed so far per hw-thread. This can be used to enable more flexible periodic events such as taking checkpoints starting 1s into simulation and X instructions thereafter.
2016-03-30kvm: Add an option to force context sync on kvm entry/exitAndreas Sandberg
This changeset adds an option to force the kvm-based CPUs to always synchronize the gem5 thread context representation on entry/exit into the kernel. This is very useful for debugging. Unfortunately, it is also the only way to get reliable register contents when using remote gdb functionality. The long-term solution for the latter would be to implement a kvm-specific thread context. Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Curtis Dunham <curtis.dunham@arm.com> Reviewed-by: Alexandru Dutu <alexandru.dutu@amd.com>
2016-03-20cpu: warn if TrafficGen is suppressing a large numer of packetsAndreas Hansson
Add a basic warning for every 10000 packet that is suppressed to alert the user.
2015-05-05cpu: Change literal integer constants to meaningful labelsRekai Gonzalez Alberquilla
fu_pool and inst_queue were using -1 for "no such FU" and -2 for "all those FUs are busy at the moment" when requesting for a FU and replying. This patch introduces new constants NoCapableFU and NoFreeFU respectively. In addition, the condition (idx == -2 || idx != -1) is equivalent to (idx != -1), so this patch also simplifies that. --HG-- extra : rebase_source : 4833717b9d1e09d7594d1f34f882e13fc4b86846
2015-11-27kvm: Shutdown KVM and disconnect performance counters on forkAndreas Sandberg
We can't/shouldn't use KVM after a fork since the child and parent probably point to the same VM. Knowing the exact effects of this is hard, but they are likely to be messy. We also disconnect the performance counters attached to the guest. This works around what seems to be a kernel bug where spurious SIGIOs get delivered to the forked child process. Signed-off-by: Andreas Sandberg <andreas@sandberg.pp.se> [sascha.bischoff@arm.com: Rebased patches onto a newer gem5 version] Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> [andreas.sandberg@arm.com: Fatal if entering KVM in child process ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
2015-11-27base: Add support for changing output directoriesAndreas Sandberg
This changeset adds support for changing the simulator output directory. This can be useful when the simulation goes through several stages (e.g., a warming phase, a simulation phase, and a verification phase) since it allows the output from each stage to be located in a different directory. Relocation is done by calling core.setOutputDir() from Python or simout.setOutputDirectory() from C++. This change affects several parts of the design of the gem5's output subsystem. First, files returned by an OutputDirectory instance (e.g., simout) are of the type OutputStream instead of a std::ostream. This allows us to do some more book keeping and control re-opening of files when the output directory is changed. Second, new subdirectories are OutputDirectory instances, which should be used to create files in that sub-directory. Signed-off-by: Andreas Sandberg <andreas@sandberg.pp.se> [sascha.bischoff@arm.com: Rebased patches onto a newer gem5 version] Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
2015-08-10mem, cpu: Add assertions to snoop invalidation logicStephan Diestelhorst
This patch adds assertions that enforce that only invalidating snoops will ever reach into the logic that tracks in-order load completion and also invalidation of LL/SC (and MONITOR / MWAIT) monitors. Also adds some comments to MSHR::replaceUpgrades().
2015-07-19cpu: Fix LLSC atomic CPU wakeupKrishnendra Nathella
Writes to locked memory addresses (LLSC) did not wake up the locking CPU. This can lead to deadlocks on multi-core runs. In AtomicSimpleCPU, recvAtomicSnoop was checking if the incoming packet was an invalidation (isInvalidate) and only then handled a locked snoop. But, writes are seen instead of invalidates when running without caches (fast-forward configurations). As as simple fix, now handleLockedSnoop is also called even if the incoming snoop packet are from writes.
2016-02-24cpu: TraceGen fix for tick frequency checkMatteo Andreozzi
Bug fix for check on protobuf file frequency being different than global frequency. The ASCII encoder script is also fixed, and the example trace used in the regressions is updated.
2016-02-23scons: Add missing override to appease clangAndreas Hansson
Make clang happy...again.
2016-02-15misc: Add missing overrides to appease clangAndreas Hansson
Since the last round of fixes a few new issues have snuck in. We should consider switching the regression runs to clang.
2016-02-10mem: Deduce if cache should forward snoopsAndreas Hansson
This patch changes how the cache determines if snoops should be forwarded from the memory side to the CPU side. Instead of having a parameter, the cache now looks at the port connected on the CPU side, and if it is a snooping port, then snoops are forwarded. Less error prone, and less parameters to worry about. The patch also tidies up the CPU classes to ensure that their I-side port is not snooping by removing overrides to the snoop request handler, such that snoop requests will panic via the default MasterPort implement
2016-02-06style: eliminate explicit boolean comparisonsSteve Reinhardt
Result of running 'hg m5style --skip-all --fix-control -a' to get rid of '== true' comparisons, plus trivial manual edits to get rid of '== false'/'== False' comparisons. Left a couple of explicit comparisons in where they didn't seem unreasonable: invalid boolean comparison in src/arch/mips/interrupts.cc:155 >> DPRINTF(Interrupt, "Interrupts OnCpuTimerINterrupt(tc) == true\n");<< invalid boolean comparison in src/unittest/unittest.hh:110 >> "EXPECT_FALSE(" #expr ")", (expr) == false)<<
2016-02-06style: fix missing spaces in control statementsSteve Reinhardt
Result of running 'hg m5style --skip-all --fix-control -a'.
2016-02-06style: remove trailing whitespaceSteve Reinhardt
Result of running 'hg m5style --skip-all --fix-white -a'.
2016-01-17cpu. arch: add initiateMemRead() to ExecContext interfaceSteve Reinhardt
For historical reasons, the ExecContext interface had a single function, readMem(), that did two different things depending on whether the ExecContext supported atomic memory mode (i.e., AtomicSimpleCPU) or timing memory mode (all the other models). In the former case, it actually performed a memory read; in the latter case, it merely initiated a read access, and the read completion did not happen until later when a response packet arrived from the memory system. This led to some confusing things, including timing accesses being required to provide a pointer for the return data even though that pointer was only used in atomic mode. This patch splits this interface, adding a new initiateMemRead() function to the ExecContext interface to replace the timing-mode use of readMem(). For consistency and clarity, the readMemTiming() helper function in the ISA definitions is renamed to initiateMemRead() as well. For x86, where the access size is passed in explicitly, we can also get rid of the data parameter at this level. For other ISAs, where the access size is determined from the type of the data parameter, we have to keep the parameter for that purpose.
2016-01-17cpu: remove unnecessary data ptr from O3 internal read() funcsSteve Reinhardt
The read() function merely initiates a memory read operation; the data doesn't arrive until the access completes and a response packet is received from the memory system. Thus there's no need to provide a data pointer; its existence is historical. Getting this pointer out of this internal o3 interface sets the stage for similar cleanup in the ExecContext interface. Also found that we were pointlessly setting the contents at this pointer on a store forward (the useful memcpy happens just a few lines below the deleted one).
2016-01-11scons: Enable -Wextra by defaultAndreas Hansson
Make best use of the compiler, and enable -Wextra as well as -Wall. There are a few issues that had to be resolved, but they are all trivial.
2015-12-31mem: Make cache terminology easier to understandAndreas Hansson
This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions. The following name changes are made: * the packet memInhibit flag is renamed to cacheResponding * the packet sharedAsserted flag is renamed to hasSharers * the packet NeedsExclusive attribute is renamed to NeedsWritable * the packet isSupplyExclusive is renamed responderHadWritable * the MSHR pendingDirty is renamed to pendingModified The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand.
2015-07-20ruby: more flexible ruby tester supportBrad Beckmann
This patch allows the ruby random tester to use ruby ports that may only support instr or data requests. This patch is similar to a previous changeset (8932:1b2c17565ac8) that was unfortunately broken by subsequent changesets. This current patch implements the support in a more straight-forward way. Since retries are now tested when running the ruby random tester, this patch splits up the retry and drain check behavior so that RubyPort children, such as the GPUCoalescer, can perform those operations correctly without having to duplicate code. Finally, the patch also includes better DPRINTFs for debugging the tester.
2015-12-07cpu: Support virtual addr in elastic tracesRadhika Jagtap
This patch adds support to optionally capture the virtual address and asid for load/store instructions in the elastic traces. If they are present in the traces, Trace CPU will set those fields of the request during replay.
2015-12-07cpu: Create record type enum for elastic tracesRadhika Jagtap
This patch replaces the booleans that specified the elastic trace record type with an enum type. The source of change is the proto message for elastic trace where the enum is introduced. The struct definitions in the elastic trace probe listener as well as the Trace CPU replace the boleans with the proto message enum. The patch does not impact functionality, but traces are not compatible with previous version. This is preparation for adding new types of records in subsequent patches.
2015-12-07cpu: Add TraceCPU to playback elastic tracesRadhika Jagtap
This patch defines a TraceCPU that replays trace generated using the elastic trace probe attached to the O3 CPU model. The elastic trace is an execution trace with data dependencies and ordering dependencies annoted to it. It also replays fixed timestamp instruction fetch trace that is also generated by the elastic trace probe. The TraceCPU inherits from BaseCPU as a result of which some methods need to be defined. It has two port subclasses inherited from MasterPort for instruction and data ports. It issues the memory requests deducing the timing from the trace and without performing real execution of micro-ops. As soon as the last dependency for an instruction is complete, its computational delay, also provided in the input trace is added. The dependency-free nodes are maintained in a list, called 'ReadyList', ordered by ready time. Instructions which depend on load stall until the responses for read requests are received thus achieving elastic replay. If the dependency is not found when adding a new node, it is assumed complete. Thus, if this node is found to be completely dependency-free its issue time is calculated and it is added to the ready list immediately. This is encapsulated in the subclass ElasticDataGen. If ready nodes are issued in an unconstrained way there can be more nodes outstanding which results in divergence in timing compared to the O3CPU. Therefore, the Trace CPU also models hardware resources. A sub-class to model hardware resources is added which contains the maximum sizes of load buffer, store buffer and ROB. If resources are not available, the node is not issued. The 'depFreeQueue' structure holds nodes that are pending issue. Modeling the ROB size in the Trace CPU as a resource limitation is arguably the most important parameter of all resources. The ROB occupancy is estimated using the newly added field 'robNum'. We need to use ROB number as sequence number is at times much higher due to squashing and trace replay is focused on correct path modeling. A map called 'inFlightNodes' is added to track nodes that are not only in the readyList but also load nodes that are executed (and thus removed from readyList) but are not complete. ReadyList handles what and when to execute next node while the inFlightNodes is used for resource modelling. The oldest ROB number is updated when any node occupies the ROB or when an entry in the ROB is released. The ROB occupancy is equal to the difference in the ROB number of the newly dependency-free node and the oldest ROB number in flight. If no node dependends on a non load/store node then there is no reason to track it in the dependency graph. We filter out such nodes but count them and add a weight field to the subsequent node that we do include in the trace. The weight field is used to model ROB occupancy during replay. The depFreeQueue is chosen to be FIFO so that child nodes which are in program order get pushed into it in that order and thus issued in the in program order, like in the O3CPU. This is also why the dependents is made a sequential container, std::set to std::vector. We only check head of the depFreeQueue as nodes are issued in order and blocking on head models that better than looping the entire queue. An alternative choice would be to inspect top N pending nodes where N is the issue-width. This is left for future as the timing correlation looks good as it is. At the start of an execution event, first we attempt to issue such pending nodes by checking if appropriate resources have become available. If yes, we compute the execute tick with respect to the time then. Then we proceed to complete nodes from the readyList. When a read response is received, sometimes a dependency on it that was supposed to be released when it was issued is still not released. This occurs because the dependent gets added to the graph after the read was sent. So the check is made less strict and the dependency is marked complete on read response instead of insisting that it should have been removed on read sent. There is a check for requests spanning two cache lines as this condition triggers an assert fail in the L1 cache. If it does then truncate the size to access only until the end of that line and ignore the remainder. Strictly-ordered requests are skipped and the dependencies on such requests are handled by simply marking them complete immediately. The simulated seconds can be calculated as the difference between the final_tick stat and the tickOffset stat. A CountedExitEvent that contains a static int belonging to the Trace CPU class as a down counter is used to implement multi Trace CPU simulation exit.
2015-12-07proto, probe: Add elastic trace probe to o3 cpuRadhika Jagtap
The elastic trace is a type of probe listener and listens to probe points in multiple stages of the O3CPU. The notify method is called on a probe point typically when an instruction successfully progresses through that stage. As different listener methods mapped to the different probe points execute, relevant information about the instruction, e.g. timestamps and register accesses, are captured and stored in temporary InstExecInfo class objects. When the instruction progresses through the commit stage, the timing and the dependency information about the instruction is finalised and encapsulated in a struct called TraceInfo. TraceInfo objects are collected in a list instead of writing them out to the trace file one a time. This is required as the trace is processed in chunks to evaluate order dependencies and computational delay in case an instruction does not have any register dependencies. By this we achieve a simpler algorithm during replay because every record in the trace can be hooked onto a record in its past. The instruction dependency trace is written out as a protobuf format file. A second trace containing fetch requests at absolute timestamps is written to a separate protobuf format file. If the instruction is not executed then it is not added to the trace. The code checks if the instruction had a fault, if it predicated false and thus previous register values were restored or if it was a load/store that did not have a request (e.g. when the size of the request is zero). In all these cases the instruction is set as executed by the Execute stage and is picked up by the commit probe listener. But a request is not issued and registers are not written. So practically, skipping these should not hurt the dependency modelling. If squashing results in squashing younger instructions, it may happen that the squash probe discards the inst and removes it from the temporary store but execute stage deals with the instruction in the next cycle which results in the execute probe seeing this inst as 'new' inst. A sequence number of the last processed trace record is used to trap these cases and not add to the temporary store. The elastic instruction trace and fetch request trace can be read in and played back by the TraceCPU.