summaryrefslogtreecommitdiff
path: root/src/cpu/o3
AgeCommit message (Collapse)Author
2017-07-05arch: ISA parser additions of vector registersRekai Gonzalez-Alberquilla
Reiley's update :) of the isa parser definitions. My addition of the vector element operand concept for the ISA parser. Nathanael's modification creating a hierarchy between vector registers and its constituencies to the isa parser. Some fixes/updates on top to consider instructions as vectors instead of floating when they use the VectorRF. Some counters added to all the models to keep faithful counts. Change-Id: Id8f162a525240dfd7ba884c5a4d9fa69f4050101 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2706 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
2017-07-05cpu: Added interface for vector reg fileRekai Gonzalez-Alberquilla
This patch adds some more functionality to the cpu model and the arch to interface with the vector register file. This change consists mainly of augmenting ThreadContexts and ExecContexts with calls to get/set full vectors, underlying microarchitectural elements or lanes. Those are meant to interface with the vector register file. All classes that implement this interface also get an appropriate implementation. This requires implementing the vector register file for the different models using the VecRegContainer class. This change set also updates the Result abstraction to contemplate the possibility of having a vector as result. The changes also affect how the remote_gdb connection works. There are some (nasty) side effects, such as the need to define dummy numPhysVecRegs parameter values for architectures that do not implement vector extensions. Nathanael Premillieu's work with an increasing number of fixes and improvements of mine. Change-Id: Iee65f4e8b03abfe1e94e6940a51b68d0977fd5bb Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> [ Fix RISCV build issues and CC reg free list initialisation ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2705
2017-07-05cpu: Simplify the rename interface and use RegIdRekai Gonzalez-Alberquilla
With the hierarchical RegId there are a lot of functions that are redundant now. The idea behind the simplification is that instead of having the regId, telling which kind of register read/write/rename/lookup/etc. and then the function panic_if'ing if the regId is not of the appropriate type, we provide an interface that decides what kind of register to read depending on the register type of the given regId. Change-Id: I7d52e9e21fc01205ae365d86921a4ceb67a57178 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> [ Fix RISCV build issues ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2702
2017-07-05cpu: Physical register structural + flat indexingNathanael Premillieu
Mimic the changes done on the architectural register indexes on the physical register indexes. This is specific to the O3 model. The structure, called PhysRegId, contains a register class, a register index and a flat register index. The flat register index is kept because it is useful in some cases where the type of register is not important (dependency graph and scoreboard for example). Instead of directly using the structure, most of the code is working with a const PhysRegId* (typedef to PhysRegIdPtr). The actual PhysRegId objects are stored in the regFile. Change-Id: Ic879a3cc608aa2f34e2168280faac1846de77667 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2701 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
2017-07-05arch, cpu: Architectural Register structural indexingNathanael Premillieu
Replace the unified register mapping with a structure associating a class and an index. It is now much easier to know which class of register the index is referring to. Also, when adding a new class there is no need to modify existing ones. Change-Id: I55b3ac80763702aa2cd3ed2cbff0a75ef7620373 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> [ Fix RISCV build issues ] Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-on: https://gem5-review.googlesource.com/2700
2017-06-20cpu, gpu-compute: Replace EventWrapper use with EventFunctionWrapperSean Wilson
Change-Id: Idd5992463bcf9154f823b82461070d1f1842cea3 Signed-off-by: Sean Wilson <spwilson2@wisc.edu> Reviewed-on: https://gem5-review.googlesource.com/3746 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
2017-05-15cpu: fix problem with forwarding and locked loadAlec Roelke
If a (regular) store is followed closely enough by a locked load that overlaps, the LSQ will forward the store's data to the locked load and never tell the cache about the locked load. As a result, the cache will not lock the address and all future store-conditional requests on that address will fail. This patch fixes that by preventing forwarding if the memory request is a locked load and adding another case to the LSQ forwarding logic that delays the locked load request if a store in the LSQ contains all or part of the data that is requested. [Merge second and last if blocks because their bodies are the same.] Change-Id: I895cc2b9570035267bdf6ae3fdc8a09049969841 Reviewed-on: https://gem5-review.googlesource.com/2400 Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com> Maintainer: Jason Lowe-Power <jason@lowepower.com>
2017-02-27syscall_emul: [PATCH 15/22] add clone/execve for threading and multiprocess ↵Brandon Potter
simulations Modifies the clone system call and adds execve system call. Requires allowing processes to steal thread contexts from other processes in the same system object and the ability to detach pieces of process state (such as MemState) to allow dynamic sharing.
2015-07-20syscall_emul: [patch 13/22] add system call retry capabilityBrandon Potter
This changeset adds functionality that allows system calls to retry without affecting thread context state such as the program counter or register values for the associated thread context (when system calls return with a retry fault). This functionality is needed to solve problems with blocking system calls in multi-process or multi-threaded simulations where information is passed between processes/threads. Blocking system calls can cause deadlock because the simulator itself is single threaded. There is only a single thread servicing the event queue which can cause deadlock if the thread hits a blocking system call instruction. To illustrate the problem, consider two processes using the producer/consumer sharing model. The processes can use file descriptors and the read and write calls to pass information to one another. If the consumer calls the blocking read system call before the producer has produced anything, the call will block the event queue (while executing the system call instruction) and deadlock the simulation. The solution implemented in this changeset is to recognize that the system calls will block and then generate a special retry fault. The fault will be sent back up through the function call chain until it is exposed to the cpu model's pipeline where the fault becomes visible. The fault will trigger the cpu model to replay the instruction at a future tick where the call has a chance to succeed without actually going into a blocking state. In subsequent patches, we recognize that a syscall will block by calling a non-blocking poll (from inside the system call implementation) and checking for events. When events show up during the poll, it signifies that the call would not have blocked and the syscall is allowed to proceed (calling an underlying host system call if necessary). If no events are returned from the poll, we generate the fault and try the instruction for the thread context at a distant tick. Note that retrying every tick is not efficient. As an aside, the simulator has some multi-threading support for the event queue, but it is not used by default and needs work. Even if the event queue was completely multi-threaded, meaning that there is a hardware thread on the host servicing a single simulator thread contexts with a 1:1 mapping between them, it's still possible to run into deadlock due to the event queue barriers on quantum boundaries. The solution of replaying at a later tick is the simplest solution and solves the problem generally.
2016-11-09style: [patch 1/22] use /r/3648/ to reorganize includesBrandon Potter
2016-12-21cpu: Resolve targets of predicted 'taken' decode for O3Arthur Perais
The target of taken conditional direct branches does not need to be resolved in IEW: the target can be computed at decode, usually using the decoded instruction word and the PC. The higher-than-necessary penalty is taken only on conditional branches that are predicted taken but miss in the BTB. Thus, this is mostly inconsequential on IPC if the BTB is big/associative enough (fewer capacity/conflict misses). Nonetheless, what gem5 simulates is not representative of how conditional branch targets can be handled. Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-12-21cpu: Clarify meaning of cachePorts variable in lsq_unit.hh of O3Arthur Perais
cachePorts currently constrains the number of store packets written to the D-Cache each cycle), but loads currently affect this variable. This leads to unexpected congestion (e.g., setting cachePorts to a realistic 1 will in fact allow a store to WB only if no loads have accessed the D-Cache this cycle). In the absence of arbitration, this patch decouples how many loads can be done per cycle from how many stores can be done per cycle. Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-10-15cpu, arm: Distinguish Float* and SimdFloat*, create FloatMem* opClassFernando Endo
Modify the opClass assigned to AArch64 FP instructions from SimdFloat* to Float*. Also create the FloatMemRead and FloatMemWrite opClasses, which distinguishes writes to the INT and FP register banks. Change the latency of (Simd)FloatMultAcc to 5, based on the Cortex-A72, where the "latency" of FMADD is 3 if the next instruction is a FMADD and has only the augend to destination dependency, otherwise it's 7 cycles. Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-09-22cpu: Fix the O3 CPU DrainRekai Gonzalez-Alberquilla
The drain did not wait until stages were ready again. Therefore, as a result of messages in the TimeBuffer being drain, the state after the drain was not consistent and asserts fired in some places when the draining happened after a stage got blocked, but before the notification arrived to the previous stages. Change-Id: Ib50b3b40b7f745b62c1eba2931dec76860824c71 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-09-13sim: Refactor quiesce and remove FS assertsMichael LeBeane
The quiesce family of magic ops can be simplified by the inclusion of quiesceTick() and quiesce() functions on ThreadContext. This patch also gets rid of the FS guards, since suspending a CPU is also a valid operation for SE mode.
2016-06-06pwr: Low-power idle power state for idle CPUsDavid Guillen Fandos
Add functionality to the BaseCPU that will put the entire CPU into a low-power idle state whenever all threads in it are idle. Change-Id: I984d1656eb0a4863c87ceacd773d2d10de5cfd2b
2016-06-06stats: Fixing regStats function for some SimObjectsDavid Guillen Fandos
Fixing an issue with regStats not calling the parent class method for most SimObjects in Gem5. This causes issues if one adds new stats in the base class (since they are never initialized properly!). Change-Id: Iebc5aa66f58816ef4295dc8e48a357558d76a77c Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-04-07mem: Remove threadId from memory request classMitch Hayenga
In general, the ThreadID parameter is unnecessary in the memory system as the ContextID is what is used for the purposes of locks/wakeups. Since we allocate sequential ContextIDs for each thread on MT-enabled CPUs, ThreadID is unnecessary as the CPUs can identify the requesting thread through sideband info (SenderState / LSQ entries) or ContextID offset from the base ContextID for a cpu. This is a re-spin of 20264eb after the revert (bd1c6789) and includes some fixes of that commit.
2016-04-06Revert power patch sets with unexpected interactionsAndreas Sandberg
The following patches had unexpected interactions with the current upstream code and have been reverted for now: e07fd01651f3: power: Add support for power models 831c7f2f9e39: power: Low-power idle power state for idle CPUs 4f749e00b667: power: Add power states to ClockedObject Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> --HG-- extra : amend_source : 0b6fb073c6bbc24be533ec431eb51fbf1b269508
2016-04-05mem: Remove threadId from memory request classMitch Hayenga
In general, the ThreadID parameter is unnecessary in the memory system as the ContextID is what is used for the purposes of locks/wakeups. Since we allocate sequential ContextIDs for each thread on MT-enabled CPUs, ThreadID is unnecessary as the CPUs can identify the requesting thread through sideband info (SenderState / LSQ entries) or ContextID offset from the base ContextID for a cpu.
2014-12-09power: Low-power idle power state for idle CPUsAkash Bagdia
Add functionality to the BaseCPU that will put the entire CPU into a low-power idle state whenever all threads in it are idle.
2015-05-05cpu: Change literal integer constants to meaningful labelsRekai Gonzalez Alberquilla
fu_pool and inst_queue were using -1 for "no such FU" and -2 for "all those FUs are busy at the moment" when requesting for a FU and replying. This patch introduces new constants NoCapableFU and NoFreeFU respectively. In addition, the condition (idx == -2 || idx != -1) is equivalent to (idx != -1), so this patch also simplifies that. --HG-- extra : rebase_source : 4833717b9d1e09d7594d1f34f882e13fc4b86846
2015-11-27base: Add support for changing output directoriesAndreas Sandberg
This changeset adds support for changing the simulator output directory. This can be useful when the simulation goes through several stages (e.g., a warming phase, a simulation phase, and a verification phase) since it allows the output from each stage to be located in a different directory. Relocation is done by calling core.setOutputDir() from Python or simout.setOutputDirectory() from C++. This change affects several parts of the design of the gem5's output subsystem. First, files returned by an OutputDirectory instance (e.g., simout) are of the type OutputStream instead of a std::ostream. This allows us to do some more book keeping and control re-opening of files when the output directory is changed. Second, new subdirectories are OutputDirectory instances, which should be used to create files in that sub-directory. Signed-off-by: Andreas Sandberg <andreas@sandberg.pp.se> [sascha.bischoff@arm.com: Rebased patches onto a newer gem5 version] Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
2015-08-10mem, cpu: Add assertions to snoop invalidation logicStephan Diestelhorst
This patch adds assertions that enforce that only invalidating snoops will ever reach into the logic that tracks in-order load completion and also invalidation of LL/SC (and MONITOR / MWAIT) monitors. Also adds some comments to MSHR::replaceUpgrades().
2015-07-19cpu: Fix LLSC atomic CPU wakeupKrishnendra Nathella
Writes to locked memory addresses (LLSC) did not wake up the locking CPU. This can lead to deadlocks on multi-core runs. In AtomicSimpleCPU, recvAtomicSnoop was checking if the incoming packet was an invalidation (isInvalidate) and only then handled a locked snoop. But, writes are seen instead of invalidates when running without caches (fast-forward configurations). As as simple fix, now handleLockedSnoop is also called even if the incoming snoop packet are from writes.
2016-02-10mem: Deduce if cache should forward snoopsAndreas Hansson
This patch changes how the cache determines if snoops should be forwarded from the memory side to the CPU side. Instead of having a parameter, the cache now looks at the port connected on the CPU side, and if it is a snooping port, then snoops are forwarded. Less error prone, and less parameters to worry about. The patch also tidies up the CPU classes to ensure that their I-side port is not snooping by removing overrides to the snoop request handler, such that snoop requests will panic via the default MasterPort implement
2016-02-06style: fix missing spaces in control statementsSteve Reinhardt
Result of running 'hg m5style --skip-all --fix-control -a'.
2016-02-06style: remove trailing whitespaceSteve Reinhardt
Result of running 'hg m5style --skip-all --fix-white -a'.
2016-01-17cpu: remove unnecessary data ptr from O3 internal read() funcsSteve Reinhardt
The read() function merely initiates a memory read operation; the data doesn't arrive until the access completes and a response packet is received from the memory system. Thus there's no need to provide a data pointer; its existence is historical. Getting this pointer out of this internal o3 interface sets the stage for similar cleanup in the ExecContext interface. Also found that we were pointlessly setting the contents at this pointer on a store forward (the useful memcpy happens just a few lines below the deleted one).
2015-12-31mem: Make cache terminology easier to understandAndreas Hansson
This patch changes the name of a bunch of packet flags and MSHR member functions and variables to make the coherency protocol easier to understand. In addition the patch adds and updates lots of descriptions, explicitly spelling out assumptions. The following name changes are made: * the packet memInhibit flag is renamed to cacheResponding * the packet sharedAsserted flag is renamed to hasSharers * the packet NeedsExclusive attribute is renamed to NeedsWritable * the packet isSupplyExclusive is renamed responderHadWritable * the MSHR pendingDirty is renamed to pendingModified The cache states, Modified, Owned, Exclusive, Shared are also called out in the cache and MSHR code to make it easier to understand.
2015-12-07cpu: Support virtual addr in elastic tracesRadhika Jagtap
This patch adds support to optionally capture the virtual address and asid for load/store instructions in the elastic traces. If they are present in the traces, Trace CPU will set those fields of the request during replay.
2015-12-07cpu: Create record type enum for elastic tracesRadhika Jagtap
This patch replaces the booleans that specified the elastic trace record type with an enum type. The source of change is the proto message for elastic trace where the enum is introduced. The struct definitions in the elastic trace probe listener as well as the Trace CPU replace the boleans with the proto message enum. The patch does not impact functionality, but traces are not compatible with previous version. This is preparation for adding new types of records in subsequent patches.
2015-12-07proto, probe: Add elastic trace probe to o3 cpuRadhika Jagtap
The elastic trace is a type of probe listener and listens to probe points in multiple stages of the O3CPU. The notify method is called on a probe point typically when an instruction successfully progresses through that stage. As different listener methods mapped to the different probe points execute, relevant information about the instruction, e.g. timestamps and register accesses, are captured and stored in temporary InstExecInfo class objects. When the instruction progresses through the commit stage, the timing and the dependency information about the instruction is finalised and encapsulated in a struct called TraceInfo. TraceInfo objects are collected in a list instead of writing them out to the trace file one a time. This is required as the trace is processed in chunks to evaluate order dependencies and computational delay in case an instruction does not have any register dependencies. By this we achieve a simpler algorithm during replay because every record in the trace can be hooked onto a record in its past. The instruction dependency trace is written out as a protobuf format file. A second trace containing fetch requests at absolute timestamps is written to a separate protobuf format file. If the instruction is not executed then it is not added to the trace. The code checks if the instruction had a fault, if it predicated false and thus previous register values were restored or if it was a load/store that did not have a request (e.g. when the size of the request is zero). In all these cases the instruction is set as executed by the Execute stage and is picked up by the commit probe listener. But a request is not issued and registers are not written. So practically, skipping these should not hurt the dependency modelling. If squashing results in squashing younger instructions, it may happen that the squash probe discards the inst and removes it from the temporary store but execute stage deals with the instruction in the next cycle which results in the execute probe seeing this inst as 'new' inst. A sequence number of the last processed trace record is used to trap these cases and not add to the temporary store. The elastic instruction trace and fetch request trace can be read in and played back by the TraceCPU.
2015-12-07probe: Add probe in Fetch, IEW, Rename and CommitRadhika Jagtap
This patch adds probe points in Fetch, IEW, Rename and Commit stages as follows. A probe point is added in the Fetch stage for probing when a fetch request is sent. Notify is fired on the probe point when a request is sent succesfully in the first attempt as well as on a retry attempt. Probe points are added in the IEW stage when an instruction begins to execute and when execution is complete. This points can be used for monitoring the execution time of an instruction. Probe points are added in the Rename stage to probe renaming of source and destination registers and when there is squashing. These probe points can be used to track register dependencies and remove when there is squashing. A probe point for squashing is added in Commit to probe squashed instructions.
2015-12-04cpu: fix unitialized variable which may cause assertion failurePau Cabre
The assert in lsq_unit_impl.hh line 963 needs pktPending to be initialized to NULL (I got the assertion failure several times without the fix). Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-11-22cpu: Fix base FP and CC register index in o3 insertThread()Nathanael Premillieu
Note that the method is not used, and could possibly be deleted.
2015-11-16o3: drop unused statistic wbPenalized and wbPenalizedRateNilay Vaish
2015-10-12misc: Add explicit overrides and fix other clang >= 3.5 issuesAndreas Hansson
This patch adds explicit overrides as this is now required when using "-Wall" with clang >= 3.5, the latter now part of the most recent XCode. The patch consequently removes "virtual" for those methods where "override" is added. The latter should be enough of an indication. As part of this patch, a few minor issues that clang >= 3.5 complains about are also resolved (unused methods and variables).
2015-10-12misc: Remove redundant compiler-specific definesAndreas Hansson
This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap (and similar) abstractions, as these are no longer needed with gcc 4.7 and clang 3.1 as minimum compiler versions.
2015-10-09isa: Add parameter to pick different decoder inside ISARekai Gonzalez Alberquilla
The decoder is responsible for splitting instructions in micro operations (uops). Given that different micro architectures may split operations differently, this patch allows to specify which micro architecture each isa implements, so different cores in the system can split instructions differently, also decoupling uop splitting (microArch) from ISA (Arch). This is done making the decodification calls templates that receive a type 'DecoderFlavour' that maps the name of the operation to the class that implements it. This way there is only one selection point (converting the command line enum to the appropriate DecodeFeatures object). In addition, there is no explicit code replication: template instantiation hides that, and the compiler should be able to resolve a number of things at compile-time.
2015-09-30cpu,isa,mem: Add per-thread wakeup logicMitch Hayenga
Changes wakeup functionality so that only specific threads on SMT capable cpus are woken.
2015-09-30isa,cpu: Add support for FS SMT InterruptsMitch Hayenga
Adds per-thread interrupt controllers and thread/context logic so that interrupts properly get routed in SMT systems.
2015-09-30cpu: Add per-thread monitorsMitch Hayenga
Adds per-thread address monitors to support FullSystem SMT.
2015-09-15cpu, o3: consider split requests for LSQ checksnoop operationsHongil Yoon
This patch enables instructions in LSQ to track two physical addresses for corresponding two split requests. Later, the information is used in checksnoop() to search for/invalidate the corresponding LD instructions. The current implementation has kept track of only the physical address that is referenced by the first split request. Thus, for checksnoop(), the line accessed by the second request has not been considered, causing potential correctness issues. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-08-07base: Declare a type for context IDsAndreas Sandberg
Context IDs used to be declared as ad hoc (usually as int). This changeset introduces a typedef for ContextIDs and a constant for invalid context IDs.
2015-07-20cpu: Fixed a bug on where to fetch the next instruction fromDavid Hashe
Figure out if the next instruction to fetch comes from the micro-op ROM or not. Otherwise, wrong instructions may be fetched.
2015-07-28revert 5af8f40d8f2cNilay Vaish
2015-07-26cpu: implements vector registersNilay Vaish
This adds a vector register type. The type is defined as a std::array of a fixed number of uint64_ts. The isa_parser.py has been modified to parse vector register operands and generate the required code. Different cpus have vector register files now.
2015-07-26cpu: o3: slight correction to identation in rename_impl.hhNilay Vaish
2015-07-07sim: Refactor and simplify the drain APIAndreas Sandberg
The drain() call currently passes around a DrainManager pointer, which is now completely pointless since there is only ever one global DrainManager in the system. It also contains vestiges from the time when SimObjects had to keep track of their child objects that needed draining. This changeset moves all of the DrainState handling to the Drainable base class and changes the drain() and drainResume() calls to reflect this. Particularly, the drain() call has been updated to take no parameters (the DrainManager argument isn't needed) and return a DrainState instead of an unsigned integer (there is no point returning anything other than 0 or 1 any more). Drainable objects should return either DrainState::Draining (equivalent to returning 1 in the old system) if they need more time to drain or DrainState::Drained (equivalent to returning 0 in the old system) if they are already in a consistent state. Returning DrainState::Running is considered an error. Drain done signalling is now done through the signalDrainDone() method in the Drainable class instead of using the DrainManager directly. The new call checks if the state of the object is DrainState::Draining before notifying the drain manager. This means that it is safe to call signalDrainDone() without first checking if the simulator has requested draining. The intention here is to reduce the code needed to implement draining in simple objects.