Age | Commit message | Author |
|
|
|
The Minor and O3 CPU models share the branch prediction
code. Minor relies on the BPredUnit::squash() function
to update the branch predictor tables on a branch
misprediction. This is fine because Minor executes in-order,
so the update is on the correct path. However, when using the
O3 model this causes the branch predictor to be updated on
out-of-order branch mispredictions, which should not
be the case.
This patch guards against speculative update of the branch
prediction tables. On a branch misprediction, BPredUnit::squash()
calls BPredUnit::update(..., squashed = true). The underlying
branch predictor tests the value of squashed: if it is
true, it restores any speculatively updated internal state
it might have (e.g., global/local branch history), then returns;
if it is false, it updates its prediction tables. Previously,
existing predictors did not test the "squashed" parameter.
To accommodate this change, the Minor model must now call
BPredUnit::squash() followed by BPredUnit::update(..., squashed = false)
on branch mispredictions; before, calling BPredUnit::squash()
alone performed the prediction table update.
The effect is a slight MPKI improvement when using the O3
model. A further patch should perform the same modifications
for the indirect target predictor and BTB (less critical).
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
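For illustration, a minimal sketch of the squashed-update guard this
commit describes, using a hypothetical two-bit predictor; the struct,
field names, and table size are invented, not gem5's BPredUnit
interface:

    #include <cstdint>

    using Addr = uint64_t;

    // Hypothetical predictor showing the "squashed" guard.
    struct TwoBitPredictorSketch
    {
        uint8_t table[4096] = {};   // 2-bit saturating counters
        uint64_t globalHistory = 0; // updated speculatively at predict

        void update(Addr pc, bool taken, uint64_t preHistory,
                    bool squashed)
        {
            if (squashed) {
                // Misprediction recovery: roll back speculative
                // history but leave the counter table alone, since
                // this update may be on the wrong (out-of-order) path.
                globalHistory = preHistory;
                return;
            }
            // Commit-time, correct-path update of the tables.
            uint8_t &ctr = table[pc % 4096];
            if (taken && ctr < 3) ++ctr;
            else if (!taken && ctr > 0) --ctr;
        }
    };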
|
|
Modify the opClass assigned to AArch64 FP instructions from SimdFloat* to
Float*. Also create the FloatMemRead and FloatMemWrite opClasses, which
distinguish memory accesses to the INT and FP register banks.
Change the latency of (Simd)FloatMultAcc to 5, based on the Cortex-A72,
where the "latency" of FMADD is 3 if the next instruction is an FMADD
with only the augend-to-destination dependency, and 7 cycles otherwise.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
|
|
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
Change-Id: Ic37311443ca11ee6d95bceffea599e054e7aa110
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
Change-Id: I183b9942929c873c3272ce6d1abd4ebc472c7132
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
The behavior of WFI is to cause Minor to cease evaluating
pipeline logic until an interrupt is observed. However,
a user may wish to drain the system while a core is sleeping
due to a WFI. This patch makes the CPU drain while in WFI: if a
drain is requested during a WFI, the CPU is already drained and will
immediately be ready for swapping, checkpointing, etc. This
should not negatively impact performance, as WFI instructions
are 'stream-changing' (treated like unpredicted branches), so
all remaining instructions are on the wrong path and will be
squashed quickly.
Change-Id: I63833d5acb53d8dde78f9f0c9611de0ece385e45
|
|
This patch adds SMT support to the MinorCPU. RoundRobin and
Random thread scheduling are currently supported.
Change-Id: I91faf39ff881af5918cca05051829fc6261f20e3
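A sketch of a round-robin pick of the kind described, with invented
names (not Minor's actual scheduling code):

    #include <cstddef>
    #include <vector>

    // Hypothetical round-robin picker: each cycle, start the search
    // one past the last thread that issued, so all ready threads
    // share the pipeline fairly.
    struct RoundRobinSketch
    {
        std::size_t last = 0;

        // Returns the chosen thread id, or ready.size() if none.
        std::size_t pick(const std::vector<bool> &ready)
        {
            const std::size_t n = ready.size();
            for (std::size_t i = 1; i <= n; ++i) {
                const std::size_t tid = (last + i) % n;
                if (ready[tid]) { last = tid; return tid; }
            }
            return n;
        }
    };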
|
|
Add functionality to the BaseCPU that will put the entire CPU
into a low-power idle state whenever all threads in it are idle.
Change-Id: I984d1656eb0a4863c87ceacd773d2d10de5cfd2b
|
|
MinorCPU fix for corrupt numCycles when resuming from a previous simulation.
---
src/cpu/minor/cpu.cc | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
|
|
In general, the ThreadID parameter is unnecessary in the memory system
as the ContextID is what is used for the purposes of locks/wakeups.
Since we allocate sequential ContextIDs for each thread on MT-enabled
CPUs, ThreadID is unnecessary as the CPUs can identify the requesting
thread through sideband info (SenderState / LSQ entries) or ContextID
offset from the base ContextID for a CPU.
This is a re-spin of 20264eb after the revert (bd1c6789) and includes
some fixes of that commit.
|
|
The following patches had unexpected interactions with the current
upstream code and have been reverted for now:
e07fd01651f3: power: Add support for power models
831c7f2f9e39: power: Low-power idle power state for idle CPUs
4f749e00b667: power: Add power states to ClockedObject
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
In general, the ThreadID parameter is unnecessary in the memory system
as the ContextID is what is used for the purposes of locks/wakeups.
Since we allocate sequential ContextIDs for each thread on MT-enabled
CPUs, ThreadID is unnecessary as the CPUs can identify the requesting
thread through sideband info (SenderState / LSQ entries) or ContextID
offset from the base ContextID for a CPU.
|
|
Add functionality to the BaseCPU that will put the entire CPU into a low-power
idle state whenever all threads in it are idle.
|
|
|
|
Writes to locked memory addresses (LLSC) did not wake up the locking
CPU, which can lead to deadlocks in multi-core runs. In AtomicSimpleCPU,
recvAtomicSnoop was checking whether the incoming packet was an
invalidation (isInvalidate) and only then handled a locked snoop. But
writes, rather than invalidates, are seen when running without caches
(fast-forward configurations). As a simple fix, handleLockedSnoop is
now also called when the incoming snoop packet is a write.
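A sketch of the condition the fix implies; the enum and function names
are invented for illustration, not gem5's actual snoop code:

    #include <cstdint>

    using Addr = uint64_t;

    enum class SnoopKind { Read, Write, Invalidate };

    // An LLSC reservation must be broken (and the locking CPU woken)
    // on an invalidation OR a plain write, because cacheless runs
    // deliver remote stores as writes.
    bool
    mustHandleLockedSnoop(SnoopKind kind, Addr snoopAddr, Addr reserved)
    {
        return (kind == SnoopKind::Invalidate ||
                kind == SnoopKind::Write) &&
               snoopAddr == reserved;
    }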
|
|
Since the last round of fixes, a few new issues have snuck in. We
should consider switching the regression runs to clang.
|
|
This patch changes how the cache determines whether snoops should be
forwarded from the memory side to the CPU side. Instead of having a
parameter, the cache now looks at the port connected on the CPU side,
and if it is a snooping port, snoops are forwarded. This is less error
prone and leaves fewer parameters to worry about.
The patch also tidies up the CPU classes to ensure that their I-side
port is not snooping, by removing overrides to the snoop request
handler, such that snoop requests will panic via the default
MasterPort implementation.
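A minimal sketch of the idea, with invented types standing in for
gem5's port classes:

    // Derive the behaviour from the connected port instead of
    // carrying a separate forward_snoops parameter.
    struct PeerPortSketch
    {
        virtual bool isSnooping() const = 0;
        virtual ~PeerPortSketch() = default;
    };

    bool
    shouldForwardSnoops(const PeerPortSketch &cpuSidePeer)
    {
        return cpuSidePeer.isSnooping();
    }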
|
|
Result of running 'hg m5style --skip-all --fix-control -a'.
|
|
For historical reasons, the ExecContext interface had a single
function, readMem(), that did two different things depending on
whether the ExecContext supported atomic memory mode (i.e.,
AtomicSimpleCPU) or timing memory mode (all the other models).
In the former case, it actually performed a memory read; in the
latter case, it merely initiated a read access, and the read
completion did not happen until later when a response packet
arrived from the memory system.
This led to some confusing things, including timing accesses
being required to provide a pointer for the return data even
though that pointer was only used in atomic mode.
This patch splits this interface, adding a new initiateMemRead()
function to the ExecContext interface to replace the timing-mode
use of readMem().
For consistency and clarity, the readMemTiming() helper function
in the ISA definitions is renamed to initiateMemRead() as well.
For x86, where the access size is passed in explicitly, we can
also get rid of the data parameter at this level. For other ISAs,
where the access size is determined from the type of the data
parameter, we have to keep the parameter for that purpose.
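A sketch of the split interface with simplified signatures; the Fault
stand-in and the exact parameter lists are assumptions, not gem5's
actual declarations:

    #include <cstdint>

    using Addr = uint64_t;
    using Fault = int;           // simplified stand-in
    constexpr Fault NoFault = 0;

    struct ExecContextSketch
    {
        // Atomic mode: performs the read and fills *data now.
        virtual Fault readMem(Addr addr, uint8_t *data,
                              unsigned size) = 0;

        // Timing mode: merely initiates the access; the data arrives
        // later with the response packet, so no data pointer needed.
        virtual Fault initiateMemRead(Addr addr, unsigned size) = 0;

        virtual ~ExecContextSketch() = default;
    };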
|
|
This patch adds explicit overrides as this is now required when using
"-Wall" with clang >= 3.5, the latter now part of the most recent
XCode. The patch consequently removes "virtual" for those methods
where "override" is added. The latter should be enough of an
indication.
As part of this patch, a few minor issues that clang >= 3.5 complains
about are also resolved (unused methods and variables).
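The pattern the patch applies, in miniature:

    // "override" replaces "virtual" on overriding methods, so the
    // compiler rejects signature mismatches instead of silently
    // introducing a new virtual function.
    struct Base
    {
        virtual void regStats() {}
        virtual ~Base() = default;
    };

    struct Derived : Base
    {
        // Before:  virtual void regStats();
        void regStats() override {}   // after: explicit and checked
    };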
|
|
This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap
(and similar) abstractions, as these are no longer needed with gcc 4.7
and clang 3.1 as minimum compiler versions.
|
|
Changes wakeup functionality so that only specific threads on
SMT-capable CPUs are woken.
|
|
Adds per-thread interrupt controllers and thread/context logic
so that interrupts properly get routed in SMT systems.
|
|
Adds per-thread address monitors to support FullSystem SMT.
|
|
This patch simplifies the packet, and removes the possibility of
creating a packet without a valid address and/or size. Under no
circumstances are these fields set at a later point, and thus they
really have to be provided at construction time.
The patch also fixes a case where the MinorCPU created a packet
without a valid address and size, only to delete it later.
|
|
Context IDs used to be declared ad hoc (usually as int). This
changeset introduces a typedef for ContextIDs and a constant for
invalid context IDs.
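The change likely amounts to something of this shape (hedged; the real
definitions live in gem5's base types header):

    typedef int ContextID;
    const ContextID InvalidContextID = -1;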
|
|
Fix a spurious %s and include the state of the Fetch1 stage in the
debug printout.
|
|
The Minor CPU currently doesn't drain properly when it is switched
out. This happens because Fetch1 expects to be in the FetchHalted
state when it is drained. However, because the CPU is switched out, it
is stuck in the FetchWaitingForPC state. Fix this by ignoring drain
requests and returning DrainState::Drained from MinorCPU::drain() if
the CPU is switched out. This is always safe since a switched out CPU,
by definition, doesn't have any instructions in flight.
|
|
Minor currently activates thread 0 in startup() to work around an
issue where activateContext() is called from LiveProcess before the
process entry point is known. When activateContext() is called, Minor
creates a branch instruction to the process's entry point. The first
time it is called, the branch points to an undefined location (0). The
call in startup() updates the branch to point to the actual entry
point.
When instantiating a switched out Minor CPU, it still tries to
activate thread 0. This is clearly incorrect since a switched out CPU
can't have any active threads. This changeset adds a check to ensure
that the thread is active before reactivating it.
|
|
The drain refactor patches introduced a couple of bugs in the way
Minor handles draining. This patch fixes an incorrect assert and a
case of infinite recursion when the CPU signals drain done.
|
|
|
|
This adds a vector register type. The type is defined as a std::array of a
fixed number of uint64_ts. isa_parser.py has been modified to parse vector
register operands and generate the required code. The different CPU models
now have vector register files.
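A sketch of the container described above; the element count here is
illustrative, not gem5's chosen width:

    #include <array>
    #include <cstddef>
    #include <cstdint>

    // Fixed-size array of 64-bit chunks backing a vector register.
    constexpr std::size_t NumVecRegElems = 4;  // illustrative width
    using VecRegSketch = std::array<uint64_t, NumVecRegElems>;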
|
|
The drain() call currently passes around a DrainManager pointer, which
is now completely pointless since there is only ever one global
DrainManager in the system. It also contains vestiges from the time
when SimObjects had to keep track of their child objects that needed
draining.
This changeset moves all of the DrainState handling to the Drainable
base class and changes the drain() and drainResume() calls to reflect
this. Particularly, the drain() call has been updated to take no
parameters (the DrainManager argument isn't needed) and return a
DrainState instead of an unsigned integer (there is no point returning
anything other than 0 or 1 any more). Drainable objects should return
either DrainState::Draining (equivalent to returning 1 in the old
system) if they need more time to drain or DrainState::Drained
(equivalent to returning 0 in the old system) if they are already in a
consistent state. Returning DrainState::Running is considered an
error.
Drain done signalling is now done through the signalDrainDone() method
in the Drainable class instead of using the DrainManager directly. The
new call checks if the state of the object is DrainState::Draining
before notifying the drain manager. This means that it is safe to call
signalDrainDone() without first checking if the simulator has
requested draining. The intention here is to reduce the code needed to
implement draining in simple objects.
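An illustrative, simplified shape of the refactored interface (not
gem5's exact Drainable class):

    enum class DrainState { Running, Draining, Drained };

    struct DrainableSketch
    {
        DrainState state = DrainState::Running;

        // Return Drained if already quiescent, Draining if more time
        // is needed; returning Running is considered an error.
        virtual DrainState drain() = 0;
        virtual void drainResume() { state = DrainState::Running; }

        // Safe to call unconditionally: it only signals completion
        // when the object is actually draining.
        void signalDrainDone()
        {
            if (state == DrainState::Draining) {
                state = DrainState::Drained;
                // ...notify the global DrainManager here...
            }
        }

        virtual ~DrainableSketch() = default;
    };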
|
|
The drain state enum is currently a part of the Drainable
interface. The same state machine will be used by the DrainManager to
identify the global state of the simulator. Make the drain state a
global typed enum to better cater for this usage scenario.
|
|
Objects that can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
the serialization interface to make it more generic and to hide the
underlying checkpoint storage. Specifically:
* Add a set of APIs to serialize into a subsection of the current
object. Previously, objects that needed this functionality would
use ad-hoc solutions using nameOut() and section name
generation. In the new world, an object that implements the
interface has the methods serializeSection() and
unserializeSection() that serialize into a named /subsection/ of
the current object. Calling serialize() serializes an object into
the current section.
* Move the name() method from Serializable to SimObject as it is no
longer needed for serialization. The fully qualified section name
is generated by the main serialization code on the fly as objects
serialize sub-objects.
* Add a ScopedCheckpointSection helper class. Some objects need to
serialize data structures that do not derive from Serializable
into subsections; previously, this was done using nameOut() and
manual section name generation. When a ScopedCheckpointSection is
instantiated, it adds a new /subsection/, and subsequent
serialization calls during its lifetime happen inside this
section (or a subsection in the case of nested sections).
* The serialize() call is now const which prevents accidental state
manipulation during serialization. Objects that rely on modifying
state can use the serializeOld() call instead. The default
implementation simply calls serialize(). Note: The old-style calls
need to be explicitly called using the
serializeOld()/serializeSectionOld() style APIs. These are used by
default when serializing SimObjects.
* Both the input and output checkpoints now use their own named
types. This hides underlying checkpoint implementation from
objects that need checkpointing and makes it easier to change the
underlying checkpoint storage code.
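A hypothetical serialize() implementation following the description
above; MyObject, counter, child, and bufferContents are made-up names,
and the exact gem5 signatures may differ:

    // Hypothetical example; names other than the serialization API
    // are invented for illustration.
    void
    MyObject::serialize(CheckpointOut &cp) const
    {
        // Scalars go into the current section.
        SERIALIZE_SCALAR(counter);

        // A Serializable child serializes into a named subsection.
        child.serializeSection(cp, "child");

        // Non-Serializable helper state gets its own scoped
        // subsection; calls during sec's lifetime land inside it.
        ScopedCheckpointSection sec(cp, "buffer");
        SERIALIZE_CONTAINER(bufferContents);
    }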
|
|
The MinorCPU would count bubbles in Execute::issue as part of
the num_insts_issued and so sometimes reach the instruction
issue limit incorrectly.
Fixed by checking for a bubble in one new place.
|
|
The Request::UNCACHEABLE flag currently has two different
functions. The first, and obvious, function is to prevent the memory
system from caching data in the request. The second function is to
prevent reordering and speculation in CPU models.
This changeset gives the order/speculation requirement a separate flag
(Request::STRICT_ORDER). This flag prevents CPU models from doing the
following optimizations:
* Speculation: CPU models are not allowed to issue speculative
loads.
* Write combining: CPU models and caches are not allowed to merge
writes to the same cache line.
Note: The memory system may still reorder accesses unless the
UNCACHEABLE flag is set. It is therefore expected that the
STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent
this behavior.
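A small sketch of the flag split; the bit positions and namespace are
invented for illustration:

    #include <cstdint>

    namespace sketch {
    constexpr uint64_t UNCACHEABLE  = 1ULL << 0; // bypass the caches
    constexpr uint64_t STRICT_ORDER = 1ULL << 1; // no speculation or
                                                 // write combining
    // Device memory typically wants both behaviours combined.
    constexpr uint64_t DEVICE_FLAGS = UNCACHEABLE | STRICT_ORDER;
    }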
|
|
This patch fixes a recent issue with gcc 4.9 (and possibly more) being
convinced that indices outside the array bounds are used when
initialising the FUPool members.
|
|
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
The totalInstructions counter is only incremented when the whole
instruction is committed, not on every microop. It was incorrectly
reset in the atomic and timing CPUs.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
Refactor the way that specific MemCmd values are generated for packets.
The new approach is a little more elegant in that we assign the right
value up front, and it's also more amenable to non-heap-allocated
Packet objects.
Also replaced the code in the Minor model that was still doing it the
ad-hoc way.
This is basically a refinement of http://repo.gem5.org/gem5/rev/711eb0e64249.
|
|
This patch fixes a long-standing issue with the port flow
control. Before this patch, the retry mechanism was shared between all
the different packet classes. As a result, a snoop response could get
stuck behind a request waiting for a retry, even if the send/recv
functions were split. This caused message-dependent deadlocks in
stress-test scenarios.
The patch splits the retry into one per packet (message) class. Thus,
sendTimingReq has a corresponding recvReqRetry, sendTimingResp has
recvRespRetry etc. Most of the changes to the code involve simply
clarifying what type of request a specific object was accepting.
The biggest change in functionality is in the cache's downstream packet
queue, facing the memory. This queue was shared by requests and snoop
responses; it is now split into two queues, each with its own
flow control, but sharing the same physical MasterPort. These changes
fix the previously seen deadlocks.
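An illustrative, simplified shape of the per-class flow control (not
the full gem5 port interface):

    struct TimingPortSketch
    {
        // Requests: returning false means "blocked, wait for
        // recvReqRetry before trying again".
        bool sendTimingReq() { return true; }
        virtual void recvReqRetry() = 0;

        // Responses have their own, independent retry channel, so a
        // blocked request no longer stalls a (snoop) response.
        bool sendTimingResp() { return true; }
        virtual void recvRespRetry() = 0;

        virtual ~TimingPortSketch() = default;
    };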
|
|
Finally took the plunge and made this apply to all ISAs, not just ARM.
|
|
Track memory size and flags as well as add some comments and consts.
|
|
|
|
Minor was reporting data cache accesses as ".inst" accesses.
This just switches the MasterPortID to dataMasterPortId.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
This patch fixes a case where a store in Minor's store buffer never
leaves the store buffer because it is prematurely counted as having
been issued, leaving the store buffer idle.
LSQ::StoreBuffer::numUnissuedAccesses should count the number of accesses
either in memory, or still in the store buffer after being completed.
For stores which are also barriers, the store stays in the store
buffer for a cycle after it is completed and is cleaned up by the
barrier-clearing code (to ensure that barriers are completed in order).
To achieve this, numUnissuedAccesses is now decremented not when a
store-barrier is issued to memory, but when its barrier effect is cleared.
Without this patch, the correct behaviour occurs when a memory transaction
is immediately accepted, but not if it needs a retry.
|
|
This patch fixes the checking of the number of memory instructions issued
per cycle in the Minor CPU.
|
|
This patch simplifies how we deal with dynamically allocated data in
the packet, always assuming that it is array allocated, and hence
should be array deallocated (delete[] as opposed to delete). The only
use of dataDynamic was in the Ruby testers.
The ARRAY_DATA flag in the packet is removed accordingly. No
defragmentation of the flags is done at this point, leaving a gap in
the bit masks.
As the last part of the patch, dataDynamicArray is renamed to dataDynamic.
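The invariant the patch establishes, in miniature (struct and member
names invented):

    #include <cstddef>
    #include <cstdint>

    // Packet data is always array-allocated, so it is always
    // array-deallocated.
    struct PacketDataSketch
    {
        uint8_t *data = nullptr;

        void allocate(std::size_t size) { data = new uint8_t[size]; }

        ~PacketDataSketch() { delete[] data; }  // never plain delete
    };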
|