|
This flag means that the instruction isn't an actual instruction; it's
just a placeholder to carry a fault down a pipeline, for instance.
Change-Id: I1cc12b068662dbd3d3b089c9941a07b6e88b57e3
Reviewed-on: https://gem5-review.googlesource.com/7123
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Gabe Black <gabeblack@google.com>
|
|
Get rid of some remnants of a system which was intended to separate
address computation into its own instruction object.
Change-Id: I23f9ffd70fcb89a8ea5bbb934507fb00da9a0b7f
Reviewed-on: https://gem5-review.googlesource.com/7122
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Gabe Black <gabeblack@google.com>
|
|
Reiley's update :) of the ISA parser definitions, my addition of the vector
element operand concept for the ISA parser, and Nathanael's modification
creating a hierarchy between vector registers and their constituent elements
in the ISA parser.
Some fixes/updates on top to treat instructions as vector rather than floating
point when they use the VectorRF. Some counters added to all the models to
keep faithful counts.
Change-Id: Id8f162a525240dfd7ba884c5a4d9fa69f4050101
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/2706
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
This patch adds some more functionality to the cpu model and the arch to
interface with the vector register file.
This change consists mainly of augmenting ThreadContexts and ExecContexts
with calls to get/set full vectors, underlying microarchitectural elements
or lanes. Those are meant to interface with the vector register file. All
classes that implement this interface also get an appropriate implementation.
This requires implementing the vector register file for the different
models using the VecRegContainer class.
This change set also updates the Result abstraction to contemplate the
possibility of having a vector as result.
The changes also affect how the remote_gdb connection works.
There are some (nasty) side effects, such as the need to define dummy
numPhysVecRegs parameter values for architectures that do not implement
vector extensions.
Nathanael Premillieu's work with an increasing number of fixes and
improvements of mine.
Change-Id: Iee65f4e8b03abfe1e94e6940a51b68d0977fd5bb
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
[ Fix RISCV build issues and CC reg free list initialisation ]
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/2705
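To make the shape of this interface concrete, here is a minimal sketch of the kind of accessors being added; the names and the container width are illustrative only, not the actual gem5 declarations.

    #include <array>
    #include <cstdint>

    // Stand-in for the VecRegContainer mentioned above; the width is arbitrary.
    using VecRegSketch = std::array<uint64_t, 4>;

    // Hypothetical context interface offering whole-register and per-lane access.
    struct VecContextSketch
    {
        virtual const VecRegSketch &readVecReg(int idx) const = 0;
        virtual void setVecReg(int idx, const VecRegSketch &val) = 0;

        // Lane access: one 64-bit element of a vector register.
        virtual uint64_t readVecLane(int idx, int lane) const = 0;
        virtual void setVecLane(int idx, int lane, uint64_t val) = 0;

        virtual ~VecContextSketch() = default;
    };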
|
|
The Result union used to collect the result of an instruction is now a
class of its own, with its own constructor and explicit casting methods for
cleanliness.
This is also a stepping stone to have vector registers, and instructions
that produce a vector register as output.
Change-Id: I6f40c11cb5e835d8b11f7804a4e967aff18025b9
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/2703
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
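As a rough sketch of the idea (illustrative only, not the actual gem5 class), the result becomes a small tagged class with explicit accessors instead of a bare union that callers poke directly:

    #include <cstdint>

    // Hypothetical typed instruction result.
    class ResultSketch
    {
        union { uint64_t intVal; double fpVal; };
        enum class Kind { Integer, Floating } kind;

      public:
        explicit ResultSketch(uint64_t v) : intVal(v), kind(Kind::Integer) {}
        explicit ResultSketch(double v)   : fpVal(v),  kind(Kind::Floating) {}

        // Explicit casts keep call sites honest about what they expect.
        uint64_t asInteger()  const { return intVal; }
        double   asFloating() const { return fpVal; }
        bool     isInteger()  const { return kind == Kind::Integer; }
    };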
|
|
With the hierarchical RegId there are a lot of functions that are now
redundant.
The idea behind the simplification is that instead of the caller passing a
regId and also choosing which kind of register to read/write/rename/lookup/etc.,
with the function panic_if'ing if the regId is not of the appropriate type, we
provide an interface that decides which kind of register to access depending on
the register class of the given regId.
Change-Id: I7d52e9e21fc01205ae365d86921a4ceb67a57178
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
[ Fix RISCV build issues ]
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/2702
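In (hypothetical) code, the simplification looks roughly like this: one accessor that dispatches on the register class carried by the RegId, instead of one accessor per class that panics on a mismatch.

    #include <cstdint>

    enum class RegClassSketch { Integer, Float, CC };          // illustrative
    struct RegIdSketch { RegClassSketch cls; int index; };     // illustrative

    struct RegAccessSketch
    {
        // Per-class accessors (stubbed out here).
        uint64_t readIntReg(int)    { return 0; }
        uint64_t readFloatBits(int) { return 0; }
        uint64_t readCCReg(int)     { return 0; }

        // Single entry point: the RegId's class decides which file is read.
        uint64_t readReg(const RegIdSketch &reg)
        {
            switch (reg.cls) {
              case RegClassSketch::Integer: return readIntReg(reg.index);
              case RegClassSketch::Float:   return readFloatBits(reg.index);
              case RegClassSketch::CC:      return readCCReg(reg.index);
            }
            return 0;   // unreachable
        }
    };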
|
|
Mimic the changes done on the architectural register indexes on the
physical register indexes. This is specific to the O3 model. The
structure, called PhysRegId, contains a register class, a register
index and a flat register index. The flat register index is kept
because it is useful in some cases where the type of register is not
important (dependency graph and scoreboard for example). Instead
of directly using the structure, most of the code is working with
a const PhysRegId* (typedef to PhysRegIdPtr). The actual PhysRegId
objects are stored in the regFile.
Change-Id: Ic879a3cc608aa2f34e2168280faac1846de77667
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/2701
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
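Sketched from the description above (member names are illustrative rather than the exact ones in the tree):

    enum class RegClassSketch { Integer, Float, CC };   // illustrative

    // Physical register identifier: class, per-class index, and a flat index
    // for consumers that don't care about the class (dependency graph,
    // scoreboard).
    struct PhysRegIdSketch
    {
        RegClassSketch regClass;
        unsigned regIdx;
        unsigned flatIdx;
    };

    // Most code passes these around by pointer; the objects themselves are
    // owned by the register file.
    using PhysRegIdPtrSketch = const PhysRegIdSketch *;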
|
|
Replace the unified register mapping with a structure associating
a class and an index. It is now much easier to know which class of
register the index is referring to. Also, when adding a new class
there is no need to modify existing ones.
Change-Id: I55b3ac80763702aa2cd3ed2cbff0a75ef7620373
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
[ Fix RISCV build issues ]
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/2700
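Conceptually (a sketch, not the literal gem5 definition), an architectural register identifier becomes a (class, index) pair:

    // Illustrative (class, index) register identifier.
    enum class RegClassSketch { Integer, Float, Misc };   // new classes slot in here

    struct RegIdSketch
    {
        RegClassSketch regClass;
        unsigned regIdx;

        bool operator==(const RegIdSketch &other) const
        {
            return regClass == other.regClass && regIdx == other.regIdx;
        }
    };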
|
|
Change-Id: I183b9942929c873c3272ce6d1abd4ebc472c7132
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
In general, the ThreadID parameter is unnecessary in the memory system
as the ContextID is what is used for the purposes of locks/wakeups.
Since we allocate sequential ContextIDs for each thread on MT-enabled
CPUs, ThreadID is unnecessary as the CPUs can identify the requesting
thread through sideband info (SenderState / LSQ entries) or ContextID
offset from the base ContextID for a cpu.
This is a re-spin of 20264eb after the revert (bd1c6789) and includes
some fixes of that commit.
|
|
The following patches had unexpected interactions with the current
upstream code and have been reverted for now:
e07fd01651f3: power: Add support for power models
831c7f2f9e39: power: Low-power idle power state for idle CPUs
4f749e00b667: power: Add power states to ClockedObject
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
--HG--
extra : amend_source : 0b6fb073c6bbc24be533ec431eb51fbf1b269508
|
|
In general, the ThreadID parameter is unnecessary in the memory system
as the ContextID is what is used for the purposes of locks/wakeups.
Since we allocate sequential ContextIDs for each thread on MT-enabled
CPUs, ThreadID is unnecessary as the CPUs can identify the requesting
thread through sideband info (SenderState / LSQ entries) or ContextID
offset from the base ContextID for a cpu.
|
|
For historical reasons, the ExecContext interface had a single
function, readMem(), that did two different things depending on
whether the ExecContext supported atomic memory mode (i.e.,
AtomicSimpleCPU) or timing memory mode (all the other models).
In the former case, it actually performed a memory read; in the
latter case, it merely initiated a read access, and the read
completion did not happen until later when a response packet
arrived from the memory system.
This led to some confusing things, including timing accesses
being required to provide a pointer for the return data even
though that pointer was only used in atomic mode.
This patch splits this interface, adding a new initiateMemRead()
function to the ExecContext interface to replace the timing-mode
use of readMem().
For consistency and clarity, the readMemTiming() helper function
in the ISA definitions is renamed to initiateMemRead() as well.
For x86, where the access size is passed in explicitly, we can
also get rid of the data parameter at this level. For other ISAs,
where the access size is determined from the type of the data
parameter, we have to keep the parameter for that purpose.
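A hedged sketch of the resulting split (signatures simplified; these are not the exact gem5 prototypes):

    #include <cstdint>

    using AddrSketch  = uint64_t;
    using FaultSketch = int;   // stand-in for gem5's Fault type

    struct ExecContextSketch
    {
        // Atomic mode: the read completes here and fills *data.
        virtual FaultSketch readMem(AddrSketch addr, uint8_t *data,
                                    unsigned size, unsigned flags) = 0;

        // Timing mode: only initiates the access; the data arrives later with
        // the response packet, so no data pointer is needed.
        virtual FaultSketch initiateMemRead(AddrSketch addr, unsigned size,
                                            unsigned flags) = 0;

        virtual ~ExecContextSketch() = default;
    };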
|
|
The read() function merely initiates a memory read operation; the
data doesn't arrive until the access completes and a response packet
is received from the memory system. Thus there's no need to provide
a data pointer; its existence is historical.
Getting this pointer out of this internal o3 interface sets the
stage for similar cleanup in the ExecContext interface. Also
found that we were pointlessly setting the contents at this pointer
on a store forward (the useful memcpy happens just a few lines
below the deleted one).
|
|
Make best use of the compiler, and enable -Wextra as well as
-Wall. There are a few issues that had to be resolved, but they are
all trivial.
|
|
Adds per-thread address monitors to support FullSystem SMT.
|
|
This patch enables instructions in the LSQ to track two physical addresses for
their two corresponding split requests. Later, the information is used in
checksnoop() to search for/invalidate the corresponding LD instructions.
The previous implementation kept track of only the physical address referenced
by the first split request. Thus, for checksnoop(), the line accessed by the
second request was not considered, causing potential correctness issues.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
Context IDs used to be declared as ad hoc (usually as int). This
changeset introduces a typedef for ContextIDs and a constant for
invalid context IDs.
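In sketch form (the names follow the wording of this commit; treat the snippet as illustrative rather than a copy of the header):

    // A dedicated type plus an explicit invalid sentinel, instead of ad hoc ints.
    typedef int ContextID;
    const ContextID InvalidContextID = -1;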
|
|
|
|
This adds a vector register type. The type is defined as a std::array of a
fixed number of uint64_ts. The isa_parser.py has been modified to parse vector
register operands and generate the required code. Different cpus have vector
register files now.
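In other words, something along these lines (the element count is an arbitrary example; the real width is fixed by the build/ISA):

    #include <array>
    #include <cstdint>

    constexpr unsigned NumVecElemsSketch = 8;                 // example width only
    using VecRegSketch = std::array<uint64_t, NumVecElemsSketch>;

    uint64_t thirdElement(const VecRegSketch &v)
    {
        return v[3];   // element access is plain array indexing
    }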
|
|
Three minor issues are resolved:
1. Apparently gcc 5.1 does not like negation of booleans followed by
bitwise AND.
2. Somehow the compiler also gets confused and warns about
NoopMachInst being unused (removing it causes compilation errors
though). Most likely a compiler bug.
3. There seems to be a number of instances where loop unrolling causes
false positives for the array-bounds check. For now, switch to
std::array. Potentially we could disable the warning for newer gcc
versions, but switching to std::array is probably a good move in
any case.
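For the first point, the pattern gcc 5.1 complains about looks roughly like this (a made-up example, not the actual offending gem5 code):

    #include <cstdint>

    bool isWriteback(uint32_t flags)
    {
        const uint32_t WritebackMask = 0x4;   // made-up flag for the example
        // gcc warns on:  return !flags & WritebackMask;
        // because '!' binds to 'flags' alone, i.e. (!flags) & mask, which is
        // rarely what was intended.
        return !(flags & WritebackMask);      // parenthesised, warning-free form
    }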
|
|
The Request::UNCACHEABLE flag currently has two different
functions. The first, and obvious, function is to prevent the memory
system from caching data in the request. The second function is to
prevent reordering and speculation in CPU models.
This changeset gives the order/speculation requirement a separate flag
(Request::STRICT_ORDER). This flag prevents CPU models from doing the
following optimizations:
* Speculation: CPU models are not allowed to issue speculative
loads.
* Write combining: CPU models and caches are not allowed to merge
writes to the same cache line.
Note: The memory system may still reorder accesses unless the
UNCACHEABLE flag is set. It is therefore expected that the
STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent
this behavior.
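A sketch of how the two properties end up as independent flag bits; the bit values here are made up, only the flag names come from the description above:

    #include <cstdint>

    namespace RequestSketch {
        const uint64_t UNCACHEABLE  = 0x1;   // bypass the caches
        const uint64_t STRICT_ORDER = 0x2;   // no speculation or write combining
    }

    // As the note above says, device-style accesses are expected to set both.
    const uint64_t strictUncacheableFlags =
        RequestSketch::UNCACHEABLE | RequestSketch::STRICT_ORDER;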
|
|
Now, prior to the renaming, the instruction requests the exact number of
registers it will need, and the rename_map decides whether the instruction is
allowed to proceed or not.
|
|
The TLB-related code is generally architecture dependent and should
live in the arch directory to signify that.
--HG--
rename : src/sim/BaseTLB.py => src/arch/generic/BaseTLB.py
rename : src/sim/tlb.cc => src/arch/generic/tlb.cc
rename : src/sim/tlb.hh => src/arch/generic/tlb.hh
|
|
Track memory size and flags as well as add some comments and consts.
|
|
Mwait works as follows:
1. A cpu monitors an address of interest (monitor instruction)
2. A cpu calls mwait - this loads the cache line into that cpu's cache.
3. The cpu goes to sleep.
4. When another processor requests write permission for the line, it is
evicted from the sleeping cpu's cache. This eviction is forwarded to the
sleeping cpu, which then wakes up.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
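The per-CPU monitor state implied by steps 1-4 can be sketched like this (a hypothetical structure; the real logic is spread across the CPU and cache models):

    #include <cstdint>

    using AddrSketch = uint64_t;

    struct AddressMonitorSketch
    {
        bool armed = false;
        AddrSketch addr = 0;

        void monitor(AddrSketch a) { armed = true; addr = a; }   // step 1

        // Step 4: a snoop that takes the monitored line away should wake the CPU.
        bool shouldWake(AddrSketch snoopAddr, AddrSketch lineMask) const
        {
            return armed && ((addr & lineMask) == (snoopAddr & lineMask));
        }
    };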
|
|
This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the C++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
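Roughly, the substitution amounts to the following (class names are illustrative stand-ins for gem5's fault hierarchy):

    #include <memory>

    struct FaultBaseSketch { virtual ~FaultBaseSketch() = default; };
    struct PageFaultSketch : FaultBaseSketch {};

    using FaultSketch = std::shared_ptr<FaultBaseSketch>;

    // Before (ad hoc refcounting):  Fault f = new PageFault(...);
    // After (C++11 shared ownership):
    FaultSketch makePageFault()
    {
        return std::make_shared<PageFaultSketch>();
    }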
|
|
This patch optimises the passing of StaticInstPtr by avoiding copying
the reference-counting pointer. This avoids first incrementing and
then decrementing the reference-counting pointer.
|
|
We currently generate and compile one version of the ISA code per CPU
model. This is obviously wasting a lot of resources at compile
time. This changeset factors out the interface into a separate
ExecContext class, which also serves as documentation for the
interface between CPUs and the ISA code. While doing so, this
changeset also fixes up interface inconsistencies between the
different CPU models.
The main argument for using one set of ISA code per CPU model has
always been performance, as this avoids indirect branches in the
generated code. However, this argument does not hold water. Booting
Linux on a simulated ARM system running in atomic mode
(opt/10.linux-boot/realview-simple-atomic) is actually 2% faster
(compiled using clang 3.4) after applying this patch. Additionally,
compilation time is decreased by 35%.
|
|
Allow the specification of a socket ID for every core that is reflected in the
MPIDR field in ARM systems. This allows studying multi-socket / cluster
systems with ARM CPUs.
|
|
This patch merely tidies up the CPU and ThreadContext getters by
making them const where appropriate.
|
|
This patch adds support for generating wake-up events in the CPU when an
address that is currently in the exclusive state is hit by a snoop. This
mechanism is required for ARMv8 multi-processor support.
|
|
This patch enables tracking of cache occupancy per thread along with
ages (in buckets) per cache block. Cache occupancy stats are
recalculated on each stat dump.
|
|
This change fixes an issue in the O3 CPU where execution of an uncacheable
instruction is attempted before it reaches the head of the ROB. The instruction
is determined to be uncacheable and is replayed, but a PanicFault is attached
to it to make sure that it is properly executed before committing. If the TLB
entry it was using is replaced in the intervening time, the TLB returns a
delayed translation when the load is replayed at the head of the ROB; however,
the LSQ code can't differentiate between the old fault and the new one. If the
translation isn't complete it can't be faulting, so clear the fault.
|
|
Add a third register class for condition codes,
in parallel with the integer and FP classes.
No ISAs use the CC class at this point though.
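Schematically (a sketch, not the literal enum in the source tree), the register class enumeration just grows a third member:

    // Illustrative register class enumeration with the new CC class.
    enum RegClassSketch {
        IntRegClassSketch,     // integer registers
        FloatRegClassSketch,   // floating-point registers
        CCRegClassSketch       // condition-code registers (unused by any ISA yet)
    };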
|
|
This patch removes the notion of a peer block size and instead sets
the cache line size on the system level.
Previously the size was set per cache, and communicated through the
interconnect. There were plenty of checks to ensure that everyone had the
same size specified, and these checks are now removed. Another benefit
that is not yet harnessed is that the cache line size is now known at
construction time, rather than after the port binding. Hence, the
block size can be locally stored and does not have to be queried every
time it is used.
A follow-on patch updates the configuration scripts accordingly.
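The practical upshot, sketched with hypothetical names: components read the line size from the system once, at construction, rather than querying peers after port binding.

    // The system owns the line size ...
    struct SystemSketch
    {
        explicit SystemSketch(unsigned lineSize) : cacheLineSize(lineSize) {}
        const unsigned cacheLineSize;
    };

    // ... and each cache copies it into a local member at construction time,
    // so it never has to be queried again.
    struct CacheSketch
    {
        explicit CacheSketch(const SystemSketch &sys) : blkSize(sys.cacheLineSize) {}
        const unsigned blkSize;
    };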
|
|
DynInst is extremely large; the hope is that this re-organization will put the
most used members close to each other.
|
|
While FastAlloc provides a small performance increase (~1.5%) over regular malloc, it isn't thread safe.
After removing FastAlloc and using tcmalloc, I've seen a performance increase of 12% over libc malloc
when running twolf for ARM.
|
|
This patch cleans up a number of minor issues aiming to get closer to
compliance with the C++0x standard as interpreted by gcc and clang
(compile with std=c++0x and -pedantic-errors). In particular, the
patch cleans up enums where the last item was succeeded by a comma,
namespaces closed by a curly brace followed by a semi-colon, and the
use of the GNU-extension typeof (replaced by templated functions). It
does not address variable-length arrays, zero-size arrays, anonymous
structs, range expressions in switch statements, and the use of long
long. The generated CPU code also has a large number of issues that
remain to be fixed, mainly related to overflows in implicit constant
conversion (due to shifts).
|
|
Enables the CheckerCPU to be selected at runtime with the --checker option
from the configs/example/fs.py and configs/example/se.py configuration
files. Also merges with the SE/FS changes.
|
|
This patch continues the unification of how the different CPU models
create and share their instruction and data ports. Most importantly,
it forces every CPU to have an instruction and a data port, and gives
these ports explicit getters in the BaseCPU (getDataPort and
getInstPort). The patch helps in simplifying the code, making
assumptions more explicit, and further easing future patches related to
the CPU ports.
The biggest changes are in the in-order model (that was not modified
in the previous unification patch), which now moves the ports from the
CacheUnit to the CPU. It also distinguishes the instruction fetch and
load-store unit from the rest of the resources, and avoids the use of
indices and casting in favour of keeping track of these two units
explicitly (since they are always there anyways). The atomic, timing
and O3 model simply return references to their already existing ports.
|
|
This change adds a master id to each request object which can be
used to identify every device in the system that is capable of issuing a request.
This is part of the way to removing the numCpus+1 stats in the cache and
replacing them with the master ids. This is one of a series of changes
that make way for the stats output to be changed to python.
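In sketch form (the names are illustrative; the commit only states that each request carries a master id):

    #include <cstdint>
    #include <string>
    #include <vector>

    using MasterIDSketch = uint16_t;

    // Each requestor registers once and stamps its id on every request it
    // issues, so per-master stats no longer need the numCpus+1 convention.
    struct MasterRegistrySketch
    {
        std::vector<std::string> names;

        MasterIDSketch getMasterId(const std::string &name)
        {
            names.push_back(name);
            return static_cast<MasterIDSketch>(names.size() - 1);
        }
    };

    struct RequestWithMasterSketch
    {
        uint64_t addr;
        MasterIDSketch masterId;   // identifies the issuing device
    };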
|
|
Because arch/XXX/faults.hh no longer contains architecture-independent but
specialized functions, code that isn't using the faults from a particular ISA
no longer needs to be able to include them through the switching header file
arch/faults.hh. By removing that header file (arch/faults.hh), the potential
interface between ISA code and non-ISA code is narrowed.
|
|
|
|
Brings the CheckerCPU back to life to allow FS and SE checking of the
O3CPU. These changes have only been tested with the ARM ISA. Other
ISAs potentially require modification.
|
|
|
|
Only create a memory ordering violation when the value could have changed
between two subsequent loads, instead of just when loads go out-of-order
to the same address. While not very common in the case of Alpha, with
an architecture with a hardware table walker this can happen reasonably
frequently because a translation will miss and start a table walk, and
before the CPU re-schedules the faulting instruction another one will
pass it to the same address (or cache block, depending on the dependency
checking).
This patch has been tested with a couple of self-checking hand crafted
programs to stress ordering between two cores.
The performance improvement on SPEC benchmarks can be substantial (2-10%).
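The check being described can be boiled down to something like the following (a hypothetical helper; the real logic lives in the O3 LSQ):

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    // Only flag a violation if the value the younger (out-of-order) load saw
    // differs from what memory holds now; matching values are harmless.
    bool orderingViolation(const uint8_t *valueSeenByYoungerLoad,
                           const uint8_t *currentMemoryValue,
                           std::size_t size)
    {
        return std::memcmp(valueSeenByYoungerLoad, currentMemoryValue, size) != 0;
    }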
|
|
Having two StaticInst classes, one nominally ISA dependent and the other ISA
independent, has not been historically useful and makes the StaticInst class
more complicated than it needs to be. This change merges StaticInstBase into
StaticInst.
|
|
|
|
This allows regular pointers and reference counted pointers without having to
use any shim structures or other tricks.
|