Age | Commit message (Collapse) | Author |
|
The decoder is responsible for splitting instructions in micro
operations (uops). Given that different micro architectures may split
operations differently, this patch allows to specify which micro
architecture each isa implements, so different cores in the system can
split instructions differently, also decoupling uop splitting
(microArch) from ISA (Arch). This is done making the decodification
calls templates that receive a type 'DecoderFlavour' that maps the
name of the operation to the class that implements it. This way there
is only one selection point (converting the command line enum to the
appropriate DecodeFeatures object). In addition, there is no explicit
code replication: template instantiation hides that, and the compiler
should be able to resolve a number of things at compile-time.
|
|
Although some decent error messages were getting generated inside
isa_parser.py, they weren't always getting printed because of the
screwy way we were handling exceptions. (Basically an inner
exception would get hidden by an outer exception, and the more
informative inner error message would not get printed.)
Also line numbers were messed up, since they were taken from the
lexer, which is typically a token (or more) ahead of the grammar
rule that's being matched. Using the 'lineno' attribute that
PLY associates with the grammar production is more accurate.
The new LineTracker class extends lineno to track filenames as
well as line numbers.
|
|
These are packed single-precision approximate reciprocal operations,
vector and scalar versions, respectively.
This code was basically developed by copying the code for
sqrtps and sqrtss. The mrcp micro-op was simplified relative to
msqrt since there are no double-precision versions of this operation.
|
|
fild loads an integer value into the x87 top of stack register.
fucomi/fucomip compare two x87 register values (the latter
also doing a stack pop).
These instructions are used by some versions of GNU libstdc++.
|
|
In ARM, certain variables are only updated when a necessary change is
detected. Having 2 SMT threads share a TLB resulted in these not being
updated as required. This patch adds a thread context identifer to
assist in the invalidation of these variables.
|
|
Changes wakeup functionality so that only specific threads on SMT
capable cpus are woken.
|
|
Adds per-thread interrupt controllers and thread/context logic
so that interrupts properly get routed in SMT systems.
|
|
Changes assignment of the MPIDR for multi-threaded systems only.
|
|
This register is writable according to UA2005
Tried to boot NetBSD which starts the kernel by writing to the tick_cmpr
register. Without the patch gem5 crashes with a panic. With the patch NetBSD
starts to boot normally (although sun4v support in NetBSD is not complete yet)
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
Cleaning up dead code. The CLREX stores zero directly to
MISCREG_LOCKFLAG and so the request flag is no longer needed. The
corresponding functionality in the cache tags is also removed.
|
|
A more natural home for this constant.
|
|
Added explicit data sizes and an opcode type for correct execution.
|
|
This patch implements the correct behavior.
|
|
|
|
This adds a vector register type. The type is defined as a std::array of a
fixed number of uint64_ts. The isa_parser.py has been modified to parse vector
register operands and generate the required code. Different cpus have vector
register files now.
|
|
This patch updates the x86 decoder so that it can decode instructions with vex
prefix. It also updates the isa with opcodes from vex opcode maps 1, 2 and 3.
Note that none of the instructions have been implemented yet. The
implementations would be provided in due course of time.
|
|
The drain() call currently passes around a DrainManager pointer, which
is now completely pointless since there is only ever one global
DrainManager in the system. It also contains vestiges from the time
when SimObjects had to keep track of their child objects that needed
draining.
This changeset moves all of the DrainState handling to the Drainable
base class and changes the drain() and drainResume() calls to reflect
this. Particularly, the drain() call has been updated to take no
parameters (the DrainManager argument isn't needed) and return a
DrainState instead of an unsigned integer (there is no point returning
anything other than 0 or 1 any more). Drainable objects should return
either DrainState::Draining (equivalent to returning 1 in the old
system) if they need more time to drain or DrainState::Drained
(equivalent to returning 0 in the old system) if they are already in a
consistent state. Returning DrainState::Running is considered an
error.
Drain done signalling is now done through the signalDrainDone() method
in the Drainable class instead of using the DrainManager directly. The
new call checks if the state of the object is DrainState::Draining
before notifying the drain manager. This means that it is safe to call
signalDrainDone() without first checking if the simulator has
requested draining. The intention here is to reduce the code needed to
implement draining in simple objects.
|
|
Draining is currently done by traversing the SimObject graph and
calling drain()/drainResume() on the SimObjects. This is not ideal
when non-SimObjects (e.g., ports) need draining since this means that
SimObjects owning those objects need to be aware of this.
This changeset moves the responsibility for finding objects that need
draining from SimObjects and the Python-side of the simulator to the
DrainManager. The DrainManager now maintains a set of all objects that
need draining. To reduce the overhead in classes owning non-SimObjects
that need draining, objects inheriting from Drainable now
automatically register with the DrainManager. If such an object is
destroyed, it is automatically unregistered. This means that drain()
and drainResume() should never be called directly on a Drainable
object.
While implementing the new functionality, the DrainManager has now
been made thread safe. In practice, this means that it takes a lock
whenever it manipulates the set of Drainable objects since SimObjects
in different threads may create Drainable objects
dynamically. Similarly, the drain counter is now an atomic_uint, which
ensures that it is manipulated correctly when objects signal that they
are done draining.
A nice side effect of these changes is that it makes the drain state
changes stricter, which the simulation scripts can exploit to avoid
redundant drains.
|
|
The drain state enum is currently a part of the Drainable
interface. The same state machine will be used by the DrainManager to
identify the global state of the simulator. Make the drain state a
global typed enum to better cater for this usage scenario.
|
|
Objects that are can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
to the serialization interface to make it more generic and hide the
underlying checkpoint storage. Specifically:
* Add a set of APIs to serialize into a subsection of the current
object. Previously, objects that needed this functionality would
use ad-hoc solutions using nameOut() and section name
generation. In the new world, an object that implements the
interface has the methods serializeSection() and
unserializeSection() that serialize into a named /subsection/ of
the current object. Calling serialize() serializes an object into
the current section.
* Move the name() method from Serializable to SimObject as it is no
longer needed for serialization. The fully qualified section name
is generated by the main serialization code on the fly as objects
serialize sub-objects.
* Add a scoped ScopedCheckpointSection helper class. Some objects
need to serialize data structures, that are not deriving from
Serializable, into subsections. Previously, this was done using
nameOut() and manual section name generation. To simplify this,
this changeset introduces a ScopedCheckpointSection() helper
class. When this class is instantiated, it adds a new /subsection/
and subsequent serialization calls during the lifetime of this
helper class happen inside this section (or a subsection in case
of nested sections).
* The serialize() call is now const which prevents accidental state
manipulation during serialization. Objects that rely on modifying
state can use the serializeOld() call instead. The default
implementation simply calls serialize(). Note: The old-style calls
need to be explicitly called using the
serializeOld()/serializeSectionOld() style APIs. These are used by
default when serializing SimObjects.
* Both the input and output checkpoints now use their own named
types. This hides underlying checkpoint implementation from
objects that need checkpointing and makes it easier to change the
underlying checkpoint storage code.
|
|
All x87 misc registers are implemented in an array of 64 bit values
but in real hardware the size of some of these registers is smaller.
Previsouly all 64 bits where incorrectly set and then later read. To
ensure correctness we mask the value in setMiscRegNoEffect to write
only the valid bits.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
Break the dependency on dma_device.hh by forward-declaring DmaPort in
the relevant header.
|
|
There seems to have been a debug print left in when the original ARMv8
support was merged in. This printout is performed every time you
initialize a hardware thread, and it prints raw pointers, so it always
causes diffs in the regression. This patch removes the debug print.
|
|
ldrsh was typoed as hdrsh, which is a bit annoying when printing
instructions. This patch fixes it.
|
|
put O_DIRECT under ifdefs -- this fixes build for MacOSX.
Also use correct class for arm64 openFlagTable.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
This changeset adds support for aarch64 in kvm. The CPU module
supports both checkpointing and online CPU model switching as long as
no devices are simulated by the host kernel. It currently has the
following limitations:
* The system register based generic timer can only be simulated by
the host kernel. Workaround: Use a memory mapped timer instead to
simulate the timer in gem5.
* Simulating devices (e.g., the generic timer) in the host kernel
requires that the host kernel also simulates the GIC.
* ID registers in the host and in gem5 must match for switching
between simulated CPUs and KVM. This is particularly important
for ID registers describing memory system capabilities (e.g.,
ASID size, physical address size).
* Switching between a virtualized CPU and a simulated CPU is
currently not supported if in-kernel device emulation is
used. This could be worked around by adding support for switching
to the gem5 (e.g., the KvmGic) side of the device models. A
simpler workaround is to avoid in-kernel device models
altogether.
|
|
This changeset adds a GIC implementation that uses the kernel's
built-in support for simulating the interrupt controller. Since there
is currently no support for state transfer between gem5 and the
kernel, the device model does not support serialization and CPU
switching (which would require switching to a gem5-simulated GIC).
|
|
This changeset moves the ARM-specific KVM CPU implementation to
arch/arm/kvm/. This change is expected to keep the source tree
somewhat cleaner as we start adding support for ARMv8 and KVM
in-kernel interrupt controller simulation.
--HG--
rename : src/cpu/kvm/ArmKvmCPU.py => src/arch/arm/kvm/ArmKvmCPU.py
rename : src/cpu/kvm/arm_cpu.cc => src/arch/arm/kvm/arm_cpu.cc
rename : src/cpu/kvm/arm_cpu.hh => src/arch/arm/kvm/arm_cpu.hh
|
|
|
|
This patch adds better caching of the sys regs for AArch64, thus
avoiding unnecessary calls to tc->readMiscReg(MISCREG_CPSR) in the
non-faulting case.
|
|
Adding a few syscalls that were previously considered unimplemented.
|
|
The ArmSystem class has a parameter to indicate whether it is
configured to use the generic timer extension or not. This parameter
doesn't affect any feature flags in the current implementation and is
therefore completely unnecessary. In fact, we usually don't set it
even if a system has a generic timer. If we ever need to check if
there is a generic timer present, we should just request a pointer and
check if it is non-null instead.
|
|
The generic timer model currently does not support virtual
counters. Virtual and physical counters both tick with the same
frequency. However, virtual timers allow a hypervisor to set an offset
that is subtracted from the counter when it is read. This enables the
hypervisor to present a time base that ticks with virtual time in the
VM (i.e., doesn't tick when the VM isn't running). Modern Linux
kernels generally assume that virtual counters exist and try to use
them by default.
|
|
This changeset cleans up the generic timer a bit and moves most of the
register juggling from the ISA code into a separate class in the same
source file as the rest of the generic timer. It also removes the
assumption that there is always 8 or fewer CPUs in the system. Instead
of having a fixed limit, we now instantiate per-core timers as they
are requested. This is all in preparation for other patches that add
support for virtual timers and a memory mapped interface.
|
|
This patch ensures all page-table walks are flagged as such.
|
|
Three minor issues are resolved:
1. Apparently gcc 5.1 does not like negation of booleans followed by
bitwise AND.
2. Somehow the compiler also gets confused and warns about
NoopMachInst being unused (removing it causes compilation errors
though). Most likely a compiler bug.
3. There seems to be a number of instances where loop unrolling causes
false positives for the array-bounds check. For now, switch to
std::array. Potentially we could disable the warning for newer gcc
versions, but switching to std::array is probably a good move in
any case.
|
|
The current ignoreWarnOnceFunc doesn't really work as expected,
since it will only generate one warning total, for whichever
"warn-once" syscall is invoked first. This patch fixes that
behavior by keeping a "warned" flag in the SyscallDesc object,
allowing suitably flagged syscalls to warn exactly once per
syscall.
|
|
Add a missing check to ensure that exceptions are generated properly.
|
|
|
|
We currently assume that all uncacheable memory accesses are strictly
ordered. Instead of always enforcing strict ordering, we now only
enforce it if the required memory type is device memory or strongly
ordered memory.
|
|
The Request::UNCACHEABLE flag currently has two different
functions. The first, and obvious, function is to prevent the memory
system from caching data in the request. The second function is to
prevent reordering and speculation in CPU models.
This changeset gives the order/speculation requirement a separate flag
(Request::STRICT_ORDER). This flag prevents CPU models from doing the
following optimizations:
* Speculation: CPU models are not allowed to issue speculative
loads.
* Write combining: CPU models and caches are not allowed to merge
writes to the same cache line.
Note: The memory system may still reorder accesses unless the
UNCACHEABLE flag is set. It is therefore expected that the
STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent
this behavior.
|
|
Move Alpha-specific memory request flags to an architecture-specific
header and map them to the architecture specific flag bit range.
|
|
With the recent patches addressing how we deal with uncacheable
accesses there is no longer need for the work arounds put in place to
enforce certain sections of memory to be uncacheable during boot.
|
|
This patch simplifies the overall CPU by changing the TLB caches such
that they do not forward snoops to the table walker port(s). Note that
only ARM and X86 are affected.
There is no reason for the ports to snoop as they do not actually take
any action, and from a performance point of view we are better of not
snooping more than we have to.
Should it at a later point be required to snoop for a particular TLB
design it is easy enough to add it back.
|
|
This adds support for FreeBSD/aarch64 FS and SE mode (basic set of syscalls only)
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
Same exception is raised whether division with zero is performed or the
quotient is greater than the maximum value that the provided space can hold.
Divide-by-Zero is the AMD terminology, while Divide-Error is Intel's.
|
|
This patch rolls back the move of the GDB_REG_BYTES constant, and
instead adds M5_VAR_USED.
|
|
This patch fixes a few small issues to ensure gem5 compiles when using
gcc 5.1.
First, the GDB_REG_BYTES in the RemoteGDB header are, rather
surprisingly, flagged as unused for both ARM and X86. Removing them,
however, causes compilation errors as they are actually used in the
source file. Moving the constant into the class definition fixes the
issue. Possibly a gcc bug.
Second, we have an unused EthPktData constructor using auto_ptr, and
the latter is deprecated. Since the code is never used it is simply
removed.
|
|
|
|
Update table with additional definitions through Linux 3.13.
|