Age | Commit message (Collapse) | Author |
|
This constant is, first, a #define, and second only used in one place.
In that one place, it appears that the code it guards is no longer
necessary in general. It was originally written to avoid refetching a
block of data that you're still in, even if you've moved slightly
farther in it because you're skipping the next instruction due to an
annulled branch delay slot. In reality however, in SPARC, the one ISA
I'm aware of which has this sort of branching behavior, the PC state
object will correctly determine that no branch is happening in these
cases. Code lower down in the loop will then recompute where fetching
should continue based on the next PC, automatically skipping the
annulled branch slot without misinterpretting the gap as a branch.
This change therefore also removes this block of code.
Change-Id: I820ebc9df10aeb4fcb69c12f6a784e9ec616743c
Reviewed-on: https://gem5-review.googlesource.com/6821
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
It's no longer used.
Change-Id: I4a71bcb214f1bb186b92ef50841eca635e6701c5
Reviewed-on: https://gem5-review.googlesource.com/6826
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
|
|
CPUs have historically instantiated the architecture specific version
of the TLBs to avoid a virtual function call, making them a little bit
more dependent on what the current ISA is. Some simple performance
measurement, the x86 twolf regression on the atomic CPU, shows that
there isn't actually any performance benefit, and if anything the
simulator goes slightly faster (although still within margin of error)
when the TLB functions are virtual.
This change switches everything outside of the architectures themselves
to use the generic BaseTLB type, and then inside the ISA for them to
cast that to their architecture specific type to call into architecture
specific interfaces.
The ARM TLB needed the most adjustment since it was using non-standard
translation function signatures. Specifically, they all took an extra
"type" parameter which defaulted to normal, and translateTiming
returned a Fault. translateTiming actually doesn't need to return a
Fault because everywhere that consumed it just stored it into a
structure which it then deleted(?), and the fault is stored in the
Translation object when the translation is done.
A little more work is needed to fully obviate the arch/tlb.hh header,
so the TheISA::TLB type is still visible outside of the ISAs.
Specifically, the TlbEntry type is used in the generic PageTable which
lives in src/mem.
Change-Id: I51b68ee74411f9af778317eff222f9349d2ed575
Reviewed-on: https://gem5-review.googlesource.com/6921
Maintainer: Gabe Black <gabeblack@google.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
|
|
Neither of these were used, particularly memAccInst.
Change-Id: I4ac9e44cf624e5de42519d586d7b699f08a2cdfc
Reviewed-on: https://gem5-review.googlesource.com/6601
Maintainer: Gabe Black <gabeblack@google.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
These files aren't a collection of miscellaneous stuff, they're the
definition of the Logger interface, and a few utility macros for
calling into that interface (panic, warn, etc.).
Change-Id: I84267ac3f45896a83c0ef027f8f19c5e9a5667d1
Reviewed-on: https://gem5-review.googlesource.com/6226
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Maintainer: Gabe Black <gabeblack@google.com>
|
|
In the ISA instruction definitions, some classes were declared with
execute, etc., functions outside of the main template because they
had CPU specific signatures and would need to be duplicated with
each CPU plugged into them. Now that the instructions always just
use an ExecContext, there's no reason for those templates to be
separate. This change folds those templates together.
Change-Id: I13bda247d3d1cc07c0ea06968e48aa5b4aace7fa
Reviewed-on: https://gem5-review.googlesource.com/5401
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Alec Roelke <ar4jc@virginia.edu>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
The ISA parser used to generate different copies of exec functions
for each exec context class a particular CPU wanted to use. That's
since been changed so that those functions take a pointer to the base
ExecContext, so the code which would generate those extra functions
can be removed, and some functions which used to be templated on an
ExecContext subclass can be untemplated, or minimally less templated.
Now that some functions aren't going to be instantiated multiple times
with different signatures, there are also opportunities to collapse
templates and make many instruction definitions simpler within the
parser. Since those changes will be less mechanical, they're left for
later changes and will probably be done in smaller increments.
Change-Id: I0015307bb02dfb9c60380b56d2a820f12169ebea
Reviewed-on: https://gem5-review.googlesource.com/5381
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
Generating dependency/build product information in the isa parser breaks scons
idea of how a build is supposed to work. Arm twisting it into working forced
a lot of false dependencies which slowed down the build.
Change-Id: Iadee8c930fd7c80136d200d69870df7672a6b3ca
Reviewed-on: https://gem5-review.googlesource.com/5081
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Gabe Black <gabeblack@google.com>
|
|
ARM systems require the coordination of the global and local
monitors. When the system is run without caches the global monitor is
implemented in the abstract memory object. This change adds a callback
from the abstract memory that notifies the local monitor when the
global monitor is cleared.
Additionally, for ARM systems the local monitor signals the event
register and wakes the thread context up. Subsequent wait-for-event
(WFE) instructions will be immediately signaled.
Change-Id: If6c038f3a6bea7239ba4258f07f39c7f9a30500b
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/3760
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
|
|
The kernel stat mechanism should really be refactored and moved somewhere
else, but in the mean time there's some old cruft that can be cleared away.
Change-Id: I21e725de590dda0d20bf3bc675bbe976c7b1bd86
Reviewed-on: https://gem5-review.googlesource.com/4600
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
Change-Id: I09570e569efe55f5502bc201e03456738999e714
Signed-off-by: Sean Wilson <spwilson2@wisc.edu>
Reviewed-on: https://gem5-review.googlesource.com/3920
Maintainer: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
|
|
This patch adds some more functionality to the cpu model and the arch to
interface with the vector register file.
This change consists mainly of augmenting ThreadContexts and ExecContexts
with calls to get/set full vectors, underlying microarchitectural elements
or lanes. Those are meant to interface with the vector register file. All
classes that implement this interface also get an appropriate implementation.
This requires implementing the vector register file for the different
models using the VecRegContainer class.
This change set also updates the Result abstraction to contemplate the
possibility of having a vector as result.
The changes also affect how the remote_gdb connection works.
There are some (nasty) side effects, such as the need to define dummy
numPhysVecRegs parameter values for architectures that do not implement
vector extensions.
Nathanael Premillieu's work with an increasing number of fixes and
improvements of mine.
Change-Id: Iee65f4e8b03abfe1e94e6940a51b68d0977fd5bb
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
[ Fix RISCV build issues and CC reg free list initialisation ]
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/2705
|
|
With the hierarchical RegId there are a lot of functions that are
redundant now.
The idea behind the simplification is that instead of having the regId,
telling which kind of register read/write/rename/lookup/etc. and then
the function panic_if'ing if the regId is not of the appropriate type,
we provide an interface that decides what kind of register to read
depending on the register type of the given regId.
Change-Id: I7d52e9e21fc01205ae365d86921a4ceb67a57178
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
[ Fix RISCV build issues ]
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/2702
|
|
Replace the unified register mapping with a structure associating
a class and an index. It is now much easier to know which class of
register the index is referring to. Also, when adding a new class
there is no need to modify existing ones.
Change-Id: I55b3ac80763702aa2cd3ed2cbff0a75ef7620373
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
[ Fix RISCV build issues ]
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/2700
|
|
Compiling gem5 with recent version of clang (4 and 5) triggers
warnings that are treated as errors:
* Global templatized static functions result in a warning if they
are not used. These should either be declared as static inline or
without the static identifier to avoid the warning.
* Some templatized classes contain static variables. The
instantiated versions of these variables / templates need to be
explicitly declared to avoid a compiler warning.
Change-Id: Ie8261144836e94ebab7ea04ccccb90927672c257
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/3420
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
|
|
The new version modularizes the implementation of the various commands,
gets rid of dynamic allocation of the register cache, fixes some small
style problems, and uses exceptions to simplify error handling internal to
the GDB stub.
Change-Id: Iff3548373ce4adfb99106a810f5713b769df89b2
Reviewed-on: https://gem5-review.googlesource.com/3280
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Boris Shingarov <shingarov@gmail.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
The Process class is full of implementation details and
structures related to SE Mode. This changeset factors out an
internal class from Process and moves it into a separate file.
The purpose behind doing this is to clean up the code and make
it a bit more modular.
Change-Id: Ic6941a1657751e8d51d5b6b1dcc04f1195884280
Reviewed-on: https://gem5-review.googlesource.com/2263
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
simulations
Modifies the clone system call and adds execve system call. Requires allowing
processes to steal thread contexts from other processes in the same system
object and the ability to detach pieces of process state (such as MemState)
to allow dynamic sharing.
|
|
This changeset adds functionality that allows system calls to retry without
affecting thread context state such as the program counter or register values
for the associated thread context (when system calls return with a retry
fault).
This functionality is needed to solve problems with blocking system calls
in multi-process or multi-threaded simulations where information is passed
between processes/threads. Blocking system calls can cause deadlock because
the simulator itself is single threaded. There is only a single thread
servicing the event queue which can cause deadlock if the thread hits a
blocking system call instruction.
To illustrate the problem, consider two processes using the producer/consumer
sharing model. The processes can use file descriptors and the read and write
calls to pass information to one another. If the consumer calls the blocking
read system call before the producer has produced anything, the call will
block the event queue (while executing the system call instruction) and
deadlock the simulation.
The solution implemented in this changeset is to recognize that the system
calls will block and then generate a special retry fault. The fault will
be sent back up through the function call chain until it is exposed to the
cpu model's pipeline where the fault becomes visible. The fault will trigger
the cpu model to replay the instruction at a future tick where the call has
a chance to succeed without actually going into a blocking state.
In subsequent patches, we recognize that a syscall will block by calling a
non-blocking poll (from inside the system call implementation) and checking
for events. When events show up during the poll, it signifies that the call
would not have blocked and the syscall is allowed to proceed (calling an
underlying host system call if necessary). If no events are returned from the
poll, we generate the fault and try the instruction for the thread context
at a distant tick. Note that retrying every tick is not efficient.
As an aside, the simulator has some multi-threading support for the event
queue, but it is not used by default and needs work. Even if the event queue
was completely multi-threaded, meaning that there is a hardware thread on
the host servicing a single simulator thread contexts with a 1:1 mapping
between them, it's still possible to run into deadlock due to the event queue
barriers on quantum boundaries. The solution of replaying at a later tick
is the simplest solution and solves the problem generally.
|
|
Moves aux_vector into its own .hh and .cc files just to get it out of the
already crowded Process files. Arguably, it could stay there, but it's
probably better just to move it and give it files.
The changeset looks ugly around the Process header file, but the goal here is
to move methods and members around so that they're not defined randomly
throughout the entire header file. I expect this is likely one of the reasons
why I several unused variables related to this class. So, the methods are
declared first followed by members. I've tried to aggregate them together
so that similar entries reside near one another.
There are other changes coming to this code so this is by no means the
final product.
|
|
The EIOProcess class was removed recently and it was the only other class
which derived from Process. Since every Process invocation is also a
LiveProcess invocation, it makes sense to simplify the organization by
combining the fields from LiveProcess into Process.
|
|
Used cppclean to help identify useless includes and removed them. This
involved erroneously included headers, but also cases where forward
declarations could have been used rather than a full include.
|
|
The class was crammed into syscall_emul.hh which has tons of forward
declarations and template definitions. To clean it up a bit, moved the
class into separate files and commented the class with doxygen style
comments. Also, provided some encapsulation by adding some accessors and
a mutator.
The syscallreturn.hh file was renamed syscall_return.hh to make it consistent
with other similarly named files in the src/sim directory.
The DPRINTF_SYSCALL macro was moved into its own header file with the
include the Base and Verbose flags as well.
--HG--
rename : src/sim/syscallreturn.hh => src/sim/syscall_return.hh
|
|
|
|
Make it so that getInterrupt *always* returns an interrupt if
checkInterrupts() returns true. This fixes/simplifies handling
of interrupts on the SMT FS CPUs (currently minor).
|
|
Fixing an issue with regStats not calling the parent class method
for most SimObjects in Gem5. This causes issues if one adds new
stats in the base class (since they are never initialized properly!).
Change-Id: Iebc5aa66f58816ef4295dc8e48a357558d76a77c
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
In general, the ThreadID parameter is unnecessary in the memory system
as the ContextID is what is used for the purposes of locks/wakeups.
Since we allocate sequential ContextIDs for each thread on MT-enabled
CPUs, ThreadID is unnecessary as the CPUs can identify the requesting
thread through sideband info (SenderState / LSQ entries) or ContextID
offset from the base ContextID for a cpu.
This is a re-spin of 20264eb after the revert (bd1c6789) and includes
some fixes of that commit.
|
|
|
|
Add 4 power states to the ClockedObject, provides necessary access functions
to check and update the power state. Default power state is UNDEFINED, it is
responsibility of the respective simulation model to provide the startup state
and any other logic for state change.
Add number of transition stat.
Add distribution of time spent in clock gated state.
Add power state residency stat.
Add dump call back function to allow stats update of distribution and residency
stats.
|
|
After all this it turns out we don't even use it.
|
|
The openFlagTable and mmapFlagTables for emulated Linux
platforms are basically identical, but are specified
repetitively for every platform. Use a common file
that gets included for each platform so that we only
have one copy, making them more consistent and simplifying
changes (like adding #ifdefs).
In the process, made some minor fixes that slipped through
due to previous inconsistencies, and added more #ifdefs
to try to fix building on alternative hosts.
|
|
|
|
The mmapGrowsDown() method was a static method on the OperatingSystem
class (and derived classes), which worked OK for the templated syscall
emulation methods, but made it hard to access elsewhere. This patch
moves the method to be a virtual function on the LiveProcess method,
where it can be overridden for specific platforms (for now, Alpha).
This patch also changes the value of mmapGrowsDown() from being false
by default and true only on X86Linux32 to being true by default and
false only on Alpha, which seems closer to reality (though in reality
most people use ASLR and this doesn't really matter anymore).
In the process, also got rid of the unused mmap_start field on
LiveProcess and OperatingSystem mmapGrowsUp variable.
|
|
For O3, which has a stat that counts reg reads, there is an additional
reg read per mmap() call since there's an arg we no longer ignore.
Otherwise, stats should not be affected.
|
|
|
|
The structure definition only had the open system call flag set in mind when
it was named, so we rename it here with the intention of using it to define
additional tables to translate flags for other system calls in the future.
|
|
Make clang happy...again.
|
|
Result of running 'hg m5style --skip-all --fix-control -a'.
|
|
Result of running 'hg m5style --skip-all --fix-white -a'.
|
|
For historical reasons, the ExecContext interface had a single
function, readMem(), that did two different things depending on
whether the ExecContext supported atomic memory mode (i.e.,
AtomicSimpleCPU) or timing memory mode (all the other models).
In the former case, it actually performed a memory read; in the
latter case, it merely initiated a read access, and the read
completion did not happen until later when a response packet
arrived from the memory system.
This led to some confusing things, including timing accesses
being required to provide a pointer for the return data even
though that pointer was only used in atomic mode.
This patch splits this interface, adding a new initiateMemRead()
function to the ExecContext interface to replace the timing-mode
use of readMem().
For consistency and clarity, the readMemTiming() helper function
in the ISA definitions is renamed to initiateMemRead() as well.
For x86, where the access size is passed in explicitly, we can
also get rid of the data parameter at this level. For other ISAs,
where the access size is determined from the type of the data
parameter, we have to keep the parameter for that purpose.
|
|
|
|
Make best use of the compiler, and enable -Wextra as well as
-Wall. There are a few issues that had to be resolved, but they are
all trivial.
|
|
Currently, the wire format of register values in g- and G-packets is
modelled using a union of uint8/16/32/64 arrays. The offset positions
of each register are expressed as a "register count" scaled according
to the width of the register in question. This results in counter-
intuitive and error-prone "register count arithmetic", and some
formats would even be altogether unrepresentable in such model, e.g.
a 64-bit register following a 32-bit one would have a fractional index
in the regs64 array.
Another difficulty is that the array is allocated before the actual
architecture of the workload is known (and therefore before the correct
size for the array can be calculated).
With this patch I propose a simpler mechanism for expressing the
register set structure. In the new code, GdbRegCache is an abstract
class; its subclasses contain straightforward structs reflecting the
register representation. The determination whether to use e.g. the
AArch32 vs. AArch64 register set (or SPARCv8 vs SPARCv9, etc.) is made
by polymorphically dispatching getregs() to the concrete subclass.
The subclass is not instantiated until it is needed for actual
g-/G-packet processing, when the mode is already known.
This patch is not meant to be merged in on its own, because it changes
the contract between src/base/remote_gdb.* and src/arch/*/remote_gdb.*,
so as it stands right now, it would break the other architectures.
In this patch only the base and the ARM code are provided for review;
once we agree on the structure, I will provide src/arch/*/remote_gdb.*
for the other architectures; those patches could then be merged in
together.
Review Request: http://reviews.gem5.org/r/3207/
Pushed by Joel Hestness <jthestness@gmail.com>
|
|
This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap
(and similar) abstractions, as these are no longer needed with gcc 4.7
and clang 3.1 as minimum compiler versions.
|
|
The decoder is responsible for splitting instructions in micro
operations (uops). Given that different micro architectures may split
operations differently, this patch allows to specify which micro
architecture each isa implements, so different cores in the system can
split instructions differently, also decoupling uop splitting
(microArch) from ISA (Arch). This is done making the decodification
calls templates that receive a type 'DecoderFlavour' that maps the
name of the operation to the class that implements it. This way there
is only one selection point (converting the command line enum to the
appropriate DecodeFeatures object). In addition, there is no explicit
code replication: template instantiation hides that, and the compiler
should be able to resolve a number of things at compile-time.
|
|
|
|
This adds a vector register type. The type is defined as a std::array of a
fixed number of uint64_ts. The isa_parser.py has been modified to parse vector
register operands and generate the required code. Different cpus have vector
register files now.
|
|
Objects that are can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
to the serialization interface to make it more generic and hide the
underlying checkpoint storage. Specifically:
* Add a set of APIs to serialize into a subsection of the current
object. Previously, objects that needed this functionality would
use ad-hoc solutions using nameOut() and section name
generation. In the new world, an object that implements the
interface has the methods serializeSection() and
unserializeSection() that serialize into a named /subsection/ of
the current object. Calling serialize() serializes an object into
the current section.
* Move the name() method from Serializable to SimObject as it is no
longer needed for serialization. The fully qualified section name
is generated by the main serialization code on the fly as objects
serialize sub-objects.
* Add a scoped ScopedCheckpointSection helper class. Some objects
need to serialize data structures, that are not deriving from
Serializable, into subsections. Previously, this was done using
nameOut() and manual section name generation. To simplify this,
this changeset introduces a ScopedCheckpointSection() helper
class. When this class is instantiated, it adds a new /subsection/
and subsequent serialization calls during the lifetime of this
helper class happen inside this section (or a subsection in case
of nested sections).
* The serialize() call is now const which prevents accidental state
manipulation during serialization. Objects that rely on modifying
state can use the serializeOld() call instead. The default
implementation simply calls serialize(). Note: The old-style calls
need to be explicitly called using the
serializeOld()/serializeSectionOld() style APIs. These are used by
default when serializing SimObjects.
* Both the input and output checkpoints now use their own named
types. This hides underlying checkpoint implementation from
objects that need checkpointing and makes it easier to change the
underlying checkpoint storage code.
|
|
The Request::UNCACHEABLE flag currently has two different
functions. The first, and obvious, function is to prevent the memory
system from caching data in the request. The second function is to
prevent reordering and speculation in CPU models.
This changeset gives the order/speculation requirement a separate flag
(Request::STRICT_ORDER). This flag prevents CPU models from doing the
following optimizations:
* Speculation: CPU models are not allowed to issue speculative
loads.
* Write combining: CPU models and caches are not allowed to merge
writes to the same cache line.
Note: The memory system may still reorder accesses unless the
UNCACHEABLE flag is set. It is therefore expected that the
STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent
this behavior.
|
|
Finally took the plunge and made this apply to all ISAs, not just ARM.
|