Age | Commit message | Author |
|
|
|
|
|
|
|
UBSAN flags this operation because it detects that arg is being cast directly
to an unsigned type, argBits. this patch fixes this by first casting the
value to a signed int type, then reinterpreting the raw bits of the signed
int into argBits.
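A minimal sketch of the two-step conversion described above. It assumes arg is a floating-point value and argBits is a 64-bit unsigned integer; the helper name toArgBits is hypothetical and not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts a double to the raw bits of its signed
// integer value, avoiding a direct double -> unsigned cast (which is
// undefined behavior for negative inputs).
inline std::uint64_t toArgBits(double arg)
{
    // Precondition of this sketch: the truncated value fits in int64_t.
    assert(arg >= -9223372036854775808.0 && arg < 9223372036854775808.0);
    // Step 1: the value conversion goes through a signed type.
    std::int64_t asSigned = static_cast<std::int64_t>(arg);
    // Step 2: signed -> unsigned is well defined (modulo 2^64), so this
    // simply reuses the two's complement bit pattern.
    return static_cast<std::uint64_t>(asSigned);
}
```

With this shape, toArgBits(-1.0) yields the all-ones bit pattern instead of tripping the sanitizer.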
|
|
|
|
fixes to appease clang++. tested on:
Ubuntu clang version 3.5.0-4ubuntu2~trusty2
(tags/RELEASE_350/final) (based on LLVM 3.5.0)
Ubuntu clang version 3.6.0-2ubuntu1~trusty1
(tags/RELEASE_360/final) (based on LLVM 3.6.0)
the fixes address the following five issues:
1) the exec continuations in gpu_static_inst.hh were marked
as protected when they should be public. here we mark
them as public
2) the Abs instruction uses std::abs() in its execute method.
because Abs is templated, it can also operate on U32 and U64
types, which causes Abs::execute() to pass uint32_t and uint64_t
values to std::abs() respectively. this triggers a warning
because std::abs() has no effect in this case. to remedy this
we add template specializations for the execute() method of Abs
when its template parameter is U32 or U64.
3) Some protocols that utilize the code in cprintf.hh were missing
includes to BoolVec.hh, which defines operator<< for the BoolVec
type. This would cause issues when the generated code would try
to pass a BoolVec type to a method in cprintf.hh that used
operator<< on an instance of a BoolVec.
4) Surprise, clang doesn't like it when you clobber all the bits
in a newly allocated object. I.e., this code:
tlb = new GpuTlbEntry[size];
std::memset(tlb, 0, sizeof(GpuTlbEntry) * size);
Let's use std::vector to track the TLB entries in the GpuTlb now...
5) There were a few variables used only in DPRINTFs, so we mark them
with M5_VAR_USED.
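Items 2 and 4 can be sketched roughly as follows. The names absValue, TlbEntrySketch and makeTlb are hypothetical stand-ins for illustration, not the actual Abs instruction or GpuTlbEntry code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Generic case: signed types go through std::abs().
template <typename T>
T absValue(T v) { return std::abs(v); }

// Unsigned specializations: the value is already non-negative, so we
// return it unchanged and never hand an unsigned type to std::abs().
template <>
inline std::uint32_t absValue<std::uint32_t>(std::uint32_t v) { return v; }
template <>
inline std::uint64_t absValue<std::uint64_t>(std::uint64_t v) { return v; }

// Hypothetical entry type; default member initializers replace the memset.
struct TlbEntrySketch
{
    std::uint64_t vaddr = 0;
    std::uint64_t paddr = 0;
    bool valid = false;
};

// std::vector value-initializes each entry, so no raw byte clobbering
// of freshly constructed objects is needed.
inline std::vector<TlbEntrySketch> makeTlb(std::size_t size)
{
    assert(size > 0);
    return std::vector<TlbEntrySketch>(size);
}
```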
|
|
This patch adds the ability for an application to request dist-gem5 to begin/
end synchronization using an m5 op. When toggling on sync, all nodes agree
on the next sync point based on the maximum of all nodes' ticks. CPUs are
suspended until the sync point to avoid sending network messages until sync has
been enabled. Toggling off sync acts like a global execution barrier, where
all CPUs are disabled until every node reaches the toggle off point. This
avoids tricky situations such as one node hitting a toggle off followed by a
toggle on before the other nodes hit the first toggle off.
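A rough sketch of how the next sync point could be agreed on when sync is toggled on, assuming each node reports its current tick. Tick and nextSyncPoint are illustrative names, not the dist-gem5 interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using Tick = std::uint64_t;

// All nodes agree on the maximum tick reported by any node, so no node
// is asked to synchronize at a point it has already passed.
inline Tick nextSyncPoint(const std::vector<Tick> &nodeTicks)
{
    assert(!nodeTicks.empty());
    return *std::max_element(nodeTicks.begin(), nodeTicks.end());
}
```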
|
|
Adding details, e.g. rip, rsp etc. to the kvm pagefault exit when in SE mode.
|
|
Normal MMAPPED_IPR requests are allowed to execute speculatively under the
assumption that they have no side effects. The special case of m5ops that are
treated like MMAPPED_IPR should not be allowed to execute speculatively, since
they can have side-effects. Adding the STRICT_ORDER flag to these requests
blocks execution until the associated instruction hits the ROB head.
|
|
Change-Id: I183b9942929c873c3272ce6d1abd4ebc472c7132
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
this patch fixes issues with changeset 11593
use the host's pwrite() syscall for pwrite64Func(),
as opposed to pwrite64(), because pwrite64() does
not work well on all distros.
undo the enabling of fstatfs, as we will add this
in a separate patch.
|
|
this patch adds an implementation for the pwrite64 syscall and
enables it for x86_64, and enables fstatfs for x86_64.
|
|
According to the Intel Multi Processor Specification rev 1.4 (-006) (*),
section 4.3.2 Bus Entries, Bus type strings are >>6-character ASCII
(blank-filled) strings<<.
This patch properly pads the entries with the missing spaces at the end.
(*) http://www.intel.com/design/pentium/datashts/24201606.pdf
Committed by Jason Lowe-Power <power.jg@gmail.com>
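The padding rule can be sketched as below, assuming the field is a fixed 6-byte array with no terminating NUL. busTypeField is a hypothetical helper, not the function touched by the patch.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>

// Builds a 6-character, blank-filled bus type field from a shorter name.
inline std::array<char, 6> busTypeField(const std::string &name)
{
    assert(name.size() <= 6);
    std::array<char, 6> field;
    field.fill(' ');                                   // blank-fill first
    std::copy(name.begin(), name.end(), field.begin()); // then overlay the name
    return field;
}
```

For example, "ISA" becomes the three letters followed by three spaces.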
|
|
Registers are 0x10 and not 0x8 apart. The latter leads to invalid
calculations of the array index, which in turn means that we will not
find the interrupt we were looking for (and were notified about) in the OS.
Committed by Jason Lowe-Power <power.jg@gmail.com>
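The index calculation can be illustrated as follows. The base and offset values are made up for the example, and regIndex is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Registers are spaced 0x10 bytes apart, so the array index is the
// distance from the first register divided by 0x10.
constexpr std::uint64_t RegStride = 0x10;

inline std::uint64_t regIndex(std::uint64_t base, std::uint64_t addr)
{
    assert(addr >= base && (addr - base) % RegStride == 0);
    return (addr - base) / RegStride;
}
```

Dividing by 0x8 instead would double every index, so the lookup would land on the wrong array element.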
|
|
After all this it turns out we don't even use it.
|
|
The openFlagTable and mmapFlagTables for emulated Linux
platforms are basically identical, but are specified
repetitively for every platform. Use a common file
that gets included for each platform so that we only
have one copy, making them more consistent and simplifying
changes (like adding #ifdefs).
In the process, made some minor fixes that slipped through
due to previous inconsistencies, and added more #ifdefs
to try to fix building on alternative hosts.
|
|
|
|
|
|
The mmapGrowsDown() method was a static method on the OperatingSystem
class (and derived classes), which worked OK for the templated syscall
emulation methods, but made it hard to access elsewhere. This patch
moves the method to be a virtual function on the LiveProcess class,
where it can be overridden for specific platforms (for now, Alpha).
This patch also changes the value of mmapGrowsDown() from being false
by default and true only on X86Linux32 to being true by default and
false only on Alpha, which seems closer to reality (though in reality
most people use ASLR and this doesn't really matter anymore).
In the process, also got rid of the unused mmap_start field on
LiveProcess and the mmapGrowsUp variable on OperatingSystem.
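A simplified sketch of the resulting structure. The class names carry a Sketch suffix because they are stand-ins, and nextMmapStart is a hypothetical caller, not the real mmap emulation code.

```cpp
#include <cassert>
#include <cstdint>

class LiveProcessSketch
{
  public:
    virtual ~LiveProcessSketch() = default;
    // True by default, matching the new behavior described above.
    virtual bool mmapGrowsDown() const { return true; }
};

class AlphaLiveProcessSketch : public LiveProcessSketch
{
  public:
    // Alpha is the one platform that overrides the default.
    bool mmapGrowsDown() const override { return false; }
};

// Hypothetical caller: picks the start address of a new mapping of
// 'length' bytes given the current mmap boundary 'mmapEnd'.
inline std::uint64_t nextMmapStart(const LiveProcessSketch &p,
                                   std::uint64_t mmapEnd,
                                   std::uint64_t length)
{
    if (p.mmapGrowsDown()) {
        assert(length <= mmapEnd);
        return mmapEnd - length;   // new region sits below the boundary
    }
    return mmapEnd;                // new region starts at the boundary
}
```

Because the method is virtual on the process object, code outside the templated syscall handlers can query it without knowing the OS type.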
|
|
|
|
For O3, which has a stat that counts reg reads, there is an additional
reg read per mmap() call since there's an arg we no longer ignore.
Otherwise, stats should not be affected.
|
|
|
|
The structure definition only had the open system call flag set in mind when
it was named, so we rename it here with the intention of using it to define
additional tables to translate flags for other system calls in the future.
|
|
This patch implements the clock_getres() system call for arm and x86 in linux
SE mode.
|
|
The previous implementation did a pair of nested RMW operations,
which isn't compatible with the way that locked RMW operations are
implemented in the cache models. It was convenient though in that
it didn't require any new micro-ops, and supported cmpxchg16b using
64-bit memory ops. It also worked in AtomicSimpleCPU where
atomicity was guaranteed by the core and not by the memory system.
It did not work with timing CPU models though.
This new implementation defines new 'split' load and store micro-ops
which allow a single memory operation to use a pair of registers as
the source or destination, then uses a single ldsplit/stsplit RMW
pair to implement cmpxchg. This patch requires support for 128-bit
memory accesses in the ISA (added via a separate patch) to support
cmpxchg16b.
|
|
Although the cache models support wider accesses, the ISA descriptions
assume that (for the most part) memory operands are integer types,
which makes it difficult to define instructions that do memory accesses
larger than 64 bits.
This patch adds some generic support for memory operands that are arrays
of uint64_t, and specifically a 'u2qw' operand type for x86 that is an
array of 2 uint64_ts (128 bits). This support is unused at this point,
but will be needed shortly for cmpxchg16b. Ideally the 128-bit SSE
memory accesses will also be rewritten to use this support.
Support for 128-bit accesses could also have been added using the gcc
__int128_t extension, which would have been less disruptive. However,
although clang also supports __int128_t, it's still non-standard.
Also, more importantly, this approach creates a path to defining
256- and 512-bit operands as well, which will be useful for eventual
AVX support.
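As a rough illustration of the idea, a 128-bit memory operand can be modeled as an array of two uint64_t values. The U2QW alias and the load128/store128 helpers below are hypothetical; the real operand type is generated by the ISA description.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Two quadwords, 128 bits in total; wider operands would only change
// the element count.
using U2QW = std::array<std::uint64_t, 2>;
static_assert(sizeof(U2QW) == 16, "sketch assumes a 16-byte operand");

// Copy 16 bytes from simulated memory into the operand.
inline U2QW load128(const unsigned char *mem)
{
    assert(mem != nullptr);
    U2QW value;
    std::memcpy(value.data(), mem, sizeof(value));
    return value;
}

// Copy the operand back out to simulated memory.
inline void store128(unsigned char *mem, const U2QW &value)
{
    assert(mem != nullptr);
    std::memcpy(mem, value.data(), sizeof(value));
}
```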
|
|
Writing 16 bytes from an 8-byte source value is a bad idea.
This doesn't appear to have broken anything, but showed up
as spurious differences when tracediffing runs.
|
|
In the process of trying to get rid of an '== false' comparison,
it became apparent that a slightly more involved solution was
needed. Split this out into its own changeset since it's not
a totally trivial local change like the others.
|
|
Result of running 'hg m5style --skip-all --fix-control -a'.
|
|
Result of running 'hg m5style --skip-all --fix-white -a'.
|
|
For historical reasons, the ExecContext interface had a single
function, readMem(), that did two different things depending on
whether the ExecContext supported atomic memory mode (i.e.,
AtomicSimpleCPU) or timing memory mode (all the other models).
In the former case, it actually performed a memory read; in the
latter case, it merely initiated a read access, and the read
completion did not happen until later when a response packet
arrived from the memory system.
This led to some confusing things, including timing accesses
being required to provide a pointer for the return data even
though that pointer was only used in atomic mode.
This patch splits this interface, adding a new initiateMemRead()
function to the ExecContext interface to replace the timing-mode
use of readMem().
For consistency and clarity, the readMemTiming() helper function
in the ISA definitions is renamed to initiateMemRead() as well.
For x86, where the access size is passed in explicitly, we can
also get rid of the data parameter at this level. For other ISAs,
where the access size is determined from the type of the data
parameter, we have to keep the parameter for that purpose.
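The split can be sketched as below. This is a simplified stand-in: the signatures here return bool and omit request flags, whereas the real interface has its own fault and flag types, and the two context classes are toy models.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

using Addr = std::uint64_t;

class ExecContextSketch
{
  public:
    virtual ~ExecContextSketch() = default;
    // Atomic mode: perform the read now and fill 'data'.
    virtual bool readMem(Addr addr, std::uint8_t *data, unsigned size) = 0;
    // Timing mode: only start the access; no data pointer is involved.
    virtual bool initiateMemRead(Addr addr, unsigned size) = 0;
};

class AtomicContextSketch : public ExecContextSketch
{
    std::vector<std::uint8_t> mem;
  public:
    explicit AtomicContextSketch(std::vector<std::uint8_t> m)
        : mem(std::move(m)) {}
    bool readMem(Addr addr, std::uint8_t *data, unsigned size) override
    {
        if (addr + size > mem.size())
            return false;
        std::memcpy(data, mem.data() + addr, size);
        return true;
    }
    // Not meaningful in atomic mode in this sketch.
    bool initiateMemRead(Addr, unsigned) override { return false; }
};

class TimingContextSketch : public ExecContextSketch
{
    std::vector<std::pair<Addr, unsigned>> pending;
  public:
    // Not meaningful in timing mode in this sketch.
    bool readMem(Addr, std::uint8_t *, unsigned) override { return false; }
    bool initiateMemRead(Addr addr, unsigned size) override
    {
        pending.emplace_back(addr, size);  // data arrives with a later response
        return true;
    }
    std::size_t numPending() const { return pending.size(); }
};
```

The timing path never sees a data pointer, which is the confusion the split removes.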
|
|
The readMemAtomic/writeMemAtomic helper functions were calling
readMemTiming/writeMemTiming respectively. This is functionally
correct, since the *Timing functions are doing the same access
initiation operation as the *Atomic functions (just that the
*Atomic versions also complete the access in line). It also
provides for some (very minimal) code reuse. Unfortunately,
it's potentially pretty confusing, since it makes it look like
the atomic accesses are somehow being converted to timing
accesses. It also gets in the way of specializing the timing
interface (as will be done in a future patch).
|
|
|
|
The key parameter can be used to read out various config parameters from
within the simulated software.
|
|
Currently, the wire format of register values in g- and G-packets is
modelled using a union of uint8/16/32/64 arrays. The offset positions
of each register are expressed as a "register count" scaled according
to the width of the register in question. This results in counter-
intuitive and error-prone "register count arithmetic", and some
formats would even be altogether unrepresentable in such model, e.g.
a 64-bit register following a 32-bit one would have a fractional index
in the regs64 array.
Another difficulty is that the array is allocated before the actual
architecture of the workload is known (and therefore before the correct
size for the array can be calculated).
With this patch I propose a simpler mechanism for expressing the
register set structure. In the new code, GdbRegCache is an abstract
class; its subclasses contain straightforward structs reflecting the
register representation. The determination whether to use e.g. the
AArch32 vs. AArch64 register set (or SPARCv8 vs SPARCv9, etc.) is made
by polymorphically dispatching getregs() to the concrete subclass.
The subclass is not instantiated until it is needed for actual
g-/G-packet processing, when the mode is already known.
This patch is not meant to be merged in on its own, because it changes
the contract between src/base/remote_gdb.* and src/arch/*/remote_gdb.*,
so as it stands right now, it would break the other architectures.
In this patch only the base and the ARM code are provided for review;
once we agree on the structure, I will provide src/arch/*/remote_gdb.*
for the other architectures; those patches could then be merged in
together.
Review Request: http://reviews.gem5.org/r/3207/
Pushed by Joel Hestness <jthestness@gmail.com>
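The proposed structure might look roughly like the following. All names carry a Sketch suffix because they are illustrative; the register layouts are simplified, and a real wire format would also need to control struct padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Abstract register cache: subclasses own a plain struct that mirrors
// the g-/G-packet layout for one register set.
class GdbRegCacheSketch
{
  public:
    virtual ~GdbRegCacheSketch() = default;
    virtual char *data() = 0;
    virtual std::size_t size() const = 0;
};

class AArch32RegsSketch : public GdbRegCacheSketch
{
    struct { std::uint32_t gpr[16]; std::uint32_t cpsr; } r{};
  public:
    char *data() override { return reinterpret_cast<char *>(&r); }
    std::size_t size() const override { return sizeof(r); }
};

class AArch64RegsSketch : public GdbRegCacheSketch
{
    // Mixed widths are expressed directly, with no index scaling.
    struct {
        std::uint64_t x[31];
        std::uint64_t spx;
        std::uint64_t pc;
        std::uint32_t cpsr;
    } r{};
  public:
    char *data() override { return reinterpret_cast<char *>(&r); }
    std::size_t size() const override { return sizeof(r); }
};

// The concrete cache is created only once the mode is known.
inline std::unique_ptr<GdbRegCacheSketch> makeRegCache(bool aarch64)
{
    if (aarch64)
        return std::unique_ptr<GdbRegCacheSketch>(new AArch64RegsSketch);
    return std::unique_ptr<GdbRegCacheSketch>(new AArch32RegsSketch);
}
```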
|
|
As per the x86 architecture specification, matching TLB entries need to be
invalidated on a page fault. For instance, after a page fault due to inadequate
protection bits on a TLB hit, the TLB entry needs to be invalidated. This
behavior is clearly specified in the x86 architecture manuals from both AMD and
Intel. This invalidation is missing currently in gem5, due to which linux
kernel versions 3.8 and up cannot be simulated efficiently. This is exposed by
a linux optimisation in commit e4a1cc56e4d728eb87072c71c07581524e5160b1, which
removes a tlb flush on updating page table entries in x86.
Testing: Linux kernel versions 3.8 onwards were booting very slowly in FS mode,
due to repeated page faults (~300000 before the first print statement in a
bash file). Ensured that page fault rate drops drastically and observed
reduction in boot time from order of hours to minutes for linux kernel v3.8
and v3.11
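A toy model of the behavior, assuming a TLB keyed by virtual page number with a single writable bit. TlbSketch and its methods are hypothetical and far simpler than the gem5 TLB.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

struct TlbEntrySketch
{
    std::uint64_t ppn;
    bool writable;
};

class TlbSketch
{
    std::unordered_map<std::uint64_t, TlbEntrySketch> entries;
  public:
    void insert(std::uint64_t vpn, TlbEntrySketch e) { entries[vpn] = e; }
    bool contains(std::uint64_t vpn) const { return entries.count(vpn) != 0; }

    // Returns true and sets 'ppn' on success. On a protection fault the
    // matching entry is dropped, so the retry walks the page table and
    // picks up any updated permissions.
    bool translate(std::uint64_t vpn, bool isWrite, std::uint64_t &ppn)
    {
        auto it = entries.find(vpn);
        if (it == entries.end())
            return false;               // miss: handled by a page walk
        if (isWrite && !it->second.writable) {
            entries.erase(it);          // invalidate on fault
            return false;
        }
        ppn = it->second.ppn;
        return true;
    }
};
```

Without the erase, a stale read-only entry would keep faulting after the OS had already updated the page table entry.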
|
|
doCpuid() has two identical warn messages about unimplemented functions. Add
the family to the log message to make them distinguishable.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
|
|
Make clang >= 3.5 happy when compiling build/X86/gem5.opt on OSX.
|
|
This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap
(and similar) abstractions, as these are no longer needed with gcc 4.7
and clang 3.1 as minimum compiler versions.
|
|
The decoder is responsible for splitting instructions into micro
operations (uops). Given that different micro architectures may split
operations differently, this patch allows specifying which micro
architecture each ISA implements, so different cores in the system can
split instructions differently, also decoupling uop splitting
(microArch) from ISA (Arch). This is done by making the decode
calls templates that receive a type 'DecoderFlavour' that maps the
name of the operation to the class that implements it. This way there
is only one selection point (converting the command line enum to the
appropriate DecodeFeatures object). In addition, there is no explicit
code replication: template instantiation hides that, and the compiler
should be able to resolve a number of things at compile-time.
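A minimal sketch of the flavour mechanism, assuming each flavour is a struct of type aliases. Every name below (SingleUopAdd, SplitAdd, the two flavours, MicroArch) is made up for illustration.

```cpp
#include <cassert>

// Two alternative implementations of the same operation.
struct SingleUopAdd { static int numUops() { return 1; } };
struct SplitAdd     { static int numUops() { return 2; } };

// A flavour maps operation names to implementing classes.
struct GenericFlavour { using AddMem = SingleUopAdd; };
struct SplitFlavour   { using AddMem = SplitAdd; };

// Decode code is written once against the flavour's names; the
// compiler resolves the concrete class at instantiation time.
template <typename DecoderFlavour>
int decodeAddMem()
{
    return DecoderFlavour::AddMem::numUops();
}

enum class MicroArch { Generic, Split };

// The single selection point: runtime enum to compile-time flavour.
inline int decodeAddMemFor(MicroArch m)
{
    switch (m) {
      case MicroArch::Split:
        return decodeAddMem<SplitFlavour>();
      case MicroArch::Generic:
      default:
        return decodeAddMem<GenericFlavour>();
    }
}
```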
|
|
These are packed single-precision approximate reciprocal operations,
vector and scalar versions, respectively.
This code was basically developed by copying the code for
sqrtps and sqrtss. The mrcp micro-op was simplified relative to
msqrt since there are no double-precision versions of this operation.
|
|
fild loads an integer value into the x87 top of stack register.
fucomi/fucomip compare two x87 register values (the latter
also doing a stack pop).
These instructions are used by some versions of GNU libstdc++.
|
|
Changes wakeup functionality so that only specific threads on SMT
capable cpus are woken.
|
|
Adds per-thread interrupt controllers and thread/context logic
so that interrupts properly get routed in SMT systems.
|
|
Added explicit data sizes and an opcode type for correct execution.
|
|
This patch implements the correct behavior.
|
|
|
|
This adds a vector register type. The type is defined as a std::array of a
fixed number of uint64_ts. The isa_parser.py has been modified to parse vector
register operands and generate the required code. Different cpus have vector
register files now.
|
|
This patch updates the x86 decoder so that it can decode instructions with vex
prefix. It also updates the isa with opcodes from vex opcode maps 1, 2 and 3.
Note that none of the instructions have been implemented yet.
Implementations will be provided in due course.
|