Age | Commit message (Collapse) | Author |
|
This patch implements the simd128 ADDSUBPD instruction for the x86 architecture.
Tested with a simple program in assembly language which executes the
instruction. Checked that different versions of the instruction are executed
by using the execution tracing option.
Committed by: Nilay Vaish <nilay@cs.wisc.edu
|
|
Instead of counting the number of opcode bytes in an instruction and recording
each byte before the actual opcode, we can represent the path we took to get to
the actual opcode byte by using a type code. That has a couple of advantages.
First, we can disambiguate the properties of opcodes of the same length which
have different properties. Second, it reduces the amount of data stored in an
ExtMachInst, making them slightly easier/faster to create and process. This
also adds some flexibility as far as how different types of opcodes are
handled, which might come in handy if we decide to support VEX or XOP
instructions.
This change also adds tables to support properly decoding 3 byte opcodes.
Before we would fall off the end of some arrays, on top of the ambiguity
described above.
This change doesn't measureably affect performance on the twolf benchmark.
--HG--
rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f38_opcodes.isa
rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f3a_opcodes.isa
|
|
The data size used for actually writing the base value for the segment was the
default size, but really it should set the entire value without any possible
truncation.
|
|
The far pointer should be shifted right to get the selector value, not left.
Also, when calculating the width of the offset, the wrong register was used in
one spot.
|
|
Mwait works as follows:
1. A cpu monitors an address of interest (monitor instruction)
2. A cpu calls mwait - this loads the cache line into that cpu's cache.
3. The cpu goes to sleep.
4. When another processor requests write permission for the line, it is
evicted from the sleeping cpu's cache. This eviction is forwarded to the
sleeping cpu, which then wakes up.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
|
|
The o3 cpu relies upon instructions that suspend a thread context being
flagged as "IsQuiesce". If they are not, unpredictable behavior can occur.
This patch fixes that for the x86 ISA.
|
|
This patch sets op class of two fp instructions: movfp and pop x87 stack
as IntAluOp since these instructions do not make use of the fp alu.
|
|
This patch encompasses several interrelated and interdependent changes
to the ISA generation step. The end goal is to reduce the size of the
generated compilation units for instruction execution and decoding so
that batch compilation can proceed with all CPUs active without
exhausting physical memory.
The ISA parser (src/arch/isa_parser.py) has been improved so that it can
accept 'split [output_type];' directives at the top level of the grammar
and 'split(output_type)' python calls within 'exec {{ ... }}' blocks.
This has the effect of "splitting" the files into smaller compilation
units. I use air-quotes around "splitting" because the files themselves
are not split, but preprocessing directives are inserted to have the same
effect.
Architecturally, the ISA parser has had some changes in how it works.
In general, it emits code sooner. It doesn't generate per-CPU files,
and instead defers to the C preprocessor to create the duplicate copies
for each CPU type. Likewise there are more files emitted and the C
preprocessor does more substitution that used to be done by the ISA parser.
Finally, the build system (SCons) needs to be able to cope with a
dynamic list of source files coming out of the ISA parser. The changes
to the SCons{cript,truct} files support this. In broad strokes, the
targets requested on the command line are hidden from SCons until all
the build dependencies are determined, otherwise it would try, realize
it can't reach the goal, and terminate in failure. Since build steps
(i.e. running the ISA parser) must be taken to determine the file list,
several new build stages have been inserted at the very start of the
build. First, the build dependencies from the ISA parser will be emitted
to arch/$ISA/generated/inc.d, which is then read by a new SCons builder
to finalize the dependencies. (Once inc.d exists, the ISA parser will not
need to be run to complete this step.) Once the dependencies are known,
the 'Environments' are made by the makeEnv() function. This function used
to be called before the build began but now happens during the build.
It is easy to see that this step is quite slow; this is a known issue
and it's important to realize that it was already slow, but there was
no obvious cause to attribute it to since nothing was displayed to the
terminal. Since new steps that used to be performed serially are now in a
potentially-parallel build phase, the pathname handling in the SCons scripts
has been tightened up to deal with chdir() race conditions. In general,
pathnames are computed earlier and more likely to be stored, passed around,
and processed as absolute paths rather than relative paths. In the end,
some of these issues had to be fixed by inserting serializing dependencies
in the build.
Minor note:
For the null ISA, we just provide a dummy inc.d so SCons is never
compelled to try to generate it. While it seems slightly wrong to have
anything in src/arch/*/generated (i.e. a non-generated 'generated' file),
it's by far the simplest solution.
|
|
With (upcoming) separate compilation, they are useless. Only
link-time optimization could re-inline them, but ideally
feedback-directed optimization would choose to do so only for
profitable (i.e. common) instructions.
|
|
|
|
|
|
|
|
|
|
This is an implementation of the x86 int3 and int immediate
instructions for long mode according to 'AMD64 Programmers Manual
Volume 3'.
|
|
Convert condition code registers from being specialized
("pseudo") integer registers to using the recently
added CC register class.
Nilay Vaish also contributed to this patch.
|
|
|
|
|
|
The x87 FPU supports three floating point formats: 32-bit, 64-bit, and
80-bit floats. The current gem5 implementation supports 32-bit and
64-bit floats, but only works correctly for 64-bit floats. This
changeset fixes the 32-bit float handling by correctly loading and
rounding (using truncation) 32-bit floats instead of simply truncating
the bit pattern.
80-bit floats are loaded by first loading the 80-bits of the float to
two temporary integer registers. A micro-op (cvtint_fp80) then
converts the contents of the two integer registers to the internal FP
representation (double). Similarly, when storing an 80-bit float,
there are two conversion routines (ctvfp80h_int and cvtfp80l_int) that
convert an internal FP register to 80-bit and stores the upper 64-bits
or lower 32-bits to an integer register, which is the written to
memory using normal integer stores.
|
|
X87 store instructions typically loads and pops the top value of the
stack and stores it in memory. The current implementation pops the
stack at the same time as the floating point value is loaded to a
temporary register. This will corrupt the state of the x87 stack if
the store fails. This changeset introduces a pop87 micro-instruction
that pops the stack and uses this instruction in the affected
macro-instructions to pop the stack after storing the value to memory.
|
|
The current implementation of the x87 never updates the x87 tag
word. This is currently not a big issue since the simulated x87 never
checks for stack overflows, however this becomes an issue when
switching between a virtualized CPU and a simulated CPU. This
changeset adds support, which is enabled by default, for updating the
tag register to every floating point microop that updates the stack
top using the spm mechanism.
The new tag words is generated by the helper function
X86ISA::genX87Tags(). This function is currently limited to flagging a
stack position as valid or invalid and does not try to distinguish
between the valid, zero, and special states.
|
|
This changeset actually fixes two issues:
* The lfpimm instruction didn't work correctly when applied to a
floating point constant (it did work for integers containing the
bit string representation of a constant) since it used
reinterpret_cast to convert a double to a uint64_t. This caused a
compilation error, at least, in gcc 4.6.3.
* The instructions loading floating point constants in the x87
processor didn't work correctly since they just stored a truncated
integer instead of a double in the floating point register. This
changeset fixes the old microcode by using lfpimm instruction
instead of the limm instructions.
|
|
The current implementation of fprem simply does an fmod and doesn't
simulate any of the iterative behavior in a real fprem. This isn't
normally a problem, however, it can lead to problems when switching
between CPU models. If switching from a real CPU in the middle of an
fprem loop to a simulated CPU, the output of the fprem loop becomes
correupted. This changeset changes the fprem implementation to work
like the one on real hardware.
|
|
This changeset fixes two problems in the FABS and FCHS
implementation. First, the ISA parser expects the assignment in
flag_code to be a pure assignment and not an and-assignment, which
leads to the isa_parser omitting the misc reg update. Second, the FCHS
and FABS macro-ops don't set the SetStatus flag, which means that the
default micro-op version, which doesn't update FSW, is executed.
|
|
Currently call and return instructions are marked as IsCall and IsReturn. Thus, the
branch predictor does not use RAS for these instructions. Similarly, the number of
function calls that took place is recorded as 0. This patch marks these instructions
as they should be.
|
|
Currently all the integer microops are marked as IntAluOp and the floating
point microops are marked as FloatAddOp. This patch adds support for marking
different microops differently. Now IntMultOp, IntDivOp, FloatDivOp,
FloatMultOp, FloatCvtOp, FloatSqrtOp classes will be used as well. This will
help in providing different latencies for different op class.
|
|
The 'lret' instruction reloads instruction pointer and code segment from the
stack and then pops them. But the popping part is missing from the current
implementation. This caused incorrect behavior in some code related to the
Fiasco OS. Microops are being added to rectify the behavior of the instruction.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
This patch implements ftan, fprem, fyl2x, fld* floating-point instructions.
|
|
This patch fixes the warnings that clang3.2svn emit due to the "-Wall"
flag. There is one case of an uninitialised value in the ARM neon ISA
description, and then a whole range of unused private fields that are
pruned.
|
|
|
|
|
|
|
|
Used as a command in full-system scripts helps the user ensure the benchmarks have finished successfully.
For example, one can use:
/path/to/benchmark args || /sbin/m5 fail 1
and thus ensure gem5 will exit with an error if the benchmark fails.
|
|
This patch implements the fnstsw instruction. The code was originally written
by Vince Weaver. Gabe had made some comments about the code, but those were
never addressed. This patch addresses those comments.
|
|
This patch implements the fsincos instruction. The code was originally written
by Vince Weaver. Gabe had made some comments about the code, but those were
never addressed. This patch addresses those comments.
|
|
The patch introduces two predicates for condition code registers -- one
tests if a register needs to be read, the other tests whether a register
needs to be written to. These predicates are evaluated twice -- during
construction of the microop and during its execution. Register reads
and writes are elided depending on how the predicates evaluate.
|
|
The D flag bit is part of the cc flag bit register currently. But since it
is not being used any where in the implementation, it creates an unnecessary
dependency. Hence, it is being moved to a separate register.
|
|
The CPUID instruction was implemented so that it would only write its results
if the instruction was successful. This works fine on the simple CPU where
unwritten registers retain their old values, but on a CPU like O3 with
renaming this is broken. The instruction needs to write the old values back
into the registers explicitly if they aren't being changed.
|
|
These classes are always used together, and merging them will give the ISAs
more flexibility in how they cache things and manage the process.
--HG--
rename : src/arch/x86/predecoder_tables.cc => src/arch/x86/decoder_tables.cc
|
|
|
|
This patch moves the ECF and EZF bits to individual registers (ecfBit and
ezfBit) and the CF and OF bits to cfofFlag registers. This is being done
so as to lower the read after write dependencies on the the condition code
register. Ultimately we will have the following registers [ZAPS], [OF],
[CF], [ECF], [EZF] and [DF]. Note that this is only one part of the
solution for lowering the dependencies. The other part will check whether
or not the condition code register needs to be actually read. This would
be done through a separate patch.
|
|
Shuffle the 32 bit values into position, and then add in parallel.
|
|
The disp displacement was left off the load microop so the wrong value was
used.
|
|
This patch addresses a number of minor issues that cause problems when
compiling with clang >= 3.0 and gcc >= 4.6. Most importantly, it
avoids using the deprecated ext/hash_map and instead uses
unordered_map (and similarly so for the hash_set). To make use of the
new STL containers, g++ and clang has to be invoked with "-std=c++0x",
and this is now added for all gcc versions >= 4.6, and for clang >=
3.0. For gcc >= 4.3 and <= 4.5 and clang <= 3.0 we use the tr1
unordered_map to avoid the deprecation warning.
The addition of c++0x in turn causes a few problems, as the
compiler is more stringent and adds a number of new warnings. Below,
the most important issues are enumerated:
1) the use of namespaces is more strict, e.g. for isnan, and all
headers opening the entire namespace std are now fixed.
2) another other issue caused by the more stringent compiler is the
narrowing of the embedded python, which used to be a char array,
and is now unsigned char since there were values larger than 128.
3) a particularly odd issue that arose with the new c++0x behaviour is
found in range.hh, where the operator< causes gcc to complain about
the template type parsing (the "<" is interpreted as the beginning
of a template argument), and the problem seems to be related to the
begin/end members introduced for the range-type iteration, which is
a new feature in c++11.
As a minor update, this patch also fixes the build flags for the clang
debug target that used to be shared with gcc and incorrectly use
"-ggdb".
|
|
Virtual (pre-segmentation) addresses are truncated based on address size, and
any non-64 bit linear address is truncated to 32 bits. This means that real
mode addresses aren't truncated down to 16 bits after their segment bases are
added in.
|
|
This patch cleans up a number of minor issues aiming to get closer to
compliance with the C++0x standard as interpreted by gcc and clang
(compile with std=c++0x and -pedantic-errors). In particular, the
patch cleans up enums where the last item was succeded by a comma,
namespaces closed by a curcly brace followed by a semi-colon, and the
use of the GNU-extension typeof (replaced by templated functions). It
does not address variable-length arrays, zero-size arrays, anonymous
structs, range expressions in switch statements, and the use of long
long. The generated CPU code also has a large number of issues that
remain to be fixed, mainly related to overflows in implicit constant
conversion (due to shifts).
|
|
This patch makes the code compile with clang 2.9 and 3.0 again by
making two very minor changes. Firt, it maintains a strict typing in
the forward declaration of the BaseCPUParams. Second, it adds a
FullSystemInt flag of the type unsigned int next to the boolean
FullSystem flag. The FullSystemInt variable can be used in
decode-statements (expands to switch statements) in the instruction
decoder.
|
|
If an instruction is executed speculatively and hits a situation where it
wants to panic, it should return a fault instead. If the instruction was
misspeculated, the fault can be thrown away. If the instruction wasn't
misspeculated, the fault will be invoked and the panic will still happen.
|
|
|
|
|