Age | Commit message (Collapse) | Author |
|
These are non-temporal packed SSE stores.
Change-Id: I526cd6551b38d6d35010bc6173f23d017106b466
Reviewed-on: https://gem5-review.googlesource.com/9861
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
|
|
These instructions originally read the TSC into t1 and then unpacked it
into eax and edx using a move, a right shift, and then another move.
We can combine the second shift and move. The shift will move the
upper 32 bits into the lower 32 bits, and clear the upper 32 bits to
zero. This has the same effect as moving the lower 32 bits post-shift
into another register, since the upper 32 bits will be cleared to zero
based on x86 partial register access semantics.
Change-Id: Iba85e501c7e84147ad0047f5c555e61bdf8f032b
Reviewed-on: https://gem5-review.googlesource.com/9044
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Gabe Black <gabeblack@google.com>
|
|
This is very similar to RDTSC, except that it requires all younger
instructions to retire before it completes, and it writes the TSC_AUX
MSR into ECX. I've added an mfence as an iniitial microop to ensure
that memory accesses complete before RDTSCP runs, and added an rdval
microop at the end to read the TSC_AUX value into ECX.
Change-Id: I9766af562b7fd0c22e331b56e06e8818a9e268c9
Reviewed-on: https://gem5-review.googlesource.com/9043
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Gabe Black <gabeblack@google.com>
|
|
Change-Id: I20bf6a57ea4354aac9267845bb37b70b83d6fcde
Reviewed-on: https://gem5-review.googlesource.com/9042
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Gabe Black <gabeblack@google.com>
|
|
This makes it explicit which type of serialization you want, and also
makes it possible to make a macroop serialize before. The old
serializing directive was renamed .serialize_after in the microcode
assembler, and throughout the microcode implementation, and its
behavior is unchanged. More specifically, it still marks the last
microop within the macroop as IsSerializing and IsSerializeAfter.
The new .serialize_before directive does something similar and marks
the first microop as IsSerializing and IsSerializeBefore.
Change-Id: Ia53466c734c651c65400809de7ef903c4a6c3e7e
Reviewed-on: https://gem5-review.googlesource.com/9041
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Gabe Black <gabeblack@google.com>
|
|
This patch adds support for cache flushing instructions in x86.
It piggybacks on support for similar instructions in arm ISA
added by Nikos Nikoleris. I have tested each instruction using
microbenchmarks.
Change-Id: I72b6b8dc30c236a21eff7958fa231f0663532d7d
Reviewed-on: https://gem5-review.googlesource.com/7401
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
|
|
This means the instruction is treated as cmpxchg8b when the effective
operand size is 16 bits.
Change-Id: I4d9bb295f96097e1746a9bbccb2c579d14738fab
Reviewed-on: https://gem5-review.googlesource.com/6603
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Gabe Black <gabeblack@google.com>
|
|
The FSW and TOP values are technically part of the same register, but
they have very different behaviors. One of them can be renamed and
float along without affecting global state, while the other requires
serialization. They just need to *look* like the same register when
read by the user.
Also, there was a missing break in setMiscRegNoEffect.
Change-Id: If58de0f566f65068208240f4001209fb9e1826d6
Reviewed-on: https://gem5-review.googlesource.com/6441
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Gabe Black <gabeblack@google.com>
|
|
The microcode for those instructions needs a directive which overrides
that setting in the instructions emulation environment.
Reported-by: Matt Sinclair <mattdsinclair@gmail.com>
Change-Id: I474d938c0b3cf01da92ec817a58b08de783f1967
Reviewed-on: https://gem5-review.googlesource.com/6301
Reviewed-by: Gabe Black <gabeblack@google.com>
Maintainer: Gabe Black <gabeblack@google.com>
|
|
When in 64-bit mode, if the stack is accessed implicitly by an instruction
the alternate address prefix should be ignored if present.
This patch adds an extra flag to the ldstop which signifies when the
address override should be ignored. Then, for all of the affected
instructions, this patch adds two options to the ld and st opcode to use
the current stack addressing mode for all addresses and to ignore the
AddressSizeFlagBit. Finally, this patch updates the x86 TLB to not
truncate the address if it is in 64-bit mode and the IgnoreAddrSizeFlagBit
is set.
This fixes a problem when calling __libc_start_main with a binary that is
linked with a recent version of ld. This version of ld uses the address
override prefix (0x67) on the call instruction instead of a nop.
Note: This has not been tested in compatibility mode and only the call
instruction with the address override prefix has been tested.
See [1] page 9 (pdf page 45)
For instructions that are affected see [1] page 519 (pdf page 555).
[1] http://support.amd.com/TechDocs/24594.pdf
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
|
|
The previous implementation did a pair of nested RMW operations,
which isn't compatible with the way that locked RMW operations are
implemented in the cache models. It was convenient though in that
it didn't require any new micro-ops, and supported cmpxchg16b using
64-bit memory ops. It also worked in AtomicSimpleCPU where
atomicity was guaranteed by the core and not by the memory system.
It did not work with timing CPU models though.
This new implementation defines new 'split' load and store micro-ops
which allow a single memory operation to use a pair of registers as
the source or destination, then uses a single ldsplit/stsplit RMW
pair to implement cmpxchg. This patch requires support for 128-bit
memory accesses in the ISA (added via a separate patch) to support
cmpxchg16b.
|
|
Result of running 'hg m5style --skip-all --fix-white -a'.
|
|
These are packed single-precision approximate reciprocal operations,
vector and scalar versions, respectively.
This code was basically developed by copying the code for
sqrtps and sqrtss. The mrcp micro-op was simplified relative to
msqrt since there are no double-precision versions of this operation.
|
|
fild loads an integer value into the x87 top of stack register.
fucomi/fucomip compare two x87 register values (the latter
also doing a stack pop).
These instructions are used by some versions of GNU libstdc++.
|
|
Added explicit data sizes and an opcode type for correct execution.
|
|
All x87 misc registers are implemented in an array of 64 bit values
but in real hardware the size of some of these registers is smaller.
Previsouly all 64 bits where incorrectly set and then later read. To
ensure correctness we mask the value in setMiscRegNoEffect to write
only the valid bits.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
|
|
This patch corrects the FXSAVE and FXRSTOR Macroops. The actual code used for
saving/restore the FP registers is in the file but it was not used.
The FXSAVE and FXRSTOR instructions are used in the kernel for saving and
loading the state of the mmx,xmm and fpu registers.
This operation is triggered in FS by issuing a Device Not Available Fault. The
cr0 register has a TS flag that is set upon each context change. Every time a
task access any FP related register (SIMD as well) if the TS flag is set to
one, the device not available fault is issued. The kernel saves the current
state of the registers, and restore the previous state of the currently running
task.
Right now Gem5 lacks of this capability. the Device Not Available Fault is
never issued, leading to several problems when different threads share the same
CPU and SMT is not used. The PARSEC Ferret benchmark is an example of this
behavior.
In order to test this a hack in the atomic cpu code was done to detect if a
static instruction has any FP operands and the cr0 reg TS bit is set. This
check must be done in the ISA dependent code. But it seems to be tricky to
access the cr0 register while executing an instruction.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
This patch implements the simd128 ADDSUBPD instruction for the x86 architecture.
Tested with a simple program in assembly language which executes the
instruction. Checked that different versions of the instruction are executed
by using the execution tracing option.
Committed by: Nilay Vaish <nilay@cs.wisc.edu
|
|
The data size used for actually writing the base value for the segment was the
default size, but really it should set the entire value without any possible
truncation.
|
|
The far pointer should be shifted right to get the selector value, not left.
Also, when calculating the width of the offset, the wrong register was used in
one spot.
|
|
This patch takes quite a large step in transitioning from the ad-hoc
RefCountingPtr to the c++11 shared_ptr by adopting its use for all
Faults. There are no changes in behaviour, and the code modifications
are mostly just replacing "new" with "make_shared".
|
|
|
|
|
|
|
|
This is an implementation of the x86 int3 and int immediate
instructions for long mode according to 'AMD64 Programmers Manual
Volume 3'.
|
|
|
|
|
|
The x87 FPU supports three floating point formats: 32-bit, 64-bit, and
80-bit floats. The current gem5 implementation supports 32-bit and
64-bit floats, but only works correctly for 64-bit floats. This
changeset fixes the 32-bit float handling by correctly loading and
rounding (using truncation) 32-bit floats instead of simply truncating
the bit pattern.
80-bit floats are loaded by first loading the 80-bits of the float to
two temporary integer registers. A micro-op (cvtint_fp80) then
converts the contents of the two integer registers to the internal FP
representation (double). Similarly, when storing an 80-bit float,
there are two conversion routines (ctvfp80h_int and cvtfp80l_int) that
convert an internal FP register to 80-bit and stores the upper 64-bits
or lower 32-bits to an integer register, which is the written to
memory using normal integer stores.
|
|
X87 store instructions typically loads and pops the top value of the
stack and stores it in memory. The current implementation pops the
stack at the same time as the floating point value is loaded to a
temporary register. This will corrupt the state of the x87 stack if
the store fails. This changeset introduces a pop87 micro-instruction
that pops the stack and uses this instruction in the affected
macro-instructions to pop the stack after storing the value to memory.
|
|
This changeset actually fixes two issues:
* The lfpimm instruction didn't work correctly when applied to a
floating point constant (it did work for integers containing the
bit string representation of a constant) since it used
reinterpret_cast to convert a double to a uint64_t. This caused a
compilation error, at least, in gcc 4.6.3.
* The instructions loading floating point constants in the x87
processor didn't work correctly since they just stored a truncated
integer instead of a double in the floating point register. This
changeset fixes the old microcode by using lfpimm instruction
instead of the limm instructions.
|
|
The current implementation of fprem simply does an fmod and doesn't
simulate any of the iterative behavior in a real fprem. This isn't
normally a problem, however, it can lead to problems when switching
between CPU models. If switching from a real CPU in the middle of an
fprem loop to a simulated CPU, the output of the fprem loop becomes
correupted. This changeset changes the fprem implementation to work
like the one on real hardware.
|
|
This changeset fixes two problems in the FABS and FCHS
implementation. First, the ISA parser expects the assignment in
flag_code to be a pure assignment and not an and-assignment, which
leads to the isa_parser omitting the misc reg update. Second, the FCHS
and FABS macro-ops don't set the SetStatus flag, which means that the
default micro-op version, which doesn't update FSW, is executed.
|
|
Currently call and return instructions are marked as IsCall and IsReturn. Thus, the
branch predictor does not use RAS for these instructions. Similarly, the number of
function calls that took place is recorded as 0. This patch marks these instructions
as they should be.
|
|
The 'lret' instruction reloads instruction pointer and code segment from the
stack and then pops them. But the popping part is missing from the current
implementation. This caused incorrect behavior in some code related to the
Fiasco OS. Microops are being added to rectify the behavior of the instruction.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
This patch implements ftan, fprem, fyl2x, fld* floating-point instructions.
|
|
|
|
|
|
|
|
This patch implements the fnstsw instruction. The code was originally written
by Vince Weaver. Gabe had made some comments about the code, but those were
never addressed. This patch addresses those comments.
|
|
This patch implements the fsincos instruction. The code was originally written
by Vince Weaver. Gabe had made some comments about the code, but those were
never addressed. This patch addresses those comments.
|
|
Shuffle the 32 bit values into position, and then add in parallel.
|
|
The disp displacement was left off the load microop so the wrong value was
used.
|
|
|
|
This patch adds a new microop for memory barrier. The microop itself does
nothing, but since it is marked as a memory barrier, the O3 CPU should flush
all the pending loads and stores before the fence to the memory system.
|
|
|
|
|
|
During SYSCALL_64, use dataSize=8 when handling new rip (ref
http://www.intel.com/Assets/PDF/manual/253668.pdf 5.8.8 IA32_LSTAR is a 64-bit
address)
|
|
JMP_FAR_I was unpacking its far pointer operand using sll instead of srl like
it should, and also putting the components in the wrong registers for use by
other microcode.
|
|
During iret access LDT/GDT at CPL0 rather than after transition to user mode
(if I'm reading the Intel IA-64 architecture spec correctly, the contents of
the descriptor table are read before the CPL is updated).
|