Age | Commit message (Collapse) | Author |
|
This is an implementation of the x86 int3 and int immediate
instructions for long mode according to 'AMD64 Programmers Manual
Volume 3'.
|
|
The x87 FPU supports three floating point formats: 32-bit, 64-bit, and
80-bit floats. The current gem5 implementation supports 32-bit and
64-bit floats, but only works correctly for 64-bit floats. This
changeset fixes the 32-bit float handling by correctly loading and
rounding (using truncation) 32-bit floats instead of simply truncating
the bit pattern.
80-bit floats are loaded by first loading the 80 bits of the float into
two temporary integer registers. A micro-op (cvtint_fp80) then
converts the contents of the two integer registers to the internal FP
representation (double). Similarly, when storing an 80-bit float,
there are two conversion routines (cvtfp80h_int and cvtfp80l_int) that
convert an internal FP register to 80-bit format and store the upper 64 bits
or lower 32 bits in an integer register, which is then written to
memory using normal integer stores.
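The two load paths described above can be sketched in Python. This is an illustration only, not the gem5 micro-op code: the function names are hypothetical, the split of the 80-bit value into a 64-bit significand word and a 16-bit sign/exponent word is an assumption, and the rounding is Python's (round to nearest), which may differ from what the changeset does.

```python
import math
import struct


def load_float32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE single and widen it to a double."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def fp80_to_double(low64: int, high16: int) -> float:
    """Convert an x87 extended value held in two integers to a double.

    low64  : 64-bit significand with an explicit integer bit (assumed layout)
    high16 : sign bit plus 15-bit biased exponent (assumed layout)
    Finite values too large for a double raise OverflowError; they are
    not handled in this sketch.
    """
    sign = -1.0 if high16 & 0x8000 else 1.0
    exponent = high16 & 0x7FFF
    significand = low64 & 0xFFFFFFFFFFFFFFFF
    if exponent == 0x7FFF:
        # All-ones exponent: infinity if the fraction is zero, otherwise NaN.
        fraction = significand & 0x7FFFFFFFFFFFFFFF
        return sign * math.inf if fraction == 0 else math.nan
    if exponent == 0:
        exponent = 1  # denormals use the minimum exponent
    # value = significand * 2**(exponent - bias - 63), with bias 16383
    return sign * math.ldexp(significand, exponent - 16383 - 63)
```

For example, `fp80_to_double(0x8000000000000000, 0x3FFF)` yields 1.0 and `load_float32(0x3FC00000)` yields 1.5.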
|
|
X87 store instructions typically load and pop the top value of the
stack and store it in memory. The current implementation pops the
stack at the same time as the floating point value is loaded to a
temporary register. This will corrupt the state of the x87 stack if
the store fails. This changeset introduces a pop87 micro-instruction
that pops the stack and uses this instruction in the affected
macro-instructions to pop the stack after storing the value to memory.
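A minimal sketch of the ordering issue, using a Python list as a stand-in for the x87 stack. `Memory`, `StoreFault`, and `store_and_pop` are hypothetical names for illustration and do not correspond to gem5 classes or micro-ops.

```python
class StoreFault(Exception):
    """Stands in for any fault raised by the memory system."""


class Memory:
    def __init__(self, mapped):
        self.mapped = set(mapped)  # addresses that accept stores
        self.data = {}

    def write(self, addr, value):
        if addr not in self.mapped:
            raise StoreFault(hex(addr))
        self.data[addr] = value


def store_and_pop(stack, memory, addr):
    """Store the top of stack, then pop only if the store succeeded."""
    value = stack[-1]          # read the top value without popping
    memory.write(addr, value)  # may raise StoreFault
    stack.pop()                # reached only after a successful store
```

If the write faults, the stack is left untouched, so the instruction can be retried after the fault is handled.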
|
|
This changeset actually fixes two issues:
* The lfpimm instruction didn't work correctly when applied to a
floating point constant (it did work for integers containing the
bit string representation of a constant) since it used
reinterpret_cast to convert a double to a uint64_t. This caused a
compilation error, at least, in gcc 4.6.3.
* The instructions loading floating point constants in the x87
processor didn't work correctly since they just stored a truncated
integer instead of a double in the floating point register. This
changeset fixes the old microcode by using the lfpimm instruction
instead of the limm instruction.
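The difference between converting a double's value and reinterpreting its bit pattern can be shown with Python's `struct` module. This is only an illustration of the distinction; the helper names are hypothetical and the gem5 fix itself is in C++.

```python
import math
import struct


def double_to_bits(value: float) -> int:
    """Return the 64-bit IEEE 754 pattern of a double as an integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def bits_to_double(bits: int) -> float:
    """Return the double whose IEEE 754 pattern is the given 64-bit integer."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# Value conversion truncates pi to 3; the bit pattern keeps the full constant.
truncated = int(math.pi)
preserved = bits_to_double(double_to_bits(math.pi))
```

Here `truncated` is 3, while `preserved` equals `math.pi` exactly, which is the behavior a floating point constant load needs.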
|
|
The current implementation of fprem simply does an fmod and doesn't
simulate any of the iterative behavior in a real fprem. This isn't
normally a problem, but it can lead to problems when switching
between CPU models. If switching from a real CPU in the middle of an
fprem loop to a simulated CPU, the output of the fprem loop becomes
corrupted. This changeset changes the fprem implementation to work
like the one on real hardware.
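The iterative behavior can be sketched as a partial-remainder step that software repeats until the reduction is complete. This is a simplified model, not the gem5 microcode: the function names are hypothetical, the reduction of 32 exponent bits per incomplete step is an assumed value (hardware picks an implementation-dependent amount), and zero divisors, infinities, NaNs, and the quotient condition bits are not handled.

```python
import math
from fractions import Fraction


def fprem_step(st0: float, st1: float, reduction: int = 32):
    """One partial-remainder step. Returns (new_st0, incomplete)."""
    if st0 == 0.0:
        return st0, False
    # Exponent difference decides whether one step can finish the job.
    d = math.frexp(st0)[1] - math.frexp(st1)[1]
    a, b = Fraction(st0), Fraction(st1)
    if d < 64:
        q = math.trunc(a / b)              # full truncating quotient
        return float(a - b * q), False
    # Otherwise remove only part of the quotient and report "incomplete"
    # (the C2 flag on real hardware).
    scale = Fraction(2) ** (d - reduction)
    qq = math.trunc(a / b / scale)
    return float(a - b * qq * scale), True


def fprem_loop(st0: float, st1: float):
    """Repeat fprem_step until it reports completion, as software does."""
    steps = 0
    while True:
        st0, incomplete = fprem_step(st0, st1)
        steps += 1
        if not incomplete:
            return st0, steps
```

The final value matches `math.fmod`, but the intermediate values differ from a one-shot fmod, which is why state captured in the middle of such a loop matters.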
|
|
This changeset fixes two problems in the FABS and FCHS
implementation. First, the ISA parser expects the assignment in
flag_code to be a pure assignment and not an and-assignment, which
leads to the isa_parser omitting the misc reg update. Second, the FCHS
and FABS macro-ops don't set the SetStatus flag, which means that the
default micro-op version, which doesn't update FSW, is executed.
|
|
Currently call and return instructions are not marked as IsCall and IsReturn. Thus, the
branch predictor does not use the RAS for these instructions. Similarly, the number of
function calls that took place is recorded as 0. This patch marks these instructions
as they should be.
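Why the flags matter can be seen in a toy return address stack. The class and function below are hypothetical and far simpler than gem5's branch predictor; they only show that the stack is touched when an instruction carries a call or return flag.

```python
class ReturnAddressStack:
    def __init__(self, depth=16):
        self.depth = depth
        self.entries = []

    def push(self, addr):
        if len(self.entries) == self.depth:
            self.entries.pop(0)  # drop the oldest entry when full
        self.entries.append(addr)

    def pop(self):
        return self.entries.pop() if self.entries else None


def predict_target(ras, pc, length, is_call=False, is_return=False):
    """Return a predicted return target, or None if the RAS has no say."""
    if is_call:
        ras.push(pc + length)  # remember the instruction after the call
        return None
    if is_return:
        return ras.pop()       # predict the matching return address
    return None                # unflagged instructions never use the RAS
```

With both flags left at False, a call pushes nothing and the later return gets no prediction from the stack.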
|
|
The 'lret' instruction reloads the instruction pointer and code segment from the
stack and then pops them. But the popping part is missing from the current
implementation. This caused incorrect behavior in some code related to the
Fiasco OS. Microops are being added to rectify the behavior of the instruction.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
This patch implements the ftan, fprem, fyl2x, and fld* floating-point instructions.
|
|
This patch implements the fnstsw instruction. The code was originally written
by Vince Weaver. Gabe had made some comments about the code, but those were
never addressed. This patch addresses those comments.
|
|
This patch implements the fsincos instruction. The code was originally written
by Vince Weaver. Gabe had made some comments about the code, but those were
never addressed. This patch addresses those comments.
|
|
Shuffle the 32 bit values into position, and then add in parallel.
|
|
The disp displacement was left off the load microop so the wrong value was
used.
|
|
|
|
This patch adds a new microop for memory barrier. The microop itself does
nothing, but since it is marked as a memory barrier, the O3 CPU should flush
all the pending loads and stores before the fence to the memory system.
|
|
|
|
|
|
During SYSCALL_64, use dataSize=8 when handling the new rip (ref
http://www.intel.com/Assets/PDF/manual/253668.pdf 5.8.8: IA32_LSTAR is a 64-bit
address).
|
|
JMP_FAR_I was unpacking its far pointer operand using sll instead of srl like
it should, and also putting the components in the wrong registers for use by
other microcode.
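The shift direction can be illustrated with a small sketch. It assumes the far pointer immediate is held as one integer with the offset in the low bits and the 16-bit selector directly above it; the function names are hypothetical and the register assignment is not modeled.

```python
def unpack_far_pointer(imm: int, offset_bits: int = 32):
    """Split a far pointer immediate into (selector, offset)."""
    offset = imm & ((1 << offset_bits) - 1)
    selector = (imm >> offset_bits) & 0xFFFF  # right shift brings the selector down
    return selector, offset


def unpack_far_pointer_wrong_shift(imm: int, offset_bits: int = 32):
    """Same split, but shifting left as in the bug described above."""
    offset = imm & ((1 << offset_bits) - 1)
    selector = (imm << offset_bits) & 0xFFFF  # low 16 bits are always zero
    return selector, offset
```

With a left shift the selector field always comes out as zero, whatever the operand was.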
|
|
During iret, access the LDT/GDT at CPL0 rather than after the transition to user mode
(if I'm reading the Intel 64 architecture spec correctly, the contents of
the descriptor table are read before the CPL is updated).
|
|
Unfortunately my implementation of the movd instruction had two bugs.
In one case, when moving a 32-bit value into an xmm register, the
lower half of the xmm register was not zero extended.
The other case is that xmm was used instead of xmmlm as the source
for a register move. My test case didn't notice this at first
as it moved xmm0 to eax, which both have the same register
number.
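The zero-extension requirement can be sketched by modeling an xmm register as two 64-bit halves. The helper names are hypothetical, and the second function is only one plausible way stale bits could survive; it is not taken from the original microcode.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def movd_to_xmm(value: int):
    """Return (low64, high64) after moving a 32-bit value into an xmm register.

    The value is zero extended, so every bit above bit 31 ends up clear.
    """
    return value & MASK32, 0


def movd_to_xmm_stale(old_low: int, value: int):
    """Illustrative faulty variant: bits 63:32 of the low half keep old data."""
    low = (old_low & ~MASK32 & MASK64) | (value & MASK32)
    return low, 0
```

A test that only inspects the low 32 bits of the destination would not distinguish the two variants.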
|
|
This problem is like the one fixed with movhpd a few weeks ago.
A +8 displacement is used to access memory when there should
be none.
This fix is needed for the perlbmk spec2k benchmark to run.
|
|
These are complicated instructions and the micro-code might be suboptimal.
This has been tested with some small sample programs (attached).
The psrldq instruction is needed by various spec2k programs.
|
|
This patch implements the movd_Vo_Edp series of instructions.
It addresses various concerns by Gabe Black about which file the
instruction belonged in, as well as supporting REX prefixed
instructions properly.
This instruction is needed for some of the spec2k benchmarks, most
notably bzip2.
|
|
This patch implements the haddpd instruction.
It fixes the problem in the previous version (pointed out by Gabe Black)
where an incorrect result would happen if you issue the instruction
with the same argument twice, i.e. "haddpd %xmm0,%xmm0"
This instruction is used by many spec2k benchmarks.
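The aliasing hazard can be shown with a toy register file in which each xmm register is a two-element list of doubles. Both functions are hypothetical sketches; the second shows one plausible way an in-place update goes wrong when source and destination are the same register, not the actual earlier microcode.

```python
def haddpd(regs, dest, src):
    """Horizontal add: read all four inputs before writing the result."""
    d_lo, d_hi = regs[dest]
    s_lo, s_hi = regs[src]
    regs[dest] = [d_lo + d_hi, s_lo + s_hi]


def haddpd_in_place(regs, dest, src):
    """Faulty variant: the low half is overwritten before src is read."""
    regs[dest][0] = regs[dest][0] + regs[dest][1]
    regs[dest][1] = regs[src][0] + regs[src][1]  # sees the new low half if src is dest
```

Starting from `[1.0, 2.0]` in xmm0 and using xmm0 for both operands, the first version produces `[3.0, 3.0]` while the in-place version produces `[3.0, 5.0]`.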
|
|
|
|
The movhpd instruction was writing to the wrong memory offset.
|
|
The movdqa instruction should enforce 16-byte alignment.
This implementation does not do that.
These instructions are needed for most of x86_64 spec2k to run.
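For reference, the missing alignment check amounts to testing the low four address bits. The sketch below is hypothetical and not part of this patch; `AlignmentFault` stands in for whatever fault the simulator would raise.

```python
class AlignmentFault(Exception):
    """Hypothetical fault for a misaligned 16-byte access."""


def check_movdqa_address(addr: int) -> int:
    """Return addr if it is 16-byte aligned, otherwise raise AlignmentFault."""
    if addr & 0xF:
        raise AlignmentFault(hex(addr))
    return addr
```

An address such as 0x1000 passes the check, while 0x1008 does not.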
|
|
result.
|
|
|