Age | Commit message (Collapse) | Author |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
it's possible for the offset provided to an HSAIL mem inst to be a negative
value, however the variable we use to hold the offset is an unsigned type.
this can lead to excessively large offset values when the offset is negative,
which will almost certainly cause the access to go out of bounds.
|
|
RISC-V makes use of load-reserved and store-conditional instructions to
enable creation of lock-free concurrent data manipulation as well as
ACQUIRE and RELEASE semantics for memory ordering of LR, SC, and AMO
instructions (the latter of which do not follow LR/SC semantics). This
patch is a correction to patch 4, which added these instructions to the
implementation of RISC-V. It modifies locked_mem.hh and the
implementations of lr.w, sc.w, lr.d, and sc.d to apply the proper gem5
flags and return the proper values.
An important difference between gem5's LLSC semantics and RISC-V's LR/SC
ones, beyond the name, is that gem5 uses 0 to indicate failure and 1 to
indicate success, while RISC-V is the opposite. Strictly speaking, RISC-V
uses 0 to indicate success and nonzero to indicate failure where the
value would indicate the error, but currently only 1 is reserved as a
failure code by the ISA reference.
This is the seventh patch in the series which originally consisted of five
patches that added the RISC-V ISA to gem5. The original five patches added
all of the instructions and added support for more detailed CPU models and
the sixth patch corrected the implementations of Linux constants and
structs. There will be an eighth patch that adds some regression tests
for the instructions.
[Removed some commented-out code from locked_mem.hh.]
Signed-off by: Alec Roelke
Signed-off by: Jason Lowe-Power <jason@lowepower.com>
|
|
This is an add-on patch for the original series that implemented RISC-V
that improves the implementation of Linux emulation for SE mode. Basically
it cleans up linux/linux.hh by removing constants that haven't been
defined for the RISC-V Linux proxy kernel and rearranging the stat
struct so it aligns with RISC-V's implementation of it. It also adds
placeholders for system calls that have been given numbers in RISC-V
but haven't been given implementations yet. These system calls are
as follows:
- readlinkat
- sigprocmask
- ioctl
- clock_gettime
- getrusage
- getrlimit
- setrlimit
The first five patches implemented RISC-V with the base ISA and multiply,
floating point, and atomic extensions and added support for detailed
CPU models with memory timing.
[Fixed incompatibility with changes made from patch 1.]
Signed-off by: Alec Roelke
Signed-off by: Jason Lowe-Power <jason@lowepower.com>
|
|
Last of five patches adding RISC-V to GEM5. This patch adds support for
timing, minor, and detailed CPU models that was missing in the last four,
which basically consists of handling timing-mode memory accesses and
telling the minor and detailed models what a no-op instruction should
be (addi zero, zero, 0).
Patches 1-4 introduced RISC-V and implemented the base instruction set,
RV64I, and added the multiply, floating point, and atomic memory
extensions, RV64MAFD.
[Fixed compatibility with edit from patch 1.]
[Fixed compatibility with hg copy edit from patch 1.]
[Fixed some style errors in locked_mem.hh.]
Signed-off by: Alec Roelke
Signed-off by: Jason Lowe-Power <jason@lowepower.com>
|
|
Fourth of five patches adding RISC-V to GEM5. This patch adds the RV64A
extension, which includes atomic memory instructions. These instructions
atomically read a value from memory, modify it with a value contained in a
source register, and store the original memory value in the destination
register and modified value back into memory. Because this requires two
memory accesses and GEM5 does not support two timing memory accesses in
a single instruction, each of these instructions is split into two micro-
ops: A "load" micro-op, which reads the memory, and a "store" micro-op,
which modifies and writes it back. Each atomic memory instruction also has
two bits that acquire and release a lock on its memory location.
Additionally, there are atomic load and store instructions that only either
load or store, but not both, and can acquire or release memory locks.
Note that because the current implementation of RISC-V only supports one
core and one thread, it doesn't make sense to make use of AMO instructions.
However, they do form a standard extension of the RISC-V ISA, so they are
included mostly as a placeholder for when multithreaded execution is
implemented. As a result, any tests for their correctness in a future
patch may be abbreviated.
Patch 1 introduced RISC-V and implemented the base instruction set, RV64I;
patch 2 implemented the integer multiply extension, RV64M; and patch 3
implemented the single- and double-precision floating point extensions,
RV64FD.
Patch 5 will add support for timing, minor, and detailed CPU models that
isn't present in patches 1-4.
[Added missing file amo.isa]
[Replaced information removed from initial patch that was missed during
division into multiple patches.]
[Fixed some minor formatting issues.]
[Fixed oversight where LR and SC didn't have both AQ and RL flags.]
Signed-off by: Alec Roelke
Signed-off by: Jason Lowe-Power <jason@lowepower.com>
|
|
Third of five patches adding RISC-V to GEM5. This patch adds the RV64FD
extensions, which include single- and double-precision floating point
instructions.
Patch 1 introduced RISC-V and implemented the base instruction set, RV64I
and patch 2 implemented the integer multiply extension, RV64M.
Patch 4 will implement the atomic memory instructions, RV64A, and patch
5 will add support for timing, minor, and detailed CPU models that is
missing from the first four patches.
[Fixed exception handling in floating-point instructions to conform better
to IEEE-754 2008 standard and behavior of the Chisel-generated RISC-V
simulator.]
[Fixed style errors in decoder.isa.]
[Fixed some fuzz caused by modifying a previous patch.]
Signed-off by: Alec Roelke
Signed-off by: Jason Lowe-Power <jason@lowepower.com>
|
|
Second of five patches adding RISC-V to GEM5. This patch adds the
RV64M extension, which includes integer multiply and divide instructions.
Patch 1 introduced RISC-V and implemented the base instruction set, RV64I.
Patch 3 will implement the floating point extensions, RV64FD; patch 4 will
implement the atomic memory instructions, RV64A; and patch 5 will add
support for timing, minor, and detailed CPU models that is missing from
the first four patches.
[Added mulw instruction that was missed when dividing changes among
patches.]
Signed-off by: Alec Roelke
Signed-off by: Jason Lowe-Power <jason@lowepower.com>
|
|
First of five patches adding RISC-V to GEM5. This patch introduces the
base 64-bit ISA (RV64I) in src/arch/riscv for use with syscall emulation.
The multiply, floating point, and atomic memory instructions will be added
in additional patches, as well as support for more detailed CPU models.
The loader is also modified to be able to parse RISC-V ELF files, and a
"Hello world\!" example for RISC-V is added to test-progs.
Patch 2 will implement the multiply extension, RV64M; patch 3 will implement
the floating point (single- and double-precision) extensions, RV64FD;
patch 4 will implement the atomic memory instructions, RV64A, and patch 5
will add support for timing, minor, and detailed CPU models that is missing
from the first four patches (such as handling locked memory).
[Removed several unused parameters and imports from RiscvInterrupts.py,
RiscvISA.py, and RiscvSystem.py.]
[Fixed copyright information in RISC-V files copied from elsewhere that had
ARM licenses attached.]
[Reorganized instruction definitions in decoder.isa so that they are sorted
by opcode in preparation for the addition of ISA extensions M, A, F, D.]
[Fixed formatting of several files, removed some variables and
instructions that were missed when moving them to other patches, fixed
RISC-V Foundation copyright attribution, and fixed history of files
copied from other architectures using hg copy.]
[Fixed indentation of switch cases in isa.cc.]
[Reorganized syscall descriptions in linux/process.cc to remove large
number of repeated unimplemented system calls and added implmementations
to functions that have received them since it process.cc was first
created.]
[Fixed spacing for some copyright attributions.]
[Replaced the rest of the file copies using hg copy.]
[Fixed style check errors and corrected unaligned memory accesses.]
[Fix some minor formatting mistakes.]
Signed-off by: Alec Roelke
Signed-off by: Jason Lowe-Power <jason@lowepower.com>
|
|
UBSAN flags this operation because it detects that arg is being cast directly
to an unsigned type, argBits. this patch fixes this by first casting the
value to a signed int type, then reintrepreting the raw bits of the signed
int into argBits.
|
|
|
|
No one appears to be using it, and it is causing build issues
and increases the development and maintenance effort.
|
|
fixes to appease clang++. tested on:
Ubuntu clang version 3.5.0-4ubuntu2~trusty2
(tags/RELEASE_350/final) (based on LLVM 3.5.0)
Ubuntu clang version 3.6.0-2ubuntu1~trusty1
(tags/RELEASE_360/final) (based on LLVM 3.6.0)
the fixes address the following five issues:
1) the exec continuations in gpu_static_inst.hh were marked
as protected when they should be public. here we mark
them as public
2) the Abs instruction uses std::abs() in its execute method.
because Abs is templated, it can also operate on U32 and U64,
types, which cause Abs::execute() to pass uint32_t and uint64_t
types to std::abs() respectively. this triggers a warning
because std::abs() has no effect in this case. to rememdy this
we add template specialization for the execute() method of Abs
when its template paramter is U32 or U64.
3) Some potocols that utilize the code in cprintf.hh were missing
includes to BoolVec.hh, which defines operator<< for the BoolVec
type. This would cause issues when the generated code would try
to pass a BoolVec type to a method in cprintf.hh that used
operator<< on an instance of a BoolVec.
4) Surprise, clang doesn't like it when you clobber all the bits
in a newly allocated object. I.e., this code:
tlb = new GpuTlbEntry\[size\];
std::memset(tlb, 0, sizeof(GpuTlbEntry) \* size);
Let's use std::vector to track the TLB entries in the GpuTlb now...
5) There were a few variables used only in DPRINTFs, so we mark them
with M5_VAR_USED.
|
|
This patch adds the ability for an application to request dist-gem5 to begin/
end synchronization using an m5 op. When toggling on sync, all nodes agree
on the next sync point based on the maximum of all nodes' ticks. CPUs are
suspended until the sync point to avoid sending network messages until sync has
been enabled. Toggling off sync acts like a global execution barrier, where
all CPUs are disabled until every node reaches the toggle off point. This
avoids tricky situations such as one node hitting a toggle off followed by a
toggle on before the other nodes hit the first toggle off.
|
|
this patch adds an ordered response buffer to the GM pipeline
to ensure in-order data delivery. the buffer is implemented as
a stl ordered map, which sorts the request in program order by
using their sequence ID. when requests return to the GM pipeline
they are marked as done. only the oldest request may be serviced
from the ordered buffer, and only if is marked as done.
the FIFO response buffers are kept and used in OoO delivery mode
|
|
for HSAIL an operand's indices into the register files may be calculated
trivially, because the operands are always read from a register file, or are
an immediate.
for machine ISA, however, an op selector may specify special registers, or
may specify special SGPRs with an alias op selector value. the location of
some of the special registers values are dependent on the size of the RF
in some cases. here we add a way for the underlying getRegisterIndex()
method to know about the size of the RFs, so that it may find the relative
positions of the special register values.
|
|
currently the PC is incremented on an instruction granularity, and not as an
instruction's byte address. machine ISA instructions assume the PC is a byte
address, and is incremented accordingly. here we make the GPU model, and the
HSAIL instructions treat the PC as a byte address as well.
|
|
the GPUISA class is meant to encapsulate any ISA-specific behavior - special
register accesses, isa-specific WF/kernel state, etc. - in a generic enough
way so that it may be used in ISA-agnostic code.
gpu-compute: use the GPUISA object to advance the PC
the GPU model treats the PC as a pointer to individual instruction objects -
which are store in a contiguous array - and not a byte address to be fetched
from the real memory system. this is ok for HSAIL because all instructions
are considered by the model to be the same size.
in machine ISA, however, instructions may be 32b or 64b, and branches are
calculated by advancing the PC by the number of words (4 byte chunks) it
needs to advance in the real instruction stream. because of this there is
a mismatch between the PC we use to index into the instruction array, and
the actual byte address PC the ISA expects. here we move the PC advance
calculation to the ISA so that differences in the instrucion sizes may be
accounted for in generic way.
|
|
because every taken branch causes fetch to be discarded, we move the call
to the WF to avoid to have to call it from each and every branch instruction
type.
|
|
we are removing doGmReturn from the GM pipe, and adding completeAcc()
implementations for the HSAIL mem ops. the behavior in doGmReturn is
dependent on HSAIL and HSAIL mem ops, however the completion phase
of memory ops in machine ISA can be very different, even amongst individual
machine ISA mem ops. so we remove this functionality from the pipeline and
allow it to be implemented by the individual instructions.
|
|
this patch removes the GPUStaticInst enums that were defined in GPU.py.
instead, a simple set of attribute flags that can be set in the base
instruction class are used. this will help unify the attributes of HSAIL
and machine ISA instructions within the model itself.
because the static instrution now carries the attributes, a GPUDynInst
must carry a pointer to a valid GPUStaticInst so a new static kernel launch
instruction is added, which carries the attributes needed to perform a
the kernel launch.
|
|
|
|
|
|
Modify the opClass assigned to AArch64 FP instructions from SimdFloat* to
Float*. Also create the FloatMemRead and FloatMemWrite opClasses, which
distinguishes writes to the INT and FP register banks.
Change the latency of (Simd)FloatMultAcc to 5, based on the Cortex-A72,
where the "latency" of FMADD is 3 if the next instruction is a FMADD and
has only the augend to destination dependency, otherwise it's 7 cycles.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
|
|
This commit adds missing non-predicated, scalar floating point
instructions. Specifically VRINT* floating point integer rounding
instructions and VSEL* floating point conditional selects.
Change-Id: I23cbd1389f151389ac8beb28a7d18d5f93d000e7
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Nathanael Premillieu <nathanael.premillieu@arm.com>
|
|
Adding details, e.g. rip, rsp etc. to the kvm pagefault exit when in SE mode.
|
|
|
|
|
|
Renaming members of the Wavefront class in accordance with the style guide.
|
|
Change-Id: Id2acbc09772be310a0eb9e33295afab07e08a4fa
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
Normal MMAPPED_IPR requests are allowed to execute speculatively under the
assumption that they have no side effects. The special case of m5ops that are
treated like MMAPPED_IPR should not be allowed to execute speculatively, since
they can have side-effects. Adding the STRICT_ORDER flag to these requests
blocks execution until the associated instruction hits the ROB head.
|
|
Change-Id: I183b9942929c873c3272ce6d1abd4ebc472c7132
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
this patch fixes issues with changeset 11593
use the host's pwrite() syscall for pwrite64Func(),
as opposed to pwrite64(), because pwrite64() does
not work well on all distros.
undo the enabling of fstatfs, as we will add this
in a separate pate.
|
|
this patch adds an implementation for the pwrite64 syscall and
enables it for x86_64, and enables fstatfs for x86_64.
|
|
Introduce and use a lookup table.
Using fetchDescriptor() rather than DMA cleanly handles nested paging.
Change-Id: I69ec762f176bd752ba1040890e731826b58d15a6
|
|
During host bootup, KVM reads/writes to CNTHCTL_EL2. Because this
miscreg has not been implemented, the simulation would end there. This
patch causes the simulation to warn about the read/write instead of fail.
Change-Id: If034bfd0818a9a5e50c5fe86609e945258c96fa3
|
|
This fixes a bug where stage 2 lookups used the AArch32
permissions rules even if we were executing in AArch64 mode.
Change-Id: Ia40758f0599667ca7ca15268bd3bf051342c24c1
|
|
This patch corrects IPA reporting if the translation faults in a
stage 2 lookup.
Change-Id: I0b914527f8a9f98a5e980a131cf9d03e5584b4e9
|
|
This patch adds support for stage 2 TLBI instructions
such as TLBI IPAS2E1_Xt.
Change-Id: I0cd5e8055b0c1003e03439aa5183252f50ea0a88
|
|
Change-Id: I14c93a5460550051a12129e792a9a9bd522a145c
|
|
This patch restricts trapping to hypervisor only if we are in the
correct exception level for the trap to happen.
Change-Id: I0a382b6a572ef835ea36d2702b8a81b633bd3df0
|
|
Faults that could potentially be routed to the hypervisor checked
whether or not they were in a secure state without checking if security
was enabled or not. This caused faults not to be routed correctly. This
patch causes secure state checking to first ask if security is enabled.
Change-Id: I179e9b181b27f552734c9bab2b18d05ac579a119
|
|
We recompute if we are doing a stage 2 walk inside of the table walker
but we have already figured it out in the tlb. Pass the information in
to the walk instead of recomputing it.
Change-Id: I39637ce99309b2ddbc30344d45ac9ebf6a203401
|
|
The functional case is already handled within the fetchDescriptor()
function. We can thus use that function for both atomic and functional
mode when we start the table walk.
Change-Id: Iacaed28cd9024d259fd37a58150efd00ff94d86e
|