Age | Commit message (Collapse) | Author |
|
|
|
This changeset adds support for aarch64 in kvm. The CPU module
supports both checkpointing and online CPU model switching as long as
no devices are simulated by the host kernel. It currently has the
following limitations:
* The system register based generic timer can only be simulated by
the host kernel. Workaround: Use a memory mapped timer instead to
simulate the timer in gem5.
* Simulating devices (e.g., the generic timer) in the host kernel
requires that the host kernel also simulates the GIC.
* ID registers in the host and in gem5 must match for switching
between simulated CPUs and KVM. This is particularly important
for ID registers describing memory system capabilities (e.g.,
ASID size, physical address size).
* Switching between a virtualized CPU and a simulated CPU is
currently not supported if in-kernel device emulation is
used. This could be worked around by adding support for switching
to the gem5 (e.g., the KvmGic) side of the device models. A
simpler workaround is to avoid in-kernel device models
altogether.
|
|
This changeset adds a GIC implementation that uses the kernel's
built-in support for simulating the interrupt controller. Since there
is currently no support for state transfer between gem5 and the
kernel, the device model does not support serialization and CPU
switching (which would require switching to a gem5-simulated GIC).
|
|
There are cases (particularly when attaching GDB) when instruction
events are scheduled at the current instruction tick. This used to
trigger an assertion error in kvm. This changeset adds a check for
this condition and forces KVM to do a quick entry that completes any
pending IO operations, but does not execute any new instructions,
before servicing the event. We could check if we need to enter KVM at
all, but forcing a quick entry is makes the code slightly cleaner and
does not hurt correctness (performance is hardly an issue in these
cases).
|
|
This changeset moves the ARM-specific KVM CPU implementation to
arch/arm/kvm/. This change is expected to keep the source tree
somewhat cleaner as we start adding support for ARMv8 and KVM
in-kernel interrupt controller simulation.
--HG--
rename : src/cpu/kvm/ArmKvmCPU.py => src/arch/arm/kvm/ArmKvmCPU.py
rename : src/cpu/kvm/arm_cpu.cc => src/arch/arm/kvm/arm_cpu.cc
rename : src/cpu/kvm/arm_cpu.hh => src/arch/arm/kvm/arm_cpu.hh
|
|
The MinorCPU would count bubbles in Execute::issue as part of
the num_insts_issued and so sometimes reach the instruction
issue limit incorrectly.
Fixed by checking for a bubble in one new place.
|
|
The register dumping code in kvm tries to print the bytes in large
registers (128 bits and larger) instead of printing them as hex. This
changeset fixes that.
|
|
Protect x86-specific APIs in KvmVM with compile-time guards to avoid
breaking ARM builds.
|
|
Three minor issues are resolved:
1. Apparently gcc 5.1 does not like negation of booleans followed by
bitwise AND.
2. Somehow the compiler also gets confused and warns about
NoopMachInst being unused (removing it causes compilation errors
though). Most likely a compiler bug.
3. There seems to be a number of instances where loop unrolling causes
false positives for the array-bounds check. For now, switch to
std::array. Potentially we could disable the warning for newer gcc
versions, but switching to std::array is probably a good move in
any case.
|
|
The Request::UNCACHEABLE flag currently has two different
functions. The first, and obvious, function is to prevent the memory
system from caching data in the request. The second function is to
prevent reordering and speculation in CPU models.
This changeset gives the order/speculation requirement a separate flag
(Request::STRICT_ORDER). This flag prevents CPU models from doing the
following optimizations:
* Speculation: CPU models are not allowed to issue speculative
loads.
* Write combining: CPU models and caches are not allowed to merge
writes to the same cache line.
Note: The memory system may still reorder accesses unless the
UNCACHEABLE flag is set. It is therefore expected that the
STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent
this behavior.
|
|
This patch takes a last step in fixing issues related to uncacheable
accesses. We do not separate uncacheable memory from uncacheable
devices, and in cases where it is really memory, there are valid
scenarios where we need to snoop since we do not support cache
maintenance instructions (yet). On snooping an uncacheable access we
thus provide data if possible. In essence this makes uncacheable
accesses IO coherent.
The snoop filter is also queried to steer the snoops, but not updated
since the uncacheable accesses do not allocate a block.
|
|
This patch fixes a recent issue with gcc 4.9 (and possibly more) being
convinced that indices outside the array bounds are used when
initialising the FUPool members.
|
|
Currently, each op class has a parameter issueLat that denotes the cycles after
which another op of the same class can be issued. As of now, this latency can
either be one cycle (fully pipelined) or same as execution latency of the op
(not at all pipelined). The fact that issueLat is a parameter of type Cycles
makes one believe that it can be set to any value. To avoid the confusion, the
parameter is being renamed as 'pipelined' with type boolean. If set to true,
the op would execute in a fully pipelined fashion. Otherwise, it would execute
in an unpipelined fashion.
|
|
This patch sets the default latency of the division microop to a single cycle
on x86. This is because the division instructions DIV and IDIV have been
implemented as loops of div microops, where each microop computes a single bit
of the quotient.
|
|
The o3 cpu instruction queue model uses the count variable to track the number
of unissued instructions in the queue. Previously, the squash method used
this variable to avoid executing the doSquash method when there were no
unissued instructions in the pipeline. A corner case problem exists when
only issued instructions exist in the pipeline and a squash occurs; the
doSquash code is not invoked and subsequently does not clean up state properly.
|
|
This patch takes the final step in removing the InOrderCPU from the
tree. Rest in peace.
The MinorCPU is now used to model an in-order microarchitecture, and
long term the MinorCPU will eventually be renamed InOrderCPU.
|
|
This patch ensures that the CPU progress Event is triggered for the new set of
switched_cpus that get scheduled (e.g. during fast-forwarding). it also avoids
printing the interval state if the cpu is currently switched out.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
The totalInstructions counter is only incremented when the whole instruction is
commited and not on every microop. It was incorrectly reset in atomic and
timing cpus.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>"
|
|
This patch fixes an issue that prevented gem5 to be built with C++
config and without Python.
|
|
Makes x86-style locked operations even more distinct from
LLSC operations. Using "locked" by itself should be
obviously ambiguous now.
|
|
Fix erroneous message format for fatal error.
Previously, code did not have type indicator (% instead of %d).
Also removed redundant fatal check.
Ran modified sweep.py with in range and out of range values to test.
|
|
Refactor the way that specific MemCmd values are generated for packets.
The new approach is a little more elegant in that we assign the right
value up front, and it's also more amenable to non-heap-allocated
Packet objects.
Also replaced the code in the Minor model that was still doing it the
ad-hoc way.
This is basically a refinement of http://repo.gem5.org/gem5/rev/711eb0e64249.
|
|
|
|
|
|
|
|
|
|
|
|
The variable is used in only one place and a whole new function setNextStatus()
has been defined just to compute the value of the variable. Instead of calling
the function, the value is now computed in the loop that preceded the function
call.
|
|
|
|
This patch introduces a few subclasses to the CoherentXBar and
NoncoherentXBar to distinguish the different uses in the system. We
use the crossbar in a wide range of places: interfacing cores to the
L2, as a system interconnect, connecting I/O and peripherals,
etc. Needless to say, these crossbars have very different performance,
and the clock frequency alone is not enough to distinguish these
scenarios.
Instead of trying to capture every possible case, this patch
introduces dedicated subclasses for the three primary use-cases:
L2XBar, SystemXBar and IOXbar. More can be added if needed, and the
defaults can be overridden.
|
|
This patch changes how the MMU and table walkers are created such that
a single port is used to connect the MMU and the TLBs to the memory
system. Previously two ports were needed as there are two table walker
objects (stage one and stage two), and they both had a port. Now the
port itself is moved to the Stage2MMU, and each TableWalker is simply
using the port from the parent.
By using the same port we also remove the need for having an
additional crossbar joining the two ports before the walker cache or
the L2. This simplifies the creation of the CPU cache topology in
BaseCPU.py considerably. Moreover, for naming and symmetry reasons,
the TLB walker port is connected through the stage-one table walker
thus making the naming identical to x86. Along the same line, we use
the stage-one table walker to generate the master id that is used by
all TLB-related requests.
|
|
Now, prior to the renaming, the instruction requests the exact amount of
registers it will need, and the rename_map decides whether the instruction is
allowed to proceed or not.
|
|
This patch fixes a long-standing isue with the port flow
control. Before this patch the retry mechanism was shared between all
different packet classes. As a result, a snoop response could get
stuck behind a request waiting for a retry, even if the send/recv
functions were split. This caused message-dependent deadlocks in
stress-test scenarios.
The patch splits the retry into one per packet (message) class. Thus,
sendTimingReq has a corresponding recvReqRetry, sendTimingResp has
recvRespRetry etc. Most of the changes to the code involve simply
clarifying what type of request a specific object was accepting.
The biggest change in functionality is in the cache downstream packet
queue, facing the memory. This queue was shared by requests and snoop
responses, and it is now split into two queues, each with their own
flow control, but the same physical MasterPort. These changes fixes
the previously seen deadlocks.
|
|
Have the traffic generator add its masterID as the PC address to the
requests. That way, prefetchers (and other components) that use a PC
for request classification will see per-tester streams of requests.
This enables us to test strided prefetchers with the memchecker, too.
|
|
To be able to use the TrafficGen in a system with caches we need to
allow it to sink incoming snoop requests. By default the master port
panics, so silently ignore any snoops.
|
|
Finally took the plunge and made this apply to all ISAs, not just ARM.
|
|
Doesn't support x86 due to static instruction representation.
--HG--
rename : src/cpu/CPUTracers.py => src/cpu/InstPBTrace.py
|
|
The MemTest class really only tests false sharing, and as such there
was a lot of old cruft that could be removed. This patch cleans up the
tester, and also makes it more clear what the assumptions are. As part
of this simplification the reference functional memory is also
removed.
The regression configs using MemTest are updated to reflect the
changes, and the stats will be bumped in a separate patch. The example
config will be updated in a separate patch due to more extensive
re-work.
In a follow-on patch a new tester will be introduced that uses the
MemChecker to implement true sharing.
|
|
The TLB-related code is generally architecture dependent and should
live in the arch directory to signify that.
--HG--
rename : src/sim/BaseTLB.py => src/arch/generic/BaseTLB.py
rename : src/sim/tlb.cc => src/arch/generic/tlb.cc
rename : src/sim/tlb.hh => src/arch/generic/tlb.hh
|
|
This patch sets the CPU status to idle when the last active thread gets
suspended.
|
|
This patch changes how the timing CPU deals with processing responses,
always scheduling an event, even if it is for the current tick. This
helps to avoid situations where a new request shows up before a
response is finished in the crossbar, and also is more in line with
any realistic behaviour.
|
|
While the IsFirstMicroop flag exists it was only occasionally used in the ARM
instructions that gem5 microOps and therefore couldn't be relied on to be correct.
|
|
Track memory size and flags as well as add some comments and consts.
|
|
We have no way of knowing if a CPU model is on the wrong path with
our execute-in-execute CPU models. Don't pretend that we do.
|
|
|
|
If someone wants to debug with legion again they can restore the
code from the repository, but no need to have it hang around indefinately.
|
|
This patch tidies up how we create and set the fields of a Request. In
essence it tries to use the constructor where possible (as opposed to
setPhys and setVirt), thus avoiding spreading the information across a
number of locations. In fact, setPhys is made private as part of this
patch, and a number of places where we callede setVirt instead uses
the appropriate constructor.
|
|
The ppCommit should notify the attached listener every time the cpu commits
a microop or non microcoded insturction. The listener can then decide
whether it will process only the last microop (eg. SimPoint probe).
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
|