Age | Commit message (Collapse) | Author |
|
Multi gem5 is an extension to gem5 to enable parallel simulation of a
distributed system (e.g. simulation of a pool of machines
connected by Ethernet links). A multi gem5 run consists of seperate gem5
processes running in parallel (potentially on different hosts/slots on
a cluster). Each gem5 process executes the simulation of a component of the
simulated distributed system (e.g. a multi-core board with an Ethernet NIC).
The patch implements the "distributed" Ethernet link device
(dev/src/multi_etherlink.[hh.cc]). This device will send/receive
(simulated) Ethernet packets to/from peer gem5 processes. The interface
to talk to the peer gem5 processes is defined in dev/src/multi_iface.hh and
in tcp_iface.hh.
There is also a central message server process (util/multi/tcp_server.[hh,cc])
which acts like an Ethernet switch and transfers messages among the gem5 peers.
A multi gem5 simulations can be kicked off by the util/multi/gem5-multi.sh
wrapper script.
Checkpoints are supported by multi-gem5. The checkpoint must be
initiated by a single gem5 process. E.g., the gem5 process with rank 0
can take a checkpoint from the bootscript just before it invokes
'mpirun' to launch an MPI test. The message server process will notify
all the other peer gem5 processes and make them take a checkpoint, too
(after completing a global synchronisation to ensure that there are no
inflight messages among gem5).
|
|
Add a simple device shim that interfaces with the NoMali model
library. The gem5 side of the interface supports Mali T60x/T62x/T760
GPUs. This device model pretends to be a Mali GPU, but doesn't render
anything and executes in zero time.
|
|
The drain() call currently passes around a DrainManager pointer, which
is now completely pointless since there is only ever one global
DrainManager in the system. It also contains vestiges from the time
when SimObjects had to keep track of their child objects that needed
draining.
This changeset moves all of the DrainState handling to the Drainable
base class and changes the drain() and drainResume() calls to reflect
this. Particularly, the drain() call has been updated to take no
parameters (the DrainManager argument isn't needed) and return a
DrainState instead of an unsigned integer (there is no point returning
anything other than 0 or 1 any more). Drainable objects should return
either DrainState::Draining (equivalent to returning 1 in the old
system) if they need more time to drain or DrainState::Drained
(equivalent to returning 0 in the old system) if they are already in a
consistent state. Returning DrainState::Running is considered an
error.
Drain done signalling is now done through the signalDrainDone() method
in the Drainable class instead of using the DrainManager directly. The
new call checks if the state of the object is DrainState::Draining
before notifying the drain manager. This means that it is safe to call
signalDrainDone() without first checking if the simulator has
requested draining. The intention here is to reduce the code needed to
implement draining in simple objects.
|
|
Draining is currently done by traversing the SimObject graph and
calling drain()/drainResume() on the SimObjects. This is not ideal
when non-SimObjects (e.g., ports) need draining since this means that
SimObjects owning those objects need to be aware of this.
This changeset moves the responsibility for finding objects that need
draining from SimObjects and the Python-side of the simulator to the
DrainManager. The DrainManager now maintains a set of all objects that
need draining. To reduce the overhead in classes owning non-SimObjects
that need draining, objects inheriting from Drainable now
automatically register with the DrainManager. If such an object is
destroyed, it is automatically unregistered. This means that drain()
and drainResume() should never be called directly on a Drainable
object.
While implementing the new functionality, the DrainManager has now
been made thread safe. In practice, this means that it takes a lock
whenever it manipulates the set of Drainable objects since SimObjects
in different threads may create Drainable objects
dynamically. Similarly, the drain counter is now an atomic_uint, which
ensures that it is manipulated correctly when objects signal that they
are done draining.
A nice side effect of these changes is that it makes the drain state
changes stricter, which the simulation scripts can exploit to avoid
redundant drains.
|
|
The drain state enum is currently a part of the Drainable
interface. The same state machine will be used by the DrainManager to
identify the global state of the simulator. Make the drain state a
global typed enum to better cater for this usage scenario.
|
|
Events expected to be unserialized using an event-specific
unserializeEvent call. This call was never actually used, which meant
the events relying on it never got unserialized (or scheduled after
unserialization).
Instead of relying on a custom call, we now use the normal
serialization code again. In order to schedule the event correctly,
the parrent object is expected to use the
EventQueue::checkpointReschedule() call. This happens automatically
for events that are serialized using the AutoSerialize mechanism.
|
|
Objects that are can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
to the serialization interface to make it more generic and hide the
underlying checkpoint storage. Specifically:
* Add a set of APIs to serialize into a subsection of the current
object. Previously, objects that needed this functionality would
use ad-hoc solutions using nameOut() and section name
generation. In the new world, an object that implements the
interface has the methods serializeSection() and
unserializeSection() that serialize into a named /subsection/ of
the current object. Calling serialize() serializes an object into
the current section.
* Move the name() method from Serializable to SimObject as it is no
longer needed for serialization. The fully qualified section name
is generated by the main serialization code on the fly as objects
serialize sub-objects.
* Add a scoped ScopedCheckpointSection helper class. Some objects
need to serialize data structures, that are not deriving from
Serializable, into subsections. Previously, this was done using
nameOut() and manual section name generation. To simplify this,
this changeset introduces a ScopedCheckpointSection() helper
class. When this class is instantiated, it adds a new /subsection/
and subsequent serialization calls during the lifetime of this
helper class happen inside this section (or a subsection in case
of nested sections).
* The serialize() call is now const which prevents accidental state
manipulation during serialization. Objects that rely on modifying
state can use the serializeOld() call instead. The default
implementation simply calls serialize(). Note: The old-style calls
need to be explicitly called using the
serializeOld()/serializeSectionOld() style APIs. These are used by
default when serializing SimObjects.
* Both the input and output checkpoints now use their own named
types. This hides underlying checkpoint implementation from
objects that need checkpointing and makes it easier to change the
underlying checkpoint storage code.
|
|
Make it possible to specify the size of the PIO space for an AMBA DMA
device. Maintain backwards compatibility and default to zero.
|
|
There are cases when we don't want to use a system register mapped
generic timer, but can't use the SP804. For example, when using KVM on
aarch64, we want to intercept accesses to the generic timer, but can't
do so if it is using the system register interface. In such cases,
we need to use a memory-mapped generic timer.
This changeset adds a device model that implements the memory mapped
generic timer interface. The current implementation only supports a
single frame (i.e., one virtual timer and one physical timer).
|
|
The generic timer model currently does not support virtual
counters. Virtual and physical counters both tick with the same
frequency. However, virtual timers allow a hypervisor to set an offset
that is subtracted from the counter when it is read. This enables the
hypervisor to present a time base that ticks with virtual time in the
VM (i.e., doesn't tick when the VM isn't running). Modern Linux
kernels generally assume that virtual counters exist and try to use
them by default.
|
|
This changeset cleans up the generic timer a bit and moves most of the
register juggling from the ISA code into a separate class in the same
source file as the rest of the generic timer. It also removes the
assumption that there is always 8 or fewer CPUs in the system. Instead
of having a fixed limit, we now instantiate per-core timers as they
are requested. This is all in preparation for other patches that add
support for virtual timers and a memory mapped interface.
|
|
Some versions of the kernel incorrectly swap the red and blue color
select registers. This changeset adds a workaround for that by
swapping them when instantiating a PixelConverter.
|
|
Currently, frame buffer handling in gem5 is quite ad hoc. In practice,
we pass around naked pointers to raw pixel data and expect consumers
to convert frame buffers using the (broken) VideoConverter.
This changeset completely redesigns the way we handle frame buffers
internally. In summary, it fixes several color conversion bugs, adds
support for more color formats (e.g., big endian), and makes the code
base easier to follow.
In the new world, gem5 always represents pixel data using the Pixel
struct when pixels need to be passed between different classes (e.g.,
a display controller and the VNC server). Producers of entire frames
(e.g., display controllers) should use the FrameBuffer class to
represent a frame.
Frame producers are expected to create one instance of the FrameBuffer
class in their constructors and register it with its consumers
once. Consumers are expected to check the dimensions of the frame
buffer when they consume it.
Conversion between the external representation and the internal
representation is supported for all common "true color" RGB formats of
up to 32-bit color depth. The external pixel representation is
expected to be between 1 and 4 bytes in either big endian or little
endian. Color channels are assumed to be contiguous ranges of bits
within each pixel word. The external pixel value is scaled to an 8-bit
internal representation using a floating multiplication to map it to
the entire 8-bit range.
|
|
This patch takes a last step in fixing issues related to uncacheable
accesses. We do not separate uncacheable memory from uncacheable
devices, and in cases where it is really memory, there are valid
scenarios where we need to snoop since we do not support cache
maintenance instructions (yet). On snooping an uncacheable access we
thus provide data if possible. In essence this makes uncacheable
accesses IO coherent.
The snoop filter is also queried to steer the snoops, but not updated
since the uncacheable accesses do not allocate a block.
|
|
This adds support for FreeBSD/aarch64 FS and SE mode (basic set of syscalls only)
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
This patch introduces a UFS host controller and a UFS device. More
information about the UFS standard can be found at the JEDEC site:
http://www.jedec.org/standards-documents/results/jesd220
Note that the model does not implement the complete standard, and as
such is not an actual implementation of UFS. The following SCSI
commands are implemented: inquiry, read, read capacity, report LUNs,
start/stop, test unit ready, verify, write, format unit, send
diagnostic, synchronize cache, mode select, mode sense, request sense,
unmap, write buffer and read buffer. This is sufficient for usage with
Linux and Android.
To interact with this model a kernel version 3.9 or above is
needed.
|
|
This adds a NAND flash timing model. This model takes the number of
planes into account and is ultimately intended to be used as a
high-level performance model for any device using flash. To access the
memory, use either readMemory or writeMemory.
To make use of the model you will need an interface model
such as UFSHostDevice, which is part of a separate patch.
At the moment the flash device is part of the ARM device tree since
the only use if the UFSHostDevice, and that in turn relies on the ARM
GIC.
|
|
This patch adds an I2C bus and base device. I2C is used to connect a
variety of sensors, and this patch serves as a starting point to
enable a range of I2C devices.
|
|
This patch fixes a few small issues to ensure gem5 compiles when using
gcc 5.1.
First, the GDB_REG_BYTES in the RemoteGDB header are, rather
surprisingly, flagged as unused for both ARM and X86. Removing them,
however, causes compilation errors as they are actually used in the
source file. Moving the constant into the class definition fixes the
issue. Possibly a gcc bug.
Second, we have an unused EthPktData constructor using auto_ptr, and
the latter is deprecated. Since the code is never used it is simply
removed.
|
|
This patch adds an example configuration in ext/sst/tests/ that allows
an SST/gem5 instance to simulate a 4-core AArch64 system with SST's
memHierarchy components providing all the caches and memories.
|
|
Restoring from a checkpoint fails if either the RTC or the RTC Timer
Interrrupt event is disabled. The restored machine tried incorrectly
to schedule the next event with negative offset.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
Add 32-bit access width for PrimaryTiming register and 16bit for UDMAControl
register as FreeBSD required.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
This patch adds a new PIO-accessible GICv2m shim. This shim has a PIO
slave port on one side, and SPI 'wires' on the other. It accepts MSIs
from the system and triggers SPIs on the GIC. It is configurable with
a number of frames, each of which has a number of SPIs and a base SPI
offset.
A Linux driver for GICv2m is available upstream.
|
|
This patch removes the code that added this magic register. A
follow-up patch provides a GICv2m MSI shim that gives the same
functionality in a standard ARM system architecture way.
|
|
The ARM PL011 UART model didn't clear and raise interrupts
correctly. This changeset rewrites the whole interrupt handling and
makes it both simpler and fixes several cases where the correct
interrupts weren't raised or cleared. Additionally, it cleans up many
other aspects of the code.
|
|
This patch fixes a long-standing isue with the port flow
control. Before this patch the retry mechanism was shared between all
different packet classes. As a result, a snoop response could get
stuck behind a request waiting for a retry, even if the send/recv
functions were split. This caused message-dependent deadlocks in
stress-test scenarios.
The patch splits the retry into one per packet (message) class. Thus,
sendTimingReq has a corresponding recvReqRetry, sendTimingResp has
recvRespRetry etc. Most of the changes to the code involve simply
clarifying what type of request a specific object was accepting.
The biggest change in functionality is in the cache downstream packet
queue, facing the memory. This queue was shared by requests and snoop
responses, and it is now split into two queues, each with their own
flow control, but the same physical MasterPort. These changes fixes
the previously seen deadlocks.
|
|
This patch fixes a rather unfortunate oversight where the annotation
pointer was used even though it is null. Somehow the code still works,
but UBSan is rather unhappy. The use is now guarded, and the variable
is initialised in the constructor (as well as init()).
|
|
Move the (common) GIC initialization code that notifies the platform
code of the new GIC to the base class (BaseGic) instead of the Pl390
implementation.
|
|
This patch clarifies the packet timings annotated
when going through a crossbar.
The old 'firstWordDelay' is replaced by 'headerDelay' that represents
the delay associated to the delivery of the header of the packet.
The old 'lastWordDelay' is replaced by 'payloadDelay' that represents
the delay needed to processing the payload of the packet.
For now the uses and values remain identical. However, going forward
the payloadDelay will be additive, and not include the
headerDelay. Follow-on patches will make the headerDelay capture the
pipeline latency incurred in the crossbar, whereas the payloadDelay
will capture the additional serialisation delay.
|
|
The Platform base class contains a pointer to an instance of the
System which is never initialized. This can lead to subtle bugs since
some architecture-specific platform implementations contain their own
system pointer which is normally used. However, if the platform is
accessed through a pointer to its base class, the dangling pointer
will be used instead.
|
|
Correctly clear the PCI interrupt belonging to a VirtIO device when
the ISR register is read.
|
|
This change includes edits to Intel8254Timer to prevent counter events firing
before startup to comply with SimObject initialization call sequence.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
This change includes edits to MC146818 timer to prevent RTC events
firing before startup to comply with SimObject initialization call sequence.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
This patch adds table walker stats for:
- Walk events
- Instruction vs Data
- Page size histogram
- Wait time and service time histograms
- Pending requests histogram (per cycle) - measures dist. of L
(p(1..) = how often busy, p(0) = how often idle)
- Squashes, before starting and after completion
|
|
Add an assert in the PioPort that checks if a response packet from a
device has the right flags set before passing it to them rest of the
memory system.
|
|
The VirtIO devices didn't correctly set the response flags in memory
packets. This changeset adds the required Packet::makeResponse()
calls.
|
|
This command is supposed to set up a timer which will put the drive into a
standby mode if it isn't sent a command within a given time out. Since most of
the timeouts are generally significantly longer than a simulation would run
anyway, and we don't have an implementation for standby mode to begin with,
we can accept the command, do nothing, and report success.
|
|
This is used primarily for VNC.
|
|
This patch cleans up the packet memory allocation confusion. The data
is always allocated at the requesting side, when a packet is created
(or copied), and there is never a need for any device to allocate any
space if it is merely responding to a paket. This behaviour is in line
with how SystemC and TLM works as well, thus increasing
interoperability, and matching established conventions.
The redundant calls to Packet::allocate are removed, and the checks in
the function are tightened up to make sure data is only ever allocated
once. There are still some oddities in the packet copy constructor
where we copy the data pointer if it is static (without ownership),
and allocate new space if the data is dynamic (with ownership). The
latter is being worked on further in a follow-on patch.
|
|
Mostly addressing uninitialised members.
|
|
There was already a stub device at 0x80, the port traditionally used for an IO
delay. 0x80 is also the port used for POST codes sent by firmware, and that
may have prompted adding this port as a second option.
|
|
|
|
Another churn to clean up undefined behaviour, mostly ARM, but some
parts also touching the generic part of the code base.
Most of the fixes are simply ensuring that proper intialisation. One
of the more subtle changes is the return type of the sign-extension,
which is changed to uint64_t. This is to avoid shifting negative
values (undefined behaviour) in the ISA code.
|
|
This patch reverts changeset 9277177eccff which does not do what it
was intended to do. In essence, we go back to implementing mkutctime
much like the non-standard timegm extension.
|
|
This patch changes how we turn time into UTC. Previously we
manipulated the TZ environment variable, but this has issues as the
strings that are manipulated could be tainted (see e.g. CERT
ENV34-C). Now we simply rely on the built-in gmtime function and avoid
touching getenv/setenv all together.
|
|
Sysfs on ubuntu scrapes the entire PCI config space
when it discovers a device using 4 byte accesses.
This was not supported by our devices, in particular the NIC
that implemented the extended PCI config space. This change
allows the extended PCI config space to be accessed by
sysfs properly.
|
|
This patch transitions the EthPacketData from the ad-hoc
RefCountingPtr to the c++11 shared_ptr. There are no changes in
behaviour, and the code modifications are mainly replacing "new" with
"make_shared".
The bool casting operator for the shared_ptr is explicit, and we must
therefore either cast it, compare it to NULL (p != nullptr), double
negate it (!!p) or do a (p ? true : false).
|
|
Some incorrect casting to IntRegIndex, and a few uninitialized members
in the i8254xGBe device.
|
|
|
|
This patch changes the name of the Bus classes to XBar to better
reflect the actual timing behaviour. The actual instances in the
config scripts are not renamed, and remain as e.g. iobus or membus.
As part of this renaming, the code has also been clean up slightly,
making use of range-based for loops and tidying up some comments. The
only changes outside the bus/crossbar code is due to the delay
variables in the packet.
--HG--
rename : src/mem/Bus.py => src/mem/XBar.py
rename : src/mem/coherent_bus.cc => src/mem/coherent_xbar.cc
rename : src/mem/coherent_bus.hh => src/mem/coherent_xbar.hh
rename : src/mem/noncoherent_bus.cc => src/mem/noncoherent_xbar.cc
rename : src/mem/noncoherent_bus.hh => src/mem/noncoherent_xbar.hh
rename : src/mem/bus.cc => src/mem/xbar.cc
rename : src/mem/bus.hh => src/mem/xbar.hh
|