Age | Commit message (Collapse) | Author |
|
Objects that are can be serialized are supposed to inherit from the
Serializable class. This class is meant to provide a unified API for
such objects. However, so far it has mainly been used by SimObjects
due to some fundamental design limitations. This changeset redesigns
to the serialization interface to make it more generic and hide the
underlying checkpoint storage. Specifically:
* Add a set of APIs to serialize into a subsection of the current
object. Previously, objects that needed this functionality would
use ad-hoc solutions using nameOut() and section name
generation. In the new world, an object that implements the
interface has the methods serializeSection() and
unserializeSection() that serialize into a named /subsection/ of
the current object. Calling serialize() serializes an object into
the current section.
* Move the name() method from Serializable to SimObject as it is no
longer needed for serialization. The fully qualified section name
is generated by the main serialization code on the fly as objects
serialize sub-objects.
* Add a scoped ScopedCheckpointSection helper class. Some objects
need to serialize data structures, that are not deriving from
Serializable, into subsections. Previously, this was done using
nameOut() and manual section name generation. To simplify this,
this changeset introduces a ScopedCheckpointSection() helper
class. When this class is instantiated, it adds a new /subsection/
and subsequent serialization calls during the lifetime of this
helper class happen inside this section (or a subsection in case
of nested sections).
* The serialize() call is now const which prevents accidental state
manipulation during serialization. Objects that rely on modifying
state can use the serializeOld() call instead. The default
implementation simply calls serialize(). Note: The old-style calls
need to be explicitly called using the
serializeOld()/serializeSectionOld() style APIs. These are used by
default when serializing SimObjects.
* Both the input and output checkpoints now use their own named
types. This hides underlying checkpoint implementation from
objects that need checkpointing and makes it easier to change the
underlying checkpoint storage code.
|
|
Fix erroneous message format for fatal error.
Previously, code did not have type indicator (% instead of %d).
Also removed redundant fatal check.
Ran modified sweep.py with in range and out of range values to test.
|
|
This patch fixes a long-standing isue with the port flow
control. Before this patch the retry mechanism was shared between all
different packet classes. As a result, a snoop response could get
stuck behind a request waiting for a retry, even if the send/recv
functions were split. This caused message-dependent deadlocks in
stress-test scenarios.
The patch splits the retry into one per packet (message) class. Thus,
sendTimingReq has a corresponding recvReqRetry, sendTimingResp has
recvRespRetry etc. Most of the changes to the code involve simply
clarifying what type of request a specific object was accepting.
The biggest change in functionality is in the cache downstream packet
queue, facing the memory. This queue was shared by requests and snoop
responses, and it is now split into two queues, each with their own
flow control, but the same physical MasterPort. These changes fixes
the previously seen deadlocks.
|
|
Have the traffic generator add its masterID as the PC address to the
requests. That way, prefetchers (and other components) that use a PC
for request classification will see per-tester streams of requests.
This enables us to test strided prefetchers with the memchecker, too.
|
|
To be able to use the TrafficGen in a system with caches we need to
allow it to sink incoming snoop requests. By default the master port
panics, so silently ignore any snoops.
|
|
The MemTest class really only tests false sharing, and as such there
was a lot of old cruft that could be removed. This patch cleans up the
tester, and also makes it more clear what the assumptions are. As part
of this simplification the reference functional memory is also
removed.
The regression configs using MemTest are updated to reflect the
changes, and the stats will be bumped in a separate patch. The example
config will be updated in a separate patch due to more extensive
re-work.
In a follow-on patch a new tester will be introduced that uses the
MemChecker to implement true sharing.
|
|
This patch tidies up how we create and set the fields of a Request. In
essence it tries to use the constructor where possible (as opposed to
setPhys and setVirt), thus avoiding spreading the information across a
number of locations. In fact, setPhys is made private as part of this
patch, and a number of places where we callede setVirt instead uses
the appropriate constructor.
|
|
This patch simplifies how we deal with dynamically allocated data in
the packet, always assuming that it is array allocated, and hence
should be array deallocated (delete[] as opposed to delete). The only
uses of dataDynamic was in the Ruby testers.
The ARRAY_DATA flag in the packet is removed accordingly. No
defragmentation of the flags is done at this point, leaving a gap in
the bit masks.
As the last part the patch, it renames dataDynamicArray to dataDynamic.
|
|
This patch takes a first step in tightening up how we use the data
pointer in write packets. A const getter is added for the pointer
itself (getConstPtr), and a number of member functions are also made
const accordingly. In a range of places throughout the memory system
the new member is used.
The patch also removes the unused isReadWrite function.
|
|
This patch removes the parameter that enables bypassing the null check
in the Packet::getPtr method. A number of call sites assume the value
to be non-null.
The one odd case is the RubyTester, which issues zero-sized
prefetches(!), and despite being reads they had no valid data
pointer. This is now fixed, but the size oddity remains (unless anyone
object or has any good suggestions).
Finally, in the Ruby Sequencer, appropriate checks are made for flush
packets as they have no valid data pointer.
|
|
Add some missing initialisation, and fix a handful benign resource
leaks (including some false positives).
|
|
Add new DRAM_ROTATE mode to traffic generator.
This mode will generate DRAM traffic that rotates across
banks per rank, command types, and ranks per channel
The looping order is illustrated below:
for (ranks per channel)
for (command types)
for (banks per rank)
// Generate DRAM Command Series
This patch also adds the read percentage as an input argument to the
DRAM sweep script. If the simulated read percentage is 0 or 100, the
middle for loop does not generate additional commands. This loop is
used only when the read percentage is set to 50, in which case the
middle loop will toggle between read and write commands.
Modified sweep.py script, which generates DRAM traffic.
Added input arguments and support for new DRAM_ROTATE mode.
The script now has input arguments for:
1) Read percentage
2) Number of ranks
3) Address mapping
4) Traffic generator mode (DRAM or DRAM_ROTATE)
The default values are:
100% reads, 1 rank, RoRaBaCoCh address mapping, and DRAM traffic gen mode
For the DRAM traffic mode, added multi-rank support.
|
|
This patch changes two dynamic_cast to safe_cast as we assume the
return value is not NULL (without checking).
|
|
Static analysis unearther a bunch of uninitialised variables and
members, and this patch addresses the problem. In all cases these
omissions seem benign in the end, but at least fixing them means less
false positives next time round.
|
|
This patch tidies up random number generation to ensure that it is
done consistently throughout the code base. In essence this involves a
clean-up of Ruby, and some code simplifications in the traffic
generator.
As part of this patch a bunch of skewed distributions (off-by-one etc)
have been fixed.
Note that a single global random number generator is used, and that
the object instantiation order will impact the behaviour (the sequence
of numbers will be unaffected, but if module A calles random before
module B then they would obviously see a different outcome). The
dependency on the instantiation order is true in any case due to the
execution-model of gem5, so we leave it as is. Also note that the
global ranom generator is not thread safe at this point.
Regressions using the memtest, TrafficGen or any Ruby tester are
affected and will be updated accordingly.
|
|
The namespace Message conflicts with the Message data type used extensively
in Ruby. Since Ruby is being moved to the same Master/Slave ports based
configuration style as the rest of gem5, this conflict needs to be resolved.
Hence, the namespace is being renamed to ProtoMessage.
|
|
There is another type Time in src/base class which results in a conflict.
|
|
This patch adds a check to ensure that packets which are not going to
a memory range are suppressed in the traffic generator. Thus, if a
trace is collected in full-system, the packets destined for devices
are not played back.
|
|
This patch enables a new 'DRAM' mode to the existing traffic
generator, catered to generate specific requests to DRAM based on
required hit length (stride size) and bank utilization. It is an add on
to the Random mode.
The basic idea is to control how many successive packets target the
same page, and how many banks are being used in parallel. This gives a
two-dimensional space that stresses different aspects of the DRAM
timing.
The configuration file needed to use this patch has to be changed as
follow: (reference to Random Mode, LPDDR3 memory type)
'STATE 0 10000000000 RANDOM 50 0 134217728 64 3004 5002 0'
-> 'STATE 0 10000000000 DRAM 50 0 134217728 32 3004 5002 0 96 1024 8 6 1'
The last 4 parameters to be added are:
<stride size (bytes), page size(bytes), number of banks available in DRAM,
number of banks to be utilized, address mapping scheme>
The address mapping information is used to get the stride address
stream of the specified size and to know where to find the bank
bits. The configuration file has a parameter where '0'-> RoCoRaBaCh,
'1'-> RoRaBaCoCh/RoRaBaChCo address-mapping schemes. Note that the
generator currently assumes a single channel and a single rank. This
is to avoid overwhelming the traffic generator with information about
the memory organisation.
|
|
Prevent incomplete configuration of TrafficGen class from causing
segmentation faults. If an 'INIT' line is not present in the
configuration file then the currState variable will remain
uninitialized which may result in a crash.
|
|
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
This patch addresses an issue with trace playback in the TrafficGen
where the trace was reset but the header was not read from the trace
when a captured trace was played back for a second time. This resulted
in parsing errors as the expected message was not found in the trace
file.
The header check is moved to an init funtion which is called by the
constructor and when the trace is reset. This ensures that the trace
header is read each time when the trace is replayed.
This patch also addresses a small formatting issue in a panic.
|
|
This patch removes the notion of a peer block size and instead sets
the cache line size on the system level.
Previously the size was set per cache, and communicated through the
interconnect. There were plenty checks to ensure that everyone had the
same size specified, and these checks are now removed. Another benefit
that is not yet harnessed is that the cache line size is now known at
construction time, rather than after the port binding. Hence, the
block size can be locally stored and does not have to be queried every
time it is used.
A follow-on patch updates the configuration scripts accordingly.
|
|
Add a check which ensures that the minumum period for the LINEAR and
RANDOM traffic generator states is less than or equal to the maximum
period. If the minimum period is greater than the maximum period a
fatal is triggered.
|
|
This patch fixes a bug with the traffic generator which occured when
reading in the state transitions from the configuration
file. Previously, the size of the vector which stored the transitions
was used to get the size of the transitions matrix, rather than using
the number of states. Therefore, if there were more transitions than
states, i.e. some transitions has a probability of less than 1, then
the traffic generator would fatal when trying to check the
transitions.
This issue has been addressed by using the number of input states,
rather then the number of transitions.
|
|
This patch adds an optional request elasticity to the traffic
generator, effectievly compensating for it in the case of the linear
and random generators, and adding it in the case of the trace
generator. The accounting is left with the top-level traffic
generator, and the individual generators do the necessary math as part
of determining the next packet tick.
Note that in the linear and random generators we have to compensate
for the blocked time to not be elastic, i.e. without this patch the
aforementioned generators will slow down in the case of back-pressure.
|
|
This patch changes the queued port for a conventional master port and
stalls the traffic generator when requests are not immediately
accepted. This is a first step to allowing elasticity in the injection
of requests.
The patch also adds stats for the sent packets and retries, and
slightly changes how the nextPacketTick and getNextPacket
interact. The advancing of the trace is now moved to getNextPacket and
nextPacketTick is only responsible for answering the question when the
next packet should be sent.
|
|
This patch moves the responsibility for sending packets out of the
generator states and leaves it with the top-level traffic
generator. The main aim of this patch is to enable a transition to
non-queued ports, i.e. with send/retry flow control, and to do so it
is much more convenient to not wrap the port interactions and instead
leave it all local to the traffic generator.
The generator states now only govern when they are ready to send
something new, and the generation of the packets to send. They thus
have no knowledge of the port that is used.
|
|
This patch simplifies the object hierarchy of the traffic generator by
getting rid of the StateGraph class and folding this functionality
into the traffic generator itself.
The main goal of this patch is to facilitate upcoming changes by
reducing the number of affected layers.
|
|
This patch ensures the flags are always initialised.
|
|
This patch changes the TraceGen such that it uses the optional request
flags from the protobuf trace if they are present.
|
|
This patch enables the use of the generator behaviours outside the
TrafficGen module. This is useful e.g. to allow packet replay modes
for other devices in the system without having to replace them with a
TrafficGen in the configuration files.
This change also enables more specific behaviours to be composed as
specific modules, e.g. BaseBandModem can use a number of generators
and have application-specific parameters based around a specific set
of generators.
|
|
The traffic generator used to incorrectly determine the next state in
when state 0 had a non-zero probability. Due to the way the next
transition was determined, state 0 could never be entered other than
as an initial state. This changeset updates the transitition() method
to correctly handle such cases and cases where the transition matrix
is a 1x1 matrix.
|
|
This patch fixes the warnings that clang3.2svn emit due to the "-Wall"
flag. There is one case of an uninitialised value in the ARM neon ISA
description, and then a whole range of unused private fields that are
pruned.
|
|
This patch adds a predecessor field to the SenderState base class to
make the process of linking them up more uniform, and enable a
traversal of the stack without knowing the specific type of the
subclasses.
There are a number of simplifications done as part of changing the
SenderState, particularly in the RubyTest.
|
|
Virtualized CPUs and the fastmem mode of the atomic CPU require direct
access to physical memory. We currently require caches to be disabled
when using them to prevent chaos. This is not ideal when switching
between hardware virutalized CPUs and other CPU models as it would
require a configuration change on each switch. This changeset
introduces a new version of the atomic memory mode,
'atomic_noncaching', where memory accesses are inserted into the
memory system as atomic accesses, but bypass caches.
To make memory mode tests cleaner, the following methods are added to
the System class:
* isAtomicMode() -- True if the memory mode is 'atomic' or 'direct'.
* isTimingMode() -- True if the memory mode is 'timing'.
* bypassCaches() -- True if caches should be bypassed.
The old getMemoryMode() and setMemoryMode() methods should never be
used from the C++ world anymore.
|
|
This patch further removes calls to g_system_ptr->getTime() where ever other
clocked objects are available for providing current time.
|
|
This patch moves the packet creating and sending to a member function
in the shared base class to avoid code duplication.
|
|
This patch adds support for reading input traces encoded using
protobuf according to what is done in the CommMonitor.
A follow-up patch adds a Python script that can be used to convert the
previously used ASCII traces to protobuf equivalents. The appropriate
regression input is updated as part of this patch.
|
|
This patch encapsulates the traffic generator input in a stream class
such that the parsing is not visible to the trace generator. The
change takes us one step closer to using protobuf-based input traces
for the trace replay.
The functionality of the current input stream is identical to what it
was, and the ASCII format remains the same for now.
|
|
This patch fixes the computation that determines whether to perform a
read or a write such that the two corner cases (0 and 100) are both
more efficient and handled correctly.
|
|
The directed tester supports only generating only read or only write accesses. The
patch modifies the tester to support streams that have both read and write accesses.
|
|
This patch moves the draining interface from SimObject to a separate
class that can be used by any object needing draining. However,
objects not visible to the Python code (i.e., objects not deriving
from SimObject) still depend on their parents informing them when to
drain. This patch also gets rid of the CountedDrainEvent (which isn't
really an event) and replaces it with a DrainManager.
|
|
When casting objects in the generated SWIG interfaces, SWIG uses
classical C-style casts ( (Foo *)bar; ). In some cases, this can
degenerate into the equivalent of a reinterpret_cast (mainly if only a
forward declaration of the type is available). This usually works for
most compilers, but it is known to break if multiple inheritance is
used anywhere in the object hierarchy.
This patch introduces the cxx_header attribute to Python SimObject
definitions, which should be used to specify a header to include in
the SWIG interface. The header should include the declaration of the
wrapped object. We currently don't enforce header the use of the
header attribute, but a warning will be generated for objects that do
not use it.
|
|
The Memtest tester allows for only one request to be outstanding for a
particular physical address. The check has been written separately for
reads and writes. This patch moves the check earlier than its current
position so that it need not be written separately for reads and writes.
|
|
This patch adds an additional level of ports in the inheritance
hierarchy, separating out the protocol-specific and protocl-agnostic
parts. All the functionality related to the binding of ports is now
confined to use BaseMaster/BaseSlavePorts, and all the
protocol-specific parts stay in the Master/SlavePort. In the future it
will be possible to add other protocol-specific implementations.
The functions used in the binding of ports, i.e. getMaster/SlavePort
now use the base classes, and the index parameter is updated to use
the PortID typedef with the symbolic InvalidPortID as the default.
|
|
This patch adds a traffic generator to the code base. The generator is
aimed to be used as a black box model to create appropriate use-cases
and benchmarks for the memory system, and in particular the
interconnect and the memory controller.
The traffic generator is a master module, where the actual behaviour
is captured in a state-transition graph where each state generates
some sort of traffic. By constructing a graph it is possible to create
very elaborate scenarios from basic generators. Currencly the set of
generators include idling, linear address sweeps, random address
sequences and playback of traces (recording will be done by the
Communication Monitor in a follow-up patch). At the moment the graph
and the states are described in an ad-hoc line-based format, and in
the future this should be aligned with our used of e.g. the Google
protobufs. Similarly for the traces, the format is currently a
simplistic ad-hoc line-based format that merely serves as a starting
point.
In addition to being used as a black-box model for system components,
the traffic generator is also useful for creating test cases and
regressions for the interconnect and memory system. In future patches
we will use the traffic generator to create DRAM test cases for the
controller model.
The patch following this one adds a basic regressions which also
contains an example configuration script and trace file for playback.
|
|
|
|
This patch addresses the comments and feedback on the preceding patch
that reworks the clocks and now more clearly shows where cycles
(relative cycle counts) are used to express time.
Instead of bumping the existing patch I chose to make this a separate
patch, merely to try and focus the discussion around a smaller set of
changes. The two patches will be pushed together though.
This changes done as part of this patch are mostly following directly
from the introduction of the wrapper class, and change enough code to
make things compile and run again. There are definitely more places
where int/uint/Tick is still used to represent cycles, and it will
take some time to chase them all down. Similarly, a lot of parameters
should be changed from Param.Tick and Param.Unsigned to
Param.Cycles.
In addition, the use of curTick is questionable as there should not be
an absolute cycle. Potential solutions can be built on top of this
patch. There is a similar situation in the o3 CPU where
lastRunningCycle is currently counting in Cycles, and is still an
absolute time. More discussion to be had in other words.
An additional change that would be appropriate in the future is to
perform a similar wrapping of Tick and probably also introduce a
Ticks class along with suitable operators for all these classes.
|
|
This patch introduces the notion of a clock update function that aims
to avoid costly divisions when turning the current tick into a
cycle. Each clocked object advances a private (hidden) cycle member
and a tick member and uses these to implement functions for getting
the tick of the next cycle, or the tick of a cycle some time in the
future.
In the different modules using the clocks, changes are made to avoid
counting in ticks only to later translate to cycles. There are a few
oddities in how the O3 and inorder CPU count idle cycles, as seen by a
few locations where a cycle is subtracted in the calculation. This is
done such that the regression does not change any stats, but should be
revisited in a future patch.
Another, much needed, change that is not done as part of this patch is
to introduce a new typedef uint64_t Cycle to be able to at least hint
at the unit of the variables counting Ticks vs Cycles. This will be
done as a follow-up patch.
As an additional follow up, the thread context still uses ticks for
the book keeping of last activate and last suspend and this should
probably also be changed into cycles as well.
|