This patch updates the stats for the regressions that were affected by
the typo in the simple-atomic-mp configuration.
|
|
This patch fixes a minor typo that managed to sneak into the
simple-atomic-mp regression configuration.
|
|
This patch updates the stats to reflect the changes in the L2 MSHRs,
as the latter are now uniform across the regressions.
|
|
This patch unifies the L1 and L2 caches used throughout the
regressions instead of declaring different, but very similar,
configurations in the different scripts.
The patch also changes the default L2 configuration to match what it
used to be for the fs and se scripts (until the last patch that
updated the regressions to also make use of the cache config). The
MSHRs and targets per MSHR are now set to a more realistic default of
20 and 12, respectively.
As a result of both the aforementioned changes, many of the regression
stats are changed. A follow-on patch will bump the stats.
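As an illustration, such a shared declaration in a config script might
look roughly like the sketch below; the mshrs and tgts_per_mshr values
come from this log, everything else is illustrative:

    # Hypothetical shared L2 declaration; BaseCache is the cache class
    # of the era, sizes and latencies are illustrative.
    from m5.objects import BaseCache

    class L2Cache(BaseCache):
        assoc = 8
        hit_latency = 20       # cycles
        response_latency = 20  # cycles
        mshrs = 20             # the new, more realistic default
        tgts_per_mshr = 12     # targets per MSHR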
|
|
|
|
This patch unifies the naming of the default L1 and L2 caches in the
regression configs to be in line with what is used in the se and fs
scripts.
|
|
This patch updates the stats to reflect the change in the default
system clock from 1 THz to 1 GHz. The changes are due to the DMA
devices now injecting requests at a lower pace.
|
|
This patch bumps the stats to match the use of SimpleDRAM instead of
SimpleMemory in all inorder and O3 regressions, and also all
full-system regressions. A number of performance-related stats change,
and a whole bunch of stats are added for the memory controller.
|
|
This patch favours using SimpleDRAM with the default timing instead of
SimpleMemory for all regressions that involve the o3 or inorder CPU,
or are full system (in other words, where the actual performance of
the memory is important for the overall performance).
Moving forward, the solution for FSConfig and the users of fs.py and
se.py is probably something similar to what we use to choose the CPU
type. I envision a few pre-set configurations, e.g. SimpleLPDDR2,
SimpleDDR3, etc., that can be chosen by a dram_type option. Feedback on
this part is welcome.
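For concreteness, the envisioned selection could look like the sketch
below; SimpleDRAM existed at the time, but the pre-set subclasses and
the option plumbing are hypothetical:

    # Hypothetical dram_type selection; class names follow the ones
    # floated above, the plumbing is illustrative only.
    from m5.objects import SimpleDRAM

    class SimpleDDR3(SimpleDRAM):
        pass  # DDR3-style timing overrides would go here

    class SimpleLPDDR2(SimpleDRAM):
        pass  # LPDDR2-style timing overrides would go here

    _dram_classes = {'simple_ddr3': SimpleDDR3,
                     'simple_lpddr2': SimpleLPDDR2}

    def create_mem(dram_type, mem_range):
        # pick the pre-set configuration named by the dram_type option
        return _dram_classes[dram_type](range=mem_range)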
This patch changes plenty of stats and adds all the
DRAM-controller-related stats. A follow-on patch updates the relevant
statistics. The total run-time for the entire regression goes up by ~5%
with this patch due to the added complexity of the SimpleDRAM model.
This is a conscious trade-off to ensure that the model is properly
tested.
|
|
This patch uses the common L1, L2 and IOCache configuration for the
regressions that all share the same cache parameters. There are a few
regressions that use a slightly different configuration (memtest,
o3-timing-mp, simple-atomic-mp and simple-timing-mp), and these
are not changed in this patch. They will be updated in a future patch.
The common cache configurations are changed to match the ones used in
the regressions, and are slightly changed with respect to what they
were. Hopefully this means we can converge on a common base
configuration, used both in the normal user configurations and
regressions.
As only regressions that share the same cache configuration are
updated, no regression results are affected.
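As an example, a regression script can then reuse the shared classes
rather than rolling its own; the sketch below assumes the definitions
live in configs/common/Caches.py (as for the se and fs scripts), and
the sizes are illustrative:

    # Sketch of a regression script picking up the shared cache classes.
    from m5.objects import TimingSimpleCPU
    from common.Caches import L1Cache

    cpu = TimingSimpleCPU(clock='2GHz')
    cpu.addPrivateSplitL1Caches(L1Cache(size='32kB'),
                                L1Cache(size='32kB'))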
|
|
This patch updates the stats after removing the zero-time send used in
the DMA port.
|
|
This patch brings the t1000 stats up to date.
|
|
|
|
|
|
This patch updates the stats to reflect the change in how cache
latencies are expressed. In addition, the latencies are now rounded to
multiples of the clock period, thus also affecting other stats.
|
|
This patch changes the cache-related latencies from an absolute time
expressed in Ticks, to a number of cycles that can be scaled with the
clock period of the caches. Ultimately this patch serves to enable
future work that involves dynamic frequency scaling. As an immediate
benefit it also makes it more convenient to specify cache performance
without implicitly assuming a specific CPU core operating frequency.
The stat blocked_cycles, which actually counted in ticks, is now
updated to count in cycles.
As the timing is now rounded to the clock edges of the cache, some
regressions change. Many of them have very minor changes, whereas some
regressions with a short run-time are perturbed quite
significantly. A follow-on patch updates all the statistics for the
regressions.
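The before/after of the parameter change can be sketched as follows;
the parameter names match the caches of the era, the values are made
up:

    from m5.objects import BaseCache

    # Before: an absolute time, implicitly assuming a core frequency
    # l2 = BaseCache(hit_latency='10ns', ...)

    # After: a cycle count, scaled by the cache's own clock period
    l2 = BaseCache(size='2MB', assoc=8,
                   hit_latency=10,       # cycles
                   response_latency=10)  # cycles, backward path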
|
|
This patch updates the memtest stats to reflect the addition of a
clock other than the default one.
|
|
This patch changes the memtest clock from 1 THz (the default) to 2 GHz,
similar to the CPUs in the other regressions. This is useful as the
caches will adopt the same clock as the CPU. The bus clock rate is
scaled accordingly, and the L1-L2 bus is kept at the CPU clock while
the memory bus is at half that frequency.
A separate patch updates the affected stats.
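A sketch of the resulting clocking, with illustrative object names
(the Bus class and its clock parameter match the era):

    from m5.objects import Bus, System

    system = System(clock='2GHz')       # testers and caches at 2 GHz
    system.toL2Bus = Bus(clock='2GHz')  # L1-L2 bus kept at the CPU clock
    system.membus = Bus(clock='1GHz')   # memory bus at half that frequency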
|
|
This patch updates the stats to reflect the changes in the clock speed
and width for the bus connecting the L1 and L2 caches.
|
|
This patch updates the names of the l2 stats.
|
|
This patch unifies the full-system regression config scripts and uses
the BaseCPU convenience method addTwoLevelCacheHierarchy to connect up
the L1s and L2, and create the bus in between.
The patch is a step on the way to use the clock period to express the
cache latencies, as the CPU is now the parent of the L1, L2 and L1-L2
bus, and these modules thus use the CPU clock.
The patch does not change the value of any stats, but plenty of names
change, and a follow-up patch contains the update to the stats,
changing system.l2c to system.cpu.l2cache.
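A sketch of the convenience method in use; the cache classes are
assumed to be the shared ones, and the sizes are illustrative:

    from m5.objects import DerivO3CPU
    from common.Caches import L1Cache, L2Cache

    cpu = DerivO3CPU(clock='2GHz')
    # creates cpu.toL2Bus and cpu.l2cache as children of the CPU, which
    # is why the stats move from system.l2c to system.cpu.l2cache
    cpu.addTwoLevelCacheHierarchy(L1Cache(size='32kB'),
                                  L1Cache(size='64kB'),
                                  L2Cache(size='2MB'))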
|
|
|
|
|
|
In the current caches the hit latency is paid twice on a miss. This patch
lets a configurable response latency be set on the cache for the backward
path.
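A sketch of the new parameter; note that at this point the latencies
were still absolute times (the move to cycles comes in a later patch
above), and the parameter names follow later usage:

    from m5.objects import BaseCache

    l1 = BaseCache(size='32kB', assoc=2,
                   hit_latency='2ns',       # forward path on a hit
                   response_latency='2ns')  # backward path, so a miss
                                            # no longer pays the hit
                                            # latency twice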
|
|
This patch updates the stats to reflect the addition of a clock
period other than the default 1 Tick.
|
|
This patch merely adds a clock other than the default 1 Tick for the
CPUs of both the test system and drive system for the twosys-tsunami
regression.
The CPU frequency of the driver system is chosen to be twice that of
the test system to ensure it is not the bottleneck (although in this
case it mostly serves as a demonstration of a two-system setup).
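A sketch of the 2:1 clock ratio; the CPU type and the absolute
frequencies are illustrative, only the ratio is the point:

    from m5.objects import TimingSimpleCPU

    test_cpu = TimingSimpleCPU(clock='1GHz')   # system under test
    drive_cpu = TimingSimpleCPU(clock='2GHz')  # driver, twice the clock,
                                               # so it is not the bottleneck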
|
|
--HG--
rename : tests/configs/tgen-simple-mem.py => tests/configs/tgen-simple-dram.py
rename : tests/quick/se/70.tgen/tgen-simple-mem.cfg => tests/quick/se/70.tgen/tgen-simple-dram.cfg
rename : tests/quick/se/70.tgen/tgen-simple-mem.trc => tests/quick/se/70.tgen/tgen-simple-dram.trc
|
|
This patch adds a basic regression for the traffic generator. The
regression also serves as an example of the file formats used. More
complex regressions that make use of a DRAM controller model will
follow shortly.
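A minimal sketch of wiring the traffic generator against a simple
memory; the config_file path matches the test files named in this log:

    from m5.objects import TrafficGen, SimpleMemory

    tgen = TrafficGen(config_file='tests/quick/se/70.tgen/tgen-simple-mem.cfg')
    mem = SimpleMemory()
    tgen.port = mem.port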
|
|
This patch simply bumps the stats to reflect the introduction of a
bandwidth limit of 12.8 GB/s for SimpleMemory.
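Expressed as a config one-liner, the new cap is simply:

    from m5.objects import SimpleMemory

    mem = SimpleMemory(bandwidth='12.8GB/s')  # the new default limit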
|
|
This patch simply removes the commitCommittedInsts and
commitCommittedOps from the reference statistics, following their
removal from the CPU.
|
|
|
|
|
|
This patch simply bumps the stats to avoid having failing
regressions. Someone with more insight into the changes should verify
that these differences all make sense.
|
|
This patch merely updates the regression stats to reflect the change
in PIO and PCI latency.
|
|
|
|
This patch bumps the stats for the realview-o3-checker after fixing
the checker CPU in the previous patch.
|
|
|
|
This patch removes the NACKing in the bridge, as the split
request/response busses now ensure that protocol deadlocks do not
occur, i.e. the message-dependency chain is broken by always allowing
responses to make progress without being stalled by requests. The
NACKs had limited support in the system with most components ignoring
their use (with a suitable call to panic), and as the NACKs are no
longer needed to avoid protocol deadlocks, the cleanest way is to
simply remove them.
The bridge is the starting point as this is the only place where the
NACKs are created. A follow-up patch will remove the code that deals
with NACKs in the endpoints, e.g. the X86 table walker and DMA
port. Ultimately this type of packet can be completely removed (until
someone sees a need for modelling more complex protocols, which can
now be done in parts of the system since the port and interface are
split).
As a consequence of the NACK removal, the bridge now has to send a
retry to a master if the request or response queue was full on the
first attempt. This change also makes the bridge ports very similar to
QueuedPorts, and a later patch will change the bridge to use these. A
first step in this direction is taken by aligning the name of the
member functions, as done by this patch.
A bit of tidying up has also been done as part of the simplifications.
Surprisingly, this patch has no impact on any of the
regressions. Hence, no NACKs were ever issued. In a follow-up
patch I would suggest changing the size of the bridge buffers set in
FSConfig.py to also test the situation where the bridge fills up.
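A sketch of that suggested follow-up: shrinking the bridge buffers so
the new retry path is actually exercised; the parameter names match
the Bridge of the era, the sizes are illustrative:

    from m5.objects import Bridge

    bridge = Bridge(delay='50ns',
                    req_size=2,   # small request queue, forces retries
                    resp_size=2)  # small response queue, likewise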
|
|
|
|
|
|
|
|
|
|
Actual stats updates covering the period since the original ref
outputs were clobbered.
|
|
Apparently Nate did a wholesale update of stats files using
a binary compiled without eio, resulting in broken reference
outputs.
|
|
|
|
This patch updates the path to the Ruby topologies and thus fixes a
failing regression.
|
|
This patch changes the simple memory to have a single slave port
rather than a vector port. The simple memory makes no attempts at
modelling the contention between multiple ports, and any such
multiplexing and demultiplexing could be done in a bus (or crossbar)
outside the memory controller. This scenario also matches with the
ongoing work on a SimpleDRAM model, which will be a single-ported
single-channel controller that can be used in conjunction with a bus
(or crossbar) to create a multi-port multi-channel controller.
There are only a few regressions that make use of the vector port,
and these are all for functional accesses only. To facilitate these
cases, memtest and memtest-ruby have been updated to also have a
"functional" bus to perform the (de)multiplexing of the functional
memory accesses.
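A sketch of the memtest arrangement: the checker memory keeps its
single port, and a dedicated bus (de)multiplexes the testers'
functional accesses; object names are illustrative:

    from m5.objects import Bus, SimpleMemory, MemTest

    funcmem = SimpleMemory()
    funcbus = Bus()
    funcmem.port = funcbus.master   # single slave port, as per this patch

    tester = MemTest()
    tester.functional = funcbus.slave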
|
|
|
|
|