Age | Commit message (Collapse) | Author |
|
They are for the following features in the LTAGE loop predictor:
- Hashing for calculating the loop table entry
- Add direction information
- Add speculative iteration number information
Change-Id: I395f4526163ee0d0229d1e87cde2bb046f1dd43a
Signed-off-by: Pau Cabre <pau.cabre@metempsy.com>
Reviewed-on: https://gem5-review.googlesource.com/c/14597
Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Reviewed-by: Louis Delhez <ldelhez@ucla.edu>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
|
They are basically used to tell wich component of the predictor is
providing the prediction and whether it is correct or wrong
Change-Id: I7b3db66535f159091f1b37d70c2d942d50b20fb2
Signed-off-by: Pau Cabre <pau.cabre@metempsy.com>
Reviewed-on: https://gem5-review.googlesource.com/c/14535
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
|
The new derived LTAGE is equivalent to the original LTAGE implementation
The default values of the TAGE branch predictor match the settings of the
8C-TAGE configuration described in https://www.jilp.org/vol8/v8paper1.pdf
Change-Id: I8323adbfd5c9a45db23cfff234218280e639f9ed
Signed-off-by: Pau Cabre <pau.cabre@metempsy.com>
Reviewed-on: https://gem5-review.googlesource.com/c/14435
Reviewed-by: Sudhanshu Jha <sudhanshu.jha@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
|
This includes TAGE tag sizes, TAGE table sizes, U counters reset period,
loop predictor associativity, path history size, the USE_ALT_ON_NA size
and the WITHLOOP size
Change-Id: I935823f0a5794f5d55b744263798897a813dc1bd
Signed-off-by: Pau Cabre <pau.cabre@metempsy.com>
Reviewed-on: https://gem5-review.googlesource.com/c/14417
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
|
Increased to 2 bits of useful counter per TAGE entry as described in the
LTAGE paper (and made the size configurable)
Changed how the useful counters are incremented/decremented as described
in the LTAGE paper
Change-Id: I8c692cc7c180d29897cb77781681ff498a1d16c8
Signed-off-by: Pau Cabre <pau.cabre@metempsy.com>
Reviewed-on: https://gem5-review.googlesource.com/c/14215
Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
|
Fixed the following fields of the loop predictor entries as described on
the LTAGE paper:
- Age counter (it was 3 bits and it should be 8 bits)
- Tag (it was 16 bits and it should be 14 bits). Also some times it used
int variables and some times uint16_t, leading to wrong behaviour
- Confidence counter (it was 2 bits ins some parts of the code and 3 bits
in some other parts. It should be 2 bits)
- Iteration counters (they were 16 bits and they should be 14 bits)
All the new sizes are now configurable
Change-Id: I8884c7454c1e510b65160eb4d5749d3259d34096
Signed-off-by: Pau Cabre <pau.cabre@metempsy.com>
Reviewed-on: https://gem5-review.googlesource.com/c/14216
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
|
The LTAGE paper states that only one TAGE entry can be
allocated when updating
Change-Id: I6cfb4d80ce835e93d4bf5099ef88a7d425abaddd
Signed-off-by: Pau Cabre <pau.cabre@metempsy.com>
Reviewed-on: https://gem5-review.googlesource.com/c/14195
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
Added the parameter "--bp-type" to set the branch predictor type
Added the parameter "--list-bp-types" to list all the available branch
predictor types
Change-Id: Ia6aae90c784aef359b6d8233c8383cd7a871aca1
Signed-off-by: Pau Cabre <pau.cabre@metempsy.com>
Reviewed-on: https://gem5-review.googlesource.com/c/14015
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
The LTAGE paper states 1 hyst bit shared for 4 pred bits.
Made this ratio configurable use 4 by default.
Also changed the Bimodal structure to use two std::vector<bool> (one for
pred and one for hyst bits)
Change-Id: I6793e8e358be01b75b8fd181ddad50f259862d79
Signed-off-by: Pau Cabre <pau.cabre@metempsy.com>
Reviewed-on: https://gem5-review.googlesource.com/c/14120
Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com>
Reviewed-by: Sudhanshu Jha <sudhanshu.jha@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
The PC needs to be shifted according to the instShiftAmt parameter
Change-Id: I272619c093695b56cf7f8ff7163e3b5d23205d16
Signed-off-by: Pau Cabre <pau.cabre@metempsy.com>
Reviewed-on: https://gem5-review.googlesource.com/c/14035
Reviewed-by: Sudhanshu Jha <sudhanshu.jha@arm.com>
Reviewed-by: Ilias Vougioukas <ilias.vougioukas@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
These files aren't a collection of miscellaneous stuff, they're the
definition of the Logger interface, and a few utility macros for
calling into that interface (panic, warn, etc.).
Change-Id: I84267ac3f45896a83c0ef027f8f19c5e9a5667d1
Reviewed-on: https://gem5-review.googlesource.com/6226
Reviewed-by: Brandon Potter <Brandon.Potter@amd.com>
Maintainer: Gabe Black <gabeblack@google.com>
|
|
When different sizes were set for the choice and global saturation
counter (e.g. ex5_big), the threshold calculation used the wrong
size. Thus the branch predictor always predicted "not taken" for
choice > global.
Change-Id: I076549ff1482e2280cef24a0d16b7bb2122d4110
Reviewed-on: https://gem5-review.googlesource.com/4560
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
|
|
|
|
This patch implements an L-TAGE predictor, based on André Seznec's code
available from CBP-2
(http://hpca23.cse.tamu.edu/taco/camino/cbp2/cbp-src/realistic-seznec.h).
Signed-off-by Jason Lowe-Power <jason@lowepower.com>
|
|
The Minor and o3 cpu models share the branch prediction
code. Minor relies on the BPredUnit::squash() function
to update the branch predictor tables on a branch mispre-
diction. This is fine because Minor executes in-order, so
the update is on the correct path. However, this causes the
branch predictor to be updated on out-of-order branch
mispredictions when using the o3 model, which should not
be the case.
This patch guards against speculative update of the branch
prediction tables. On a branch misprediction, BPredUnit::squash()
calls BpredUnit::update(..., squashed = true). The underlying
branch predictor tests against the value of squashed. If it is
true, it restores any speculatively updated internal state
it might have (e.g., global/local branch history), then returns.
If false, it updates its prediction tables. Previously, exist-
ing predictors did not test against the "squashed" parameter.
To accomodate for this change, the Minor model must now call
BPredUnit::squash() then BPredUnit::update(..., squashed = false)
on branch mispredictions. Before, calling BpredUnit::squash()
performed the prediction tables update.
The effect is a slight MPKI improvement when using the o3
model. A further patch should perform the same modifications
for the indirect target predictor and BTB (less critical).
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
|
|
The tournament predictor is presented as doing speculative
update of the global history and non-speculative update
of the local history used to generate the branch prediction.
However, the code does speculative update of both histories.
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
|
|
This function was used by the now-defunct InOrderCPU model. Since this
model is no longer in gem5, this function was not called from anywhere in
the code.
|
|
Fixing an issue with regStats not calling the parent class method
for most SimObjects in Gem5. This causes issues if one adds new
stats in the base class (since they are never initialized properly!).
Change-Id: Iebc5aa66f58816ef4295dc8e48a357558d76a77c
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
Branch predictors that use GHRs should index them on a
per-thread basis. This makes that so.
This is a re-spin of fb51231 after the revert (bd1c6789).
|
|
This patch adds a configurable indirect branch predictor that can be indexed
by a combination of GHR and path history hashes. Implements the functionality
described in:
"Target prediction for indirect jumps" by Chang, Hao, and Patt
http://dl.acm.org/citation.cfm?id=264209
This is a re-spin of fb9d142 after the revert (bd1c6789).
|
|
The extant BTB code doesn't hash on the thread id but does check the
thread id for 'btb hits'. This results in 1-thread of a multi-threaded
workload taking a BTB entry, and all other threads missing for the same branch
missing.
|
|
The following patches had unexpected interactions with the current
upstream code and have been reverted for now:
e07fd01651f3: power: Add support for power models
831c7f2f9e39: power: Low-power idle power state for idle CPUs
4f749e00b667: power: Add power states to ClockedObject
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
--HG--
extra : amend_source : 0b6fb073c6bbc24be533ec431eb51fbf1b269508
|
|
Branch predictors that use GHRs should index them on a
per-thread basis. This makes that so.
|
|
This patch adds a configurable indirect branch predictor that can be indexed
by a combination of GHR and path history hashes. Implements the functionality
described in:
"Target prediction for indirect jumps" by Chang, Hao, and Patt
http://dl.acm.org/citation.cfm?id=264209
|
|
The extant BTB code doesn't hash on the thread id but does check the
thread id for 'btb hits'. This results in 1-thread of a multi-threaded
workload taking a BTB entry, and all other threads missing for the same branch
missing.
|
|
Result of running 'hg m5style --skip-all --fix-control -a'.
|
|
Make best use of the compiler, and enable -Wextra as well as
-Wall. There are a few issues that had to be resolved, but they are
all trivial.
|
|
This patch adds explicit overrides as this is now required when using
"-Wall" with clang >= 3.5, the latter now part of the most recent
XCode. The patch consequently removes "virtual" for those methods
where "override" is added. The latter should be enough of an
indication.
As part of this patch, a few minor issues that clang >= 3.5 complains
about are also resolved (unused methods and variables).
|
|
This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap
(and similar) abstractions, as these are no longer needed with gcc 4.7
and clang 3.1 as minimum compiler versions.
|
|
When a branch gets squashed, it's speculative branch predictor state should get
rolled back in squash(). However, only the globalHistory state was being
rolled back. This patch adds (at least some) support for rolling back the
local predictor state also.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
This changeset adds probe points that can be used to implement PMU
counters for branch predictor stats. The following probes are
supported:
* BPRedUnit::ppBranches / Branches
* BPRedUnit::ppMisses / Misses
|
|
This patch optimises the passing of StaticInstPtr by avoiding copying
the reference-counting pointer. This avoids first incrementing and
then decrementing the reference-counting pointer.
|
|
A small bug in the bimodal predictor caused significant degradation in
performance on some benchmarks. This was caused by using the wrong
globalHistoryReg during the update phase. This patches fixes the bug
and brings the performance to normal level.
|
|
When a branch mispredicted gem5 would squash all history after and including
the mispredicted branch. However, the mispredicted branch is still speculative
and its history is required to rollback state if another, older, branch
mispredicts. This leads to things like RAS corruption.
|
|
The branch predictor is normally only built when a CPU that uses a
branch predictor is built. The list of CPUs is currently incomplete as
the simple CPUs support branch predictors (for warming, branch stats,
etc). In practice, all CPU models now use branch predictors, so this
changeset removes the CPU model check and replaces it with a check for
the NULL ISA.
|
|
This patch does some minor house keeping of the branch predictor by
adopting STL containers, and shifting some iterator to use range-based
for loops.
The predictor history is also changed from a list to a deque as we
never to insertion/deletion other than at the front and back.
|
|
This patch contains a new CPU model named `Minor'. Minor models a four
stage in-order execution pipeline (fetch lines, decompose into
macroops, decompose macroops into microops, execute).
The model was developed to support the ARM ISA but should be fixable
to support all the remaining gem5 ISAs. It currently also works for
Alpha, and regressions are included for ARM and Alpha (including Linux
boot).
Documentation for the model can be found in src/doc/inside-minor.doxygen and
its internal operations can be visualised using the Minorview tool
utils/minorview.py.
Minor was designed to be fairly simple and not to engage in a lot of
instruction annotation. As such, it currently has very few gathered
stats and may lack other gem5 features.
Minor is faster than the o3 model. Sample results:
Benchmark | Stat host_seconds (s)
---------------+--------v--------v--------
(on ARM, opt) | simple | o3 | minor
| timing | timing | timing
---------------+--------+--------+--------
10.linux-boot | 169 | 1883 | 1075
10.mcf | 117 | 967 | 491
20.parser | 668 | 6315 | 3146
30.eon | 542 | 3413 | 2414
40.perlbmk | 2339 | 20905 | 11532
50.vortex | 122 | 1094 | 588
60.bzip2 | 2045 | 18061 | 9662
70.twolf | 207 | 2736 | 1036
|
|
|
|
|
|
having separate params for the local/globalHistoryBits and the
local/globalPredictorSize can lead to inconsistencies if they
are not carefully set. this patch dervies the number of bits
necessary to index into the local/global predictors based on
their size.
the value of the localHistoryTableSize for the ARM O3 CPU has been
increased to 1024 from 64, which is more accurate for an A15 based
on some correlation against A15 hardware.
|
|
This patch moves the branch predictor files in the o3 and inorder directories
to src/cpu/pred. This allows sharing the branch predictor across different
cpu models.
This patch was originally posted by Timothy Jones in July 2010
but never made it to the repository.
--HG--
rename : src/cpu/o3/bpred_unit.cc => src/cpu/pred/bpred_unit.cc
rename : src/cpu/o3/bpred_unit.hh => src/cpu/pred/bpred_unit.hh
rename : src/cpu/o3/bpred_unit_impl.hh => src/cpu/pred/bpred_unit_impl.hh
rename : src/cpu/o3/sat_counter.hh => src/cpu/pred/sat_counter.hh
|
|
globalHistoryBits, globalPredictorSize, and choicePredictorSize are decoupled.
globalHistoryBits controls how much history is kept, global and choice
predictor sizes control how much of that history is used when accessing
predictor tables. This way, global and choice predictors can actually be
different sizes, and it is no longer possible to walk off the predictor arrays
and cause a seg fault.
There are now individual thresholds for choice, global, and local saturating
counters, so that taken/not taken decisions are correct even when the
predictors' counters' sizes are different.
The interface for localPredictorSize has been removed from TournamentBP because
the value can be calculated from localHistoryBits.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
Fix some issues with the local predictor and the way it's indexed.
|
|
Change RAS to fix issues with predicated call/return instructions.
Handled all cases in the life of a predicated call and return instruction.
|
|
1. Updates the Branch Predictor correctly to the state
just after a mispredicted branch, if a squash occurs.
2. If a BTB does not find an entry, the branch is predicted not taken.
The global history is modified to correctly reflect this prediction.
3. Local history is now updated at the fetch stage instead of
execute stage.
4. In the Update stage of the branch predictor the local predictors are
now correctly updated according to the state of local history during
fetch stage.
This patch also improves performance by as much as 17% on some benchmarks
|
|
|
|
Branch predictor could not predict a branch in a nested loop because:
1. The global history was not updated after a mispredict squash.
2. The global history was updated in the fetch stage. The choice predictors
that were updated used the changed global history. This is incorrect, as
it incorporates the state of global history after the branch in
encountered. Fixed update to choice predictor using the global history
state before the branch happened.
3. The global predictor table was also updated using the global history state
before the branch happened as above.
Additionally, parameters to initialize ctr and history size were reversed.
|
|
|
|
At the same time, rename the trace flags to debug flags since they
have broader usage than simply tracing. This means that
--trace-flags is now --debug-flags and --trace-help is now --debug-help
|