Age | Commit message (Collapse) | Author |
|
Determine if a request has an associated virtual address.
|
|
Static analysis unearther a bunch of uninitialised variables and
members, and this patch addresses the problem. In all cases these
omissions seem benign in the end, but at least fixing them means less
false positives next time round.
|
|
This patch extends the classic prefetcher to work on non-block aligned
addresses. Because the existing prefetchers in gem5 mask off the lower
address bits of cache accesses, many predictable strides fail to be
detected. For example, if a load were to stride by 48 bytes, with 64 byte
cachelines, the current stride based prefetcher would see an access pattern
of 0, 64, 64, 128, 192.... Thus not detecting a constant stride pattern. This
patch fixes this, by training the prefetcher on access and not masking off the
lower address bits.
It also adds the following configuration options:
1) Training/prefetching only on cache misses,
2) Training/prefetching only on data acceses,
3) Optionally tagging prefetches with a PC address.
#3 allows prefetchers to train off of prefetch requests in systems with
multiple cache levels and PC-based prefetchers present at multiple levels.
It also effectively allows a pipelining of prefetch requests (like in POWER4)
across multiple levels of cache hierarchy.
Improves performance on my gem5 configuration by 4.3% for SPECINT and 4.7% for SPECFP (geomean).
|
|
|
|
|
|
This patch adds the basic building blocks required to support e.g. ARM
TrustZone by discerning secure and non-secure memory accesses.
|
|
This patch enables tracking of cache occupancy per thread along with
ages (in buckets) per cache blocks. Cache occupancy stats are
recalculated on each stat dump.
|
|
Add some values and methods to the request object to track the translation
and access latency for a request and which level of the cache hierarchy responded
to the request.
|
|
This patch adds a flag in the request class that indicates if the request
was made in privileged mode.
|
|
ASI_BITS in the Request object were originally used to store a memory
request's ASI on SPARC. This is not the case any more since other ISAs
use the ASI bits to store architecture-dependent information. This
changeset renames the ASI_BITS to ARCH_BITS which better describes
their use. Additionally, the getAsi() accessor is renamed to
getArchFlags().
|
|
Using address bit 63 to identify generic IPRs caused problems on
SPARC, where IPRs are heavily used. This changeset redefines how
generic IPRs are identified. Instead of using bit 63, we now use a
separate flag (GENERIC_IPR) a memory request.
|
|
Reuse the address finalization code in the TLB instead of replicating
it when handling MMIO. This patch also adds support for injecting
memory mapped IPR requests into the memory system.
|
|
This patch enables dumping statistics and Linux process information on
context switch boundaries (__switch_to() calls) that are used for
Streamline integration (a graphical statistics viewer from ARM).
|
|
While FastAlloc provides a small performance increase (~1.5%) over regular malloc it isn't thread safe.
After removing FastAlloc and using tcmalloc I've seen a performance increase of 12% over libc malloc
when running twolf for ARM.
|
|
This patch fixes the cache stats to use the new request ids.
Cache stats also display the requestor names in the vector subnames.
Most cache stats now include "nozero" and "nonan" flags to reduce the
amount of excessive cache stat dump. Also, simplified
incMissCount()/incHitCount() functions.
|
|
This change adds a master id to each request object which can be
used identify every device in the system that is capable of issuing a request.
This is part of the way to removing the numCpus+1 stats in the cache and
replacing them with the master ids. This is one of a series of changes
that make way for the stats output to be changed to python.
|
|
|
|
There may not be a formally correct spelling for the past tense of mmap, but
mmapped is the spelling Google doesn't try to autocorrect. This makes sense
because it mirrors the past tense of map->mapped and not the past tense of
cape->caped.
--HG--
rename : src/arch/alpha/mmaped_ipr.hh => src/arch/alpha/mmapped_ipr.hh
rename : src/arch/arm/mmaped_ipr.hh => src/arch/arm/mmapped_ipr.hh
rename : src/arch/mips/mmaped_ipr.hh => src/arch/mips/mmapped_ipr.hh
rename : src/arch/power/mmaped_ipr.hh => src/arch/power/mmapped_ipr.hh
rename : src/arch/sparc/mmaped_ipr.hh => src/arch/sparc/mmapped_ipr.hh
rename : src/arch/x86/mmaped_ipr.hh => src/arch/x86/mmapped_ipr.hh
|
|
This step makes it easy to replace the accessor functions
(which still access a global variable) with ones that access
per-thread curTick values.
|
|
These flags were being used to identify what alignment a request needed, but
the same information is available using the request size. This change also
eliminates the isMisaligned function. If more complicated alignment checks are
needed, they can be signaled using the ASI_BITS space in the flags vector like
is currently done with ARM.
|
|
CLREX is the name of an ARM instruction, not a name for this generic flag.
|
|
when it in received
|
|
|
|
|
|
|
|
The inconsistency was causing a subtle bug with some of the
constructors where the params had the same name as the fields.
This is also a first step to switching the accessors over to
our new "standard", e.g., getVaddr() -> vaddr().
|
|
|
|
|
|
--HG--
rename : src/sim/host.hh => src/base/types.hh
|
|
|
|
|
|
|
|
This frees up needed space for more public flags. Also:
- remove unused Request accessor methods
- make Packet use public Request accessors, so it need not be a friend
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Apparently we broke it with the cache rewrite and never noticed.
Thanks to Bao Yungang <baoyungang@gmail.com> for a significant part
of these changes (and for inspiring me to work on the rest).
Some other overdue cleanup on the prefetch code too.
|
|
|
|
I did some of the flags and assertions wrong. Thanks to Brad Beckmann
for pointing this out. I should have run the opt regressions instead
of the fast. I also screwed up some of the logical functions in the Flags
class.
|
|
|
|
|
|
Would have saved me much debugging time if these
had been in there previously.
|
|
the primary identifier for a hardware context should be contextId(). The
concept of threads within a CPU remains, in the form of threadId() because
sometimes you need to know which context within a cpu to manipulate.
|
|
|
|
should configure their editors to not insert tabs
|
|
|
|
--HG--
extra : convert_revision : d4e19afda897bc3797868b40469ce2ec7ec7d251
|