Age | Commit message (Collapse) | Author |
|
Trying to read MISCREG_CTR_EL0 on AArch64 returned 0 as is was not
implmemented. With that an operating system relying on the cache line
sizes reported in order to manage the caches would (a) panic given the
returned value 0 is not valid (high bit is RES1) or (b) worst case would
assume a cache line size of 4 doing a tremendous amount of extra
instruction work (including fetching). Return the same values as for ARMv7
as the fields seem to be the same, or RES0/1 seem to be reported
accordingly for AArch64
In collaboration with: Andrew Turner
Testing Done: Checked on FreeBSD boots with extra printfs; also observed a
reduction of a factor of about 10 in instruction fetches for a simple
micro-test.
Reviewed at http://reviews.gem5.org/r/3667/
Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
|
|
|
|
Change-Id: Id4cd839c12b70616017a5830e3f9bbb59b0f97ba
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
Compute the proper values of the aforementioned registers from
the system configuration rather than configuring the values themselves.
Change-Id: If9774b6610a29568b80ae4866107b9a6a5b5be0f
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
Compute the proper values of the aforementioned registers from
the system configuration rather than configuring the values themselves.
Change-Id: Ie7685b5d8b5f2dd9d6380b4af74f16d596b2bfd1
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
Change-Id: I4e9e8f264a4a4239dd135a6c7a1c8da213b6d345
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
Change-Id: I814f1431a5f754f75721c9ac51171f860a714d24
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
Removed from ARMARM.
Change-Id: Ie8f28e4fa6e1b46dfd9c8c4b379e5b42fe25421d
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
Change-Id: Idaaaeb3f7b1a0bdbf18d8e2d46686c78bb411317
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
Change-Id: I183b9942929c873c3272ce6d1abd4ebc472c7132
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
|
|
This patch adds support for stage 2 TLBI instructions
such as TLBI IPAS2E1_Xt.
Change-Id: I0cd5e8055b0c1003e03439aa5183252f50ea0a88
|
|
During address translation instructions (such as AT S1E1R_Xt) the exception
level can be different than the current exception level. This patch fixes
how the TLB determines what EL to use during these instructions.
Change-Id: Ia9ce229404de9e284bc1f7479fd2c580efd55f8f
|
|
Change-Id: I8f7c09c7ec3a97149ebebf4b21471b244e6cecc1
|
|
Change-Id: I59fa4fae98c33d9e5c2185382e1411911d27d341
|
|
Change-Id: I5212c91c56435fe008950ed99feacc6921609226
|
|
Don't consult the TLB test interface for PA's returned by functional
translations by the AT instruction. We implement this by chaning the
ISA code to synthesize 0-length functional reads for the TLB lookup.
The TLB then bypasses the final PA check in the tester if the size is
zero.
Change-Id: I2487b7f829cea88c37e229e9fc7a4543aced961b
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
|
|
In general, the ThreadID parameter is unnecessary in the memory system
as the ContextID is what is used for the purposes of locks/wakeups.
Since we allocate sequential ContextIDs for each thread on MT-enabled
CPUs, ThreadID is unnecessary as the CPUs can identify the requesting
thread through sideband info (SenderState / LSQ entries) or ContextID
offset from the base ContextID for a cpu.
This is a re-spin of 20264eb after the revert (bd1c6789) and includes
some fixes of that commit.
|
|
The following patches had unexpected interactions with the current
upstream code and have been reverted for now:
e07fd01651f3: power: Add support for power models
831c7f2f9e39: power: Low-power idle power state for idle CPUs
4f749e00b667: power: Add power states to ClockedObject
Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
--HG--
extra : amend_source : 0b6fb073c6bbc24be533ec431eb51fbf1b269508
|
|
In general, the ThreadID parameter is unnecessary in the memory system
as the ContextID is what is used for the purposes of locks/wakeups.
Since we allocate sequential ContextIDs for each thread on MT-enabled
CPUs, ThreadID is unnecessary as the CPUs can identify the requesting
thread through sideband info (SenderState / LSQ entries) or ContextID
offset from the base ContextID for a cpu.
|
|
The decoder is responsible for splitting instructions in micro
operations (uops). Given that different micro architectures may split
operations differently, this patch allows to specify which micro
architecture each isa implements, so different cores in the system can
split instructions differently, also decoupling uop splitting
(microArch) from ISA (Arch). This is done making the decodification
calls templates that receive a type 'DecoderFlavour' that maps the
name of the operation to the class that implements it. This way there
is only one selection point (converting the command line enum to the
appropriate DecodeFeatures object). In addition, there is no explicit
code replication: template instantiation hides that, and the compiler
should be able to resolve a number of things at compile-time.
|
|
Adds per-thread interrupt controllers and thread/context logic
so that interrupts properly get routed in SMT systems.
|
|
There seems to have been a debug print left in when the original ARMv8
support was merged in. This printout is performed every time you
initialize a hardware thread, and it prints raw pointers, so it always
causes diffs in the regression. This patch removes the debug print.
|
|
This changeset cleans up the generic timer a bit and moves most of the
register juggling from the ISA code into a separate class in the same
source file as the rest of the generic timer. It also removes the
assumption that there is always 8 or fewer CPUs in the system. Instead
of having a fixed limit, we now instantiate per-core timers as they
are requested. This is all in preparation for other patches that add
support for virtual timers and a memory mapped interface.
|
|
With the recent patches addressing how we deal with uncacheable
accesses there is no longer need for the work arounds put in place to
enforce certain sections of memory to be uncacheable during boot.
|
|
The ISA code sometimes stores 16-bit ASIDs as 8-bit unsigned integers
and has a couple of inverted checks that mask out the high 8 bits of
an ASID if 16-bit ASIDs have been /enabled/. This changeset fixes both
of those issues.
|
|
This patch tidies up how we create and set the fields of a Request. In
essence it tries to use the constructor where possible (as opposed to
setPhys and setVirt), thus avoiding spreading the information across a
number of locations. In fact, setPhys is made private as part of this
patch, and a number of places where we callede setVirt instead uses
the appropriate constructor.
|
|
This patch adds support for filtering events in the PMU. In order to
do so, it updates the ISADevice base class to forward an ISA pointer
to ISA devices. This enables such devices to access the MiscReg file
to determine the current execution level.
|
|
Automatically extract cpu release address from DTB file.
Check SCTLR_EL1 to verify all caches are enabled.
|
|
This class implements a subset of the ARM PMU v3 specification as
described in the ARMv8 reference manual. It supports most of the
features of the PMU, however the following features are known to be
missing:
* Event filtering (e.g., from different privilege levels).
* Access controls (the PMU currently ignores the execution level).
* The chain counter (event no. 0x1E) is unimplemented.
The PMU itself does not implement any events, it merely provides an
interface for the configuration scripts to hook up probes that drive
events. Configuration scripts should call addEventProbe() to configure
custom events or high-level methods to configure architected
events. The Python implementation of addEventProbe() automatically
delays event type registration until after instantiation.
In order to support CPU switching and some combined counters (e.g.,
memory references synthesized from loads and stores), the PMU allows
multiple probes per event type. When creating a system that switches
between CPU models that share the same PMU, PMU events for all of the
CPU models can be registered with the PMU.
Kudos to Matt Horsnell for the initial gem5 implementation of the PMU.
|
|
Analogous to ee049bf (for x86). Requires a bump of the checkpoint version
and corresponding upgrader code to move the condition code register values
to the new register file.
|
|
|
|
Unimplemented miscregs for the generic timer were guarded by panics
in arm/isa.cc which can be tripped by the O3 model if it speculatively
executes a wrong path containing a mrs instruction with a bad miscreg
index. These registers were flagged as implemented and accessible.
This patch changes the miscreg info bit vector to flag them as
unimplemented and inaccessible. In this case, and UndefinedInst
fault will be generated if the register access is not trapped
by a hypervisor.
|
|
Note: AArch64 and AArch32 interworking is not supported. If you use an AArch64
kernel you are restricted to AArch64 user-mode binaries. This will be addressed
in a later patch.
Note: Virtualization is only supported in AArch32 mode. This will also be fixed
in a later patch.
Contributors:
Giacomo Gabrielli (TrustZone, LPAE, system-level AArch64, AArch64 NEON, validation)
Thomas Grocutt (AArch32 Virtualization, AArch64 FP, validation)
Mbou Eyole (AArch64 NEON, validation)
Ali Saidi (AArch64 Linux support, code integration, validation)
Edmund Grimley-Evans (AArch64 FP)
William Wang (AArch64 Linux support)
Rene De Jong (AArch64 Linux support, performance opt.)
Matt Horsnell (AArch64 MP, validation)
Matt Evans (device models, code integration, validation)
Chris Adeniyi-Jones (AArch64 syscall-emulation)
Prakash Ramrakhyani (validation)
Dam Sunwoo (validation)
Chander Sudanthi (validation)
Stephan Diestelhorst (validation)
Andreas Hansson (code integration, performance opt.)
Eric Van Hensbergen (performance opt.)
Gabe Black
|
|
This patch makes all the register index flattening methods const for
all the ISAs. As part of this, readMiscRegNoEffect for ARM is also
made const.
|
|
This patch removes the notion of a peer block size and instead sets
the cache line size on the system level.
Previously the size was set per cache, and communicated through the
interconnect. There were plenty checks to ensure that everyone had the
same size specified, and these checks are now removed. Another benefit
that is not yet harnessed is that the cache line size is now known at
construction time, rather than after the port binding. Hence, the
block size can be locally stored and does not have to be queried every
time it is used.
A follow-on patch updates the configuration scripts accordingly.
|
|
In order to see all registers independent of the current CPU mode, the
ARM architecture model uses the magic MISCREG_CPSR_MODE register to
change the register mappings without actually updating the CPU
mode. This hack is no longer needed since the thread context now
provides a flat interface to the register file. This patch replaces
the CPSR_MODE hack with the flat register interface.
|
|
This patch makes the values of ID_ISARx, MIDR, and FPSID configurable
as ISA parameter values. Additionally, setMiscReg now ignores writes
to all of the ID registers.
Note: This moves the MIDR parameter from ArmSystem to ArmISA for
consistency.
|
|
The ISA class on stores the contents of ID registers on many
architectures. In order to make reset values of such registers
configurable, we make the class inherit from SimObject, which allows
us to use the normal generated parameter headers.
This patch introduces a Python helper method, BaseCPU.createThreads(),
which creates a set of ISAs for each of the threads in an SMT
system. Although it is currently only needed when creating
multi-threaded CPUs, it should always be called before instantiating
the system as this is an obvious place to configure ID registers
identifying a thread/CPU.
|
|
This interface is no longer used, and getting rid of it simplifies the
decoders and code that sets up the decoders. The thread context had been used
to read architectural state which was used to contextualize the instruction
memory as it came in. That was changed so that the state is now sent to the
decoders to keep locally if/when it changes. That's significantly more
efficient.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
Avoid reading them every instruction, and also eliminate the last use of the
thread context in the decoders.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
According to the A15 TRM the value of this register is as follows (assuming 16 word = 64 byte lines)
[31:29] Format - b100 specifies v7
[28] RAZ - b0
[27:24] CWG log2(max writeback size #words) - 0x4 16 words
[23:20] ERG log2(max reservation size #words) - 0x4 16 words
[19:16] DminLine log2(smallest dcache line #words) - 0x4 16 words
[15:14] L1Ip L1 index/tagging policy - b11 specifies PIPT
[13:4] RAZ - b0000000000
[3:0] IminLine log2(smallest icache line #words) - 0x4 16 words
|
|
This change allows designating a system as MP capable or not as some
bootloaders/kernels care that it's set right. You can have a single
processor MP capable system, but you can't have a multi-processor
UP only system. This change also fixes the initialization of the MIDR
register.
|
|
Enables the CheckerCPU to be selected at runtime with the --checker option
from the configs/example/fs.py and configs/example/se.py configuration
files. Also merges with the SE/FS changes.
|
|
New kernel code verifies that multi-processor extensions are available
before booting secondary CPUs.
|
|
Also clean up how we create boot loader memory a bit.
|
|
New kernels attempt to read CP14 what debug architecture is available.
These changes add the debug registers and return that none is currently
available.
|
|
This change adds a master id to each request object which can be
used identify every device in the system that is capable of issuing a request.
This is part of the way to removing the numCpus+1 stats in the cache and
replacing them with the master ids. This is one of a series of changes
that make way for the stats output to be changed to python.
|
|
Brings the CheckerCPU back to life to allow FS and SE checking of the
O3CPU. These changes have only been tested with the ARM ISA. Other
ISAs potentially require modification.
|
|
|
|
There are a set of locations is the linux kernel that are managed via
cache maintence instructions until all processors enable their MMUs & TLBs.
Writes to these locations are manually flushed from the cache to main
memory when the occur so that cores operating without their MMU enabled
and only issuing uncached accesses can receive the correct data. Unfortuantely,
gem5 doesn't support any kind of software directed maintence of the cache.
Until such time as that support exists this patch marks the specific cache blocks
that need to be coherent as non-cacheable until all CPUs enable their MMU and
thus allows gem5 to boot MP systems with caches enabled (a requirement for
booting an O3 cpu and thus an O3 CPU regression).
|