Age | Commit message (Collapse) | Author |
|
This patch is meant for allowing predicated reads and writes. Note that this
predication is different from the ISA provided predication. They way we
currently provide the ISA description for X86, we read/write registers that
do not need to be actually read/written. This is likely to be true for other
ISAs as well. This patch allows for read and write predicates to be associated
with operands. It allows for the register indices for source and destination
registers to be decided at the time when the microop is constructed. The
run time indicies come in to play only when the at least one of the
predicates has been provided. This patch will not affect any of the ISAs that
do not provide these predicates. Also the patch assumes that the order in
which operands appear in any function of the microop is same across all the
functions of the microops. A subsequent patch will enable predication for the
x86 ISA.
|
|
Minor patch against so building on NetBSD is possible.
|
|
This patch addresses the comments and feedback on the preceding patch
that reworks the clocks and now more clearly shows where cycles
(relative cycle counts) are used to express time.
Instead of bumping the existing patch I chose to make this a separate
patch, merely to try and focus the discussion around a smaller set of
changes. The two patches will be pushed together though.
This changes done as part of this patch are mostly following directly
from the introduction of the wrapper class, and change enough code to
make things compile and run again. There are definitely more places
where int/uint/Tick is still used to represent cycles, and it will
take some time to chase them all down. Similarly, a lot of parameters
should be changed from Param.Tick and Param.Unsigned to
Param.Cycles.
In addition, the use of curTick is questionable as there should not be
an absolute cycle. Potential solutions can be built on top of this
patch. There is a similar situation in the o3 CPU where
lastRunningCycle is currently counting in Cycles, and is still an
absolute time. More discussion to be had in other words.
An additional change that would be appropriate in the future is to
perform a similar wrapping of Tick and probably also introduce a
Ticks class along with suitable operators for all these classes.
|
|
This patch introduces the notion of a clock update function that aims
to avoid costly divisions when turning the current tick into a
cycle. Each clocked object advances a private (hidden) cycle member
and a tick member and uses these to implement functions for getting
the tick of the next cycle, or the tick of a cycle some time in the
future.
In the different modules using the clocks, changes are made to avoid
counting in ticks only to later translate to cycles. There are a few
oddities in how the O3 and inorder CPU count idle cycles, as seen by a
few locations where a cycle is subtracted in the calculation. This is
done such that the regression does not change any stats, but should be
revisited in a future patch.
Another, much needed, change that is not done as part of this patch is
to introduce a new typedef uint64_t Cycle to be able to at least hint
at the unit of the variables counting Ticks vs Cycles. This will be
done as a follow-up patch.
As an additional follow up, the thread context still uses ticks for
the book keeping of last activate and last suspend and this should
probably also be changed into cycles as well.
|
|
This patch removes the NACK frrom the packet as there is no longer any
module in the system that issues them (the bridge was the only one and
the previous patch removes that).
The handling of NACKs was mostly avoided throughout the code base, by
using e.g. panic or assert false, but in a few locations the NACKs
were actually dealt with (although NACKs never occured in any of the
regressions). Most notably, the DMA port will now never receive a NACK
and the backoff time is thus never changed. As a consequence, the
entire backoff mechanism (similar to a PCI bus) is now removed and the
DMA port entirely relies on the bus performing the arbitration and
issuing a retry when appropriate. This is more in line with e.g. PCIe.
Surprisingly, this patch has no impact on any of the regressions. As
mentioned in the patch that removes the NACK from the bridge, a
follow-up patch should change the request and response buffer size for
at least one regression to also verify that the system behaves as
expected when the bridge fills up.
|
|
This patch removes the overloading of the parameter, which seems both
redundant, and possibly incorrect.
The PciConfigAll now also uses a Param.Latency rather than a
Param.Tick. For backwards compatibility it still sets the pio_latency
to 1 tick. All the comments have also been updated to not state that
it is in simticks when it is not necessarily the case.
|
|
This patch moves the clock of the CPU, bus, and numerous devices to
the new class ClockedObject, that sits in between the SimObject and
MemObject in the class hierarchy. Although there are currently a fair
amount of MemObjects that do not make use of the clock, they
potentially should do so, e.g. the caches should at some point have
the same clock as the CPU, potentially with a 1:n ratio. This patch
does not introduce any new clock objects or object hierarchies
(clusters, clock domains etc), but is still a step in the direction of
having a more structured approach clock domains.
The most contentious part of this patch is the serialisation of clocks
that some of the modules (but not all) did previously. This
serialisation should not be needed as the clock is set through the
parameters even when restoring from the checkpoint. In other words,
the state is "stored" in the Python code that creates the modules.
The nextCycle methods are also simplified and the clock phase
parameter of the CPU is removed (this could be part of a clock object
once they are introduced).
|
|
Alpha System was overriding loadState() function to setup some functional
event. The system tried to read/write to memory before the Ruby memory had
unserialized the state. With this patch, Alpha System overrides the
startup() function, and sets up functional events in this function. This
works because startup() is called after Ruby memory system has unserialized
the memory state.
|
|
DPRINTFs
This patch fixes some problems with the drain/switchout functionality
for the O3 cpu and for the ARM ISA and adds some useful debug print
statements.
This is an incremental fix as there are still a few bugs/mem leaks with the
switchout code. Particularly when switching from an O3CPU to a
TimingSimpleCPU. However, when switching from O3 to O3 cores with the ARM ISA
I haven't encountered any more assertion failures; now the kernel will
typically panic inside of simulation.
|
|
New tool chains seem to be looking for kernel versions newer than what
this this was previously set to. Also take this opportunity to change
the hostname we report in uname to sim.gem5.org.
|
|
Added/moved rlimit constants to base linux header file.
This patch is a revised version of Vince Weaver's earlier patch.
|
|
Enable different whitelists for different OS/arch combinations,
since some use the generic Linux definitions only, and others
use definitions inherited from earlier Unix flavors on those
architectures.
Also update x86 function pointers so ioctl is no longer
unimplemented on that platform.
This patch is a revised version of Vince Weaver's earlier patch.
|
|
According to the A15 TRM the value of this register is as follows (assuming 16 word = 64 byte lines)
[31:29] Format - b100 specifies v7
[28] RAZ - b0
[27:24] CWG log2(max writeback size #words) - 0x4 16 words
[23:20] ERG log2(max reservation size #words) - 0x4 16 words
[19:16] DminLine log2(smallest dcache line #words) - 0x4 16 words
[15:14] L1Ip L1 index/tagging policy - b11 specifies PIPT
[13:4] RAZ - b0000000000
[3:0] IminLine log2(smallest icache line #words) - 0x4 16 words
|
|
|
|
|
|
|
|
|
|
This patch makes getAddrRanges const throughout the code base. There
is no reason why it should not be, and making it const prevents adding
any unintentional side-effects.
|
|
This patch fixes two warnings, one related to a narrowing conversion
(int to MachInst), and one due to the cast operator for arguments and
a mismatch in const-ness (const void* and void*).
|
|
The check should be with the op2 field, not with the op1 field.
|
|
Static binaries generated with new versions of libc complain that the kernel
is too old otherwise.
|
|
npc in PCState for ARM was being calculated before the current flags were
updated with the next flags. This causes an issue as the npc is incremented by
two or four depending on the current flags (thumb or not) and was leading to
branches that were predicted correctly being identified as mispredicted.
|
|
|
|
This patch fixes a failing compilation caused by MaxMiscDestRegs being
zero. According to gcc 4.6, the result is a comparison that is always
false due to limited range of data type.
|
|
|
|
Due to recent changes to X86 TLB, gem5 stopped compiling on
gcc version 4.4.3. This patch provides the fix for that problem. The patch
is tested on gcc 4.4.3. The change is not required for more recent
versions of gcc (like on 4.6.3).
|
|
initCPU() will be called to initialize switched out CPUs for the simple and
inorder CPU models. this patch prevents those CPUs from being initialized
because they should get their state from the active CPU when it is switched
out.
|
|
|
|
Extra white space fixes in miscregs.hh
|
|
This change allows designating a system as MP capable or not as some
bootloaders/kernels care that it's set right. You can have a single
processor MP capable system, but you can't have a multi-processor
UP only system. This change also fixes the initialization of the MIDR
register.
|
|
DynInst is extremely large the hope is that this re-organization will put the
most used members close to each other.
|
|
While FastAlloc provides a small performance increase (~1.5%) over regular malloc it isn't thread safe.
After removing FastAlloc and using tcmalloc I've seen a performance increase of 12% over libc malloc
when running twolf for ARM.
|
|
|
|
This eliminates a use of the ExtMachInst type outside of the ISAs.
|
|
The CPUID instruction was implemented so that it would only write its results
if the instruction was successful. This works fine on the simple CPU where
unwritten registers retain their old values, but on a CPU like O3 with
renaming this is broken. The instruction needs to write the old values back
into the registers explicitly if they aren't being changed.
|
|
There are some bits of some fields of the ExtMachInst which are not actually
used for anything but are included in the hash of an ExtMachInst for
simplicity and efficiency. This change makes sure the decoder's internal
working ExtMachInst is completely initialized, even these unused bits, so that
there isn't any nondeterministic behavior, no valgrind messages about
uninitialized variables, and no potential false misses/redundant entries in
the decode cache.
|
|
|
|
The GDT can be accessed by user level software running in compatibility mode
by moving segment selectors into segment registers. The GDT needs to be set up
at an address accessible in this mode.
|
|
A small change was added a while ago to keep addresses from overflowing 32
bits when larger addresses shouldn't be accessible to software. That change
truncated when not in long mode, but really it should have truncated when not
in 64 bit mode. The difference is whether compatibility mode is included, a
mode that's supposed to act like a legacy 32 bit mode.
|
|
This will allow it to be specialized by the ISAs. The existing caching scheme
is provided by the BasicDecodeCache in the GenericISA namespace and is built
from the generalized components.
--HG--
rename : src/cpu/decode_cache.cc => src/arch/generic/decode_cache.cc
|
|
These classes are always used together, and merging them will give the ISAs
more flexibility in how they cache things and manage the process.
--HG--
rename : src/arch/x86/predecoder_tables.cc => src/arch/x86/decoder_tables.cc
|
|
|
|
--HG--
rename : src/cpu/decode.cc => src/arch/generic/decoder.cc
rename : src/cpu/decode.hh => src/arch/generic/decoder.hh
|
|
This patch moves the DMA device to its own set of files, splitting it
from the IO device. There are no behavioural changes associated with
this patch.
The patch also grabs the opportunity to do some very minor tidying up,
including some white space removal and pruning some redundant
parameters.
Besides the immediate benefits of the separation-of-concerns, this
patch also makes upcoming changes more streamlined as it split the
devices that are only slaves and the DMA device that also acts as a
master.
--HG--
rename : src/dev/io_device.cc => src/dev/dma_device.cc
rename : src/dev/io_device.hh => src/dev/dma_device.hh
|
|
This patch makes the (device) DmaPort non-snooping and removes the
recvSnoop constructor parameter and instead introduces a
SnoopingDmaPort subclass for the ARM table walker.
Functionality is unchanged, as are the stats, and the patch merely
clarifies that the normal DMA ports are not snooping (although they
may issue requests that are snooped by others, as done with PCI, PCIe,
AMBA4 ACE etc).
Currently this port is declared in the ARM table walker as it is not
used anywhere else. If other ports were to have similar behaviour it
could be moved in a future patch.
|
|
This patch moves the ECF and EZF bits to individual registers (ecfBit and
ezfBit) and the CF and OF bits to cfofFlag registers. This is being done
so as to lower the read after write dependencies on the the condition code
register. Ultimately we will have the following registers [ZAPS], [OF],
[CF], [ECF], [EZF] and [DF]. Note that this is only one part of the
solution for lowering the dependencies. The other part will check whether
or not the condition code register needs to be actually read. This would
be done through a separate patch.
|
|
Shuffle the 32 bit values into position, and then add in parallel.
|
|
Symbol tables masked with the loadAddrMask create redundant entries
that could conflict with kernel function events that rely on the
original addresses. This patch guards the creation of those masked
symbol tables by default, with an option to enable them when needed
(for early-stage kernel debugging, etc.)
|
|
|
|
This patch moves send/recvTiming and send/recvTimingSnoop from the
Port base class to the MasterPort and SlavePort, and also splits them
into separate member functions for requests and responses:
send/recvTimingReq, send/recvTimingResp, and send/recvTimingSnoopReq,
send/recvTimingSnoopResp. A master port sends requests and receives
responses, and also receives snoop requests and sends snoop
responses. A slave port has the reciprocal behaviour as it receives
requests and sends responses, and sends snoop requests and receives
snoop responses.
For all MemObjects that have only master ports or slave ports (but not
both), e.g. a CPU, or a PIO device, this patch merely adds more
clarity to what kind of access is taking place. For example, a CPU
port used to call sendTiming, and will now call
sendTimingReq. Similarly, a response previously came back through
recvTiming, which is now recvTimingResp. For the modules that have
both master and slave ports, e.g. the bus, the behaviour was
previously relying on branches based on pkt->isRequest(), and this is
now replaced with a direct call to the apprioriate member function
depending on the type of access. Please note that send/recvRetry is
still shared by all the timing accessors and remains in the Port base
class for now (to maintain the current bus functionality and avoid
changing the statistics of all regressions).
The packet queue is split into a MasterPort and SlavePort version to
facilitate the use of the new timing accessors. All uses of the
PacketQueue are updated accordingly.
With this patch, the type of packet (request or response) is now well
defined for each type of access, and asserts on pkt->isRequest() and
pkt->isResponse() are now moved to the appropriate send member
functions. It is also worth noting that sendTimingSnoopReq no longer
returns a boolean, as the semantics do not alow snoop requests to be
rejected or stalled. All these assumptions are now excplicitly part of
the port interface itself.
|