Age | Commit message (Collapse) | Author |
|
This patch fixes the CompoundFlag constructor, ensuring that it does
not dereference NULL. Doing so has undefined behaviuor, and both clang
and gcc's undefined-behaviour sanitiser was rather unhappy.
|
|
The Platform base class contains a pointer to an instance of the
System which is never initialized. This can lead to subtle bugs since
some architecture-specific platform implementations contain their own
system pointer which is normally used. However, if the platform is
accessed through a pointer to its base class, the dangling pointer
will be used instead.
|
|
This patch sets the CPU status to idle when the last active thread gets
suspended.
|
|
This patch adds a bit of documentation with insights around how
express snoops really work.
|
|
This patch adds a bit of clarification around the assumptions made in
the cache when packets are sent out, and dirty responses are
pending. As part of the change, the marking of an MSHR as in service
is simplified slightly, and comments are added to explain what
assumptions are made.
|
|
|
|
This patch extends the current address interleaving with basic hashing
support. Instead of directly comparing a number of address bits with a
matching value, it is now possible to use two independent set of
address bits XOR'ed together. This avoids issues where strided address
patterns are heavily biased to a subset of the interleaved ranges.
|
|
This patch changes the DRAM channel interleaving default behaviour to
be more representative. The default address mapping (RoRaBaCoCh) moves
the channel bits towards the least significant bits, and uses 128 byte
as the default channel interleaving granularity.
These defaults can be overridden if desired, but should serve as a
sensible starting point for most use-cases.
|
|
The method Event::initialized() tests if this != NULL as a part of the
expression that tests if an event is initialized. The only case when
this check could be false is if the method is called on a null
pointer, which is illegal and leads to undefined behavior (such as
eating your pets) according to the C++ standard. Because of this,
modern compilers (specifically, recent versions of clang) warn about
this which we treat as an error. This changeset removes the redundant
check to fix said warning.
|
|
Correctly clear the PCI interrupt belonging to a VirtIO device when
the ISR register is read.
|
|
If a time quantum event is the only one in the queue, async
events (Ctrl-C, I/O, etc.) will never be processed.
So process them first.
|
|
This patch changes how the timing CPU deals with processing responses,
always scheduling an event, even if it is for the current tick. This
helps to avoid situations where a new request shows up before a
response is finished in the crossbar, and also is more in line with
any realistic behaviour.
|
|
The Float param was not settable on the command line
due to a typo in the class definition in
python/m5/params.py. This corrects the typo and allows
floats to be set on the command line as intended.
|
|
While the IsFirstMicroop flag exists it was only occasionally used in the ARM
instructions that gem5 microOps and therefore couldn't be relied on to be correct.
|
|
Track memory size and flags as well as add some comments and consts.
|
|
We have no way of knowing if a CPU model is on the wrong path with
our execute-in-execute CPU models. Don't pretend that we do.
|
|
|
|
If someone wants to debug with legion again they can restore the
code from the repository, but no need to have it hang around indefinately.
|
|
When gem5 is a slave to another simulator and the Python is only used
to initialize the configuration (and not perform actual simulation), a
"debug start" (--debug-start) event will get freed during or immediately
after the initial Python frame's execution rather than remaining in the
event queue. This tricky patch fixes the GC issue causing this.
|
|
This patch takes the final step in removing the src and dest fields in
the packet. These fields were rather confusing in that they only
remember a single multiplexing component, and pushed the
responsibility to the bridge and caches to store the fields in a
senderstate, thus effectively creating a stack. With the recent
changes to the crossbar response routing the crossbar is now
responsible without relying on the packet fields. Thus, these
variables are now unused and can be removed.
|
|
This patch removes the source field from the ForwardResponseRecord,
but keeps the class as it is part of how the cache identifies
responses to hardware prefetches that are snooped upwards.
|
|
This patch removes the bridge sender state as the Crossbar now takes
care of remembering its own routing decisions.
|
|
This patch aligns how the response routing is done in the RubyPort,
using the SenderState for both memory and I/O accesses. Before this
patch, only the I/O used the SenderState, whereas the memory accesses
relied on the src field in the packet. With this patch we shift to
using SenderState in both cases, thus not relying on the src field any
longer.
|
|
This patch removes the need for a source and destination field in the
packet by shifting the onus of the tracking to the crossbar, much like
a real implementation. This change in behaviour also means we no
longer need a SenderState to remember the source/dest when ever we
have multiple crossbars in the system. Thus, the stack that was
created by the SenderState is not needed, and each crossbar locally
tracks the response routing.
The fields in the packet are still left behind as the RubyPort (which
also acts as a crossbar) does routing based on them. In the succeeding
patches the uses of the src and dest field will be removed. Combined,
these patches improve the simulation performance by roughly 2%.
|
|
This patch fixes a minor issue in the X86 page table walker where it
ended up sending new request packets to the crossbar before the
response processing was finished (recvTimingResp is directly calling
sendTimingReq). Under certain conditions this caused the crossbar to
see illegal combinations of request/response overlap, in turn causing
problems with a slightly modified crossbar implementation.
|
|
This patch tidies up how we create and set the fields of a Request. In
essence it tries to use the constructor where possible (as opposed to
setPhys and setVirt), thus avoiding spreading the information across a
number of locations. In fact, setPhys is made private as part of this
patch, and a number of places where we callede setVirt instead uses
the appropriate constructor.
|
|
The ppCommit should notify the attached listener every time the cpu commits
a microop or non microcoded insturction. The listener can then decide
whether it will process only the last microop (eg. SimPoint probe).
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
This patch ensures that inhibited packets that are about to be turned
into express snoops do not update the retry flag in the cache.
|
|
|
|
This patch fixes a bug where the DRAM controller tried to access the
system cacheline size before the system pointer was initialised. It
also fixes a bug where the granularity is 0 (no interleaving).
|
|
This patch corrects the FXSAVE and FXRSTOR Macroops. The actual code used for
saving/restore the FP registers is in the file but it was not used.
The FXSAVE and FXRSTOR instructions are used in the kernel for saving and
loading the state of the mmx,xmm and fpu registers.
This operation is triggered in FS by issuing a Device Not Available Fault. The
cr0 register has a TS flag that is set upon each context change. Every time a
task access any FP related register (SIMD as well) if the TS flag is set to
one, the device not available fault is issued. The kernel saves the current
state of the registers, and restore the previous state of the currently running
task.
Right now Gem5 lacks of this capability. the Device Not Available Fault is
never issued, leading to several problems when different threads share the same
CPU and SMT is not used. The PARSEC Ferret benchmark is an example of this
behavior.
In order to test this a hack in the atomic cpu code was done to detect if a
static instruction has any FP operands and the cr0 reg TS bit is set. This
check must be done in the ISA dependent code. But it seems to be tricky to
access the cr0 register while executing an instruction.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
This change includes edits to Intel8254Timer to prevent counter events firing
before startup to comply with SimObject initialization call sequence.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
|
|
If two bitfields are of the same type, also implying that they have the same
first and last bit positions, the existing implementation would copy the
entire bitfield. That includes the __data member which is shared among all the
bitfields, effectively overwritting the entire bitunion.
This change also adjusts the write only signed bitfield assignment operator to
be like the unsigned version, using "using" instead of implementing it again
and calling down to the underlying implementation.
|
|
These are for the monitor/mwait instructions, SSSE3, and XSAVE.
|
|
That change enables CPUID bits for features that aren't implemented in gem5.
If a simulated system tries to use those features because it was told it
could, bad things can happen.
|
|
Minor was reporting the data cache access as ".inst" accesses.
This just switches the MasterPortID to dataMasterPortId.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
added ARM aarch64 unlinkat syscall support, modeled on other <xxx>at syscalls.
This gets all of the cpu2006 int workloads passing in SE mode on aarch64.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
This patch implements the simd128 ADDSUBPD instruction for the x86 architecture.
Tested with a simple program in assembly language which executes the
instruction. Checked that different versions of the instruction are executed
by using the execution tracing option.
Committed by: Nilay Vaish <nilay@cs.wisc.edu
|
|
This change includes edits to MC146818 timer to prevent RTC events
firing before startup to comply with SimObject initialization call sequence.
Committed by: Nilay Vaish <nilay@cs.wisc.edu>
|
|
According to Linux man pages, if writev is successful, it returns the total
number of bytes written. Otherwise, it returns an error code. Instead of
returning 0, return the result from the actual call to writev in the system
call.
|
|
Prefechers has used rand() to generate random numers previously.
|
|
Without this tweak, a prefetcher will happily prefetch data that will
promptly be invalidated and overwritten by a WriteInvalidate.
|
|
The cache's MemSidePacketQueue schedules a sendEvent based upon
nextMSHRReadyTime() which is the time when the next MSHR is ready or whenever
a future prefetch is ready. However, a prefetch being ready does not guarentee
that it can obtain an MSHR. So, when all MSHRs are full,
the simulation ends up unnecessiciarly scheduling a sendEvent every picosecond
until an MSHR is finally freed and the prefetch can happen.
This patch fixes this by not signaling the prefetch ready time if the prefetch
could not be generated. The event is rescheduled as soon as a MSHR becomes
available.
|
|
Previously the code commented about an unhandled case where it might be
possible for a writeback to arrive after a prefetch was generated but
before it was sent to the memory system. I hit that case. Luckily
the prefetchSquash() logic already in the code handles dropping prefetch
request in certian circumstances.
|
|
Re-organizes the prefetcher class structure. Previously the
BasePrefetcher forced multiple assumptions on the prefetchers that
inherited from it. This patch makes the BasePrefetcher class truly
representative of base functionality. For example, the base class no
longer enforces FIFO order. Instead, prefetchers with FIFO requests
(like the existing stride and tagged prefetchers) now inherit from a
new QueuedPrefetcher base class.
Finally, the stride-based prefetcher now assumes a custimizable lookup table
(sets/ways) rather than the previous fully associative structure.
|
|
Adds a new parameter that reserves some number of MSHR entries for demand
accesses. This helps prevent prefetchers from taking all MSHRs, forcing demand
requests from the CPU to stall.
|
|
This patch adds table walker stats for:
- Walk events
- Instruction vs Data
- Page size histogram
- Wait time and service time histograms
- Pending requests histogram (per cycle) - measures dist. of L
(p(1..) = how often busy, p(0) = how often idle)
- Squashes, before starting and after completion
|
|
This patch gives the user direct influence over the number of DRAM
ranks to make it easier to tune the memory density without affecting
the bandwidth (previously the only means of scaling the device count
was through the number of channels).
The patch also adds some basic sanity checks to ensure that the number
of ranks is a power of two (since we rely on bit slices in the address
decoding).
|