Age | Commit message (Collapse) | Author |
|
This patch takes the final step in removing the InOrderCPU from the
tree. Rest in peace.
The MinorCPU is now used to model an in-order microarchitecture, and
long term the MinorCPU will eventually be renamed InOrderCPU.
|
|
This patch changes how faults are passed between methods in an attempt
to copy as few reference-counting pointer instances as possible. This
should avoid unecessary copies being created, contributing to the
increment/decrement of the reference counters.
|
|
This patch addresses the comments and feedback on the preceding patch
that reworks the clocks and now more clearly shows where cycles
(relative cycle counts) are used to express time.
Instead of bumping the existing patch I chose to make this a separate
patch, merely to try and focus the discussion around a smaller set of
changes. The two patches will be pushed together though.
This changes done as part of this patch are mostly following directly
from the introduction of the wrapper class, and change enough code to
make things compile and run again. There are definitely more places
where int/uint/Tick is still used to represent cycles, and it will
take some time to chase them all down. Similarly, a lot of parameters
should be changed from Param.Tick and Param.Unsigned to
Param.Cycles.
In addition, the use of curTick is questionable as there should not be
an absolute cycle. Potential solutions can be built on top of this
patch. There is a similar situation in the o3 CPU where
lastRunningCycle is currently counting in Cycles, and is still an
absolute time. More discussion to be had in other words.
An additional change that would be appropriate in the future is to
perform a similar wrapping of Tick and probably also introduce a
Ticks class along with suitable operators for all these classes.
|
|
This patch cleans up a number of minor issues aiming to get closer to
compliance with the C++0x standard as interpreted by gcc and clang
(compile with std=c++0x and -pedantic-errors). In particular, the
patch cleans up enums where the last item was succeded by a comma,
namespaces closed by a curcly brace followed by a semi-colon, and the
use of the GNU-extension typeof (replaced by templated functions). It
does not address variable-length arrays, zero-size arrays, anonymous
structs, range expressions in switch statements, and the use of long
long. The generated CPU code also has a large number of issues that
remain to be fixed, mainly related to overflows in implicit constant
conversion (due to shifts).
|
|
This patch continues the unification of how the different CPU models
create and share their instruction and data ports. Most importantly,
it forces every CPU to have an instruction and a data port, and gives
these ports explicit getters in the BaseCPU (getDataPort and
getInstPort). The patch helps in simplifying the code, make
assumptions more explicit, andfurther ease future patches related to
the CPU ports.
The biggest changes are in the in-order model (that was not modified
in the previous unification patch), which now moves the ports from the
CacheUnit to the CPU. It also distinguishes the instruction fetch and
load-store unit from the rest of the resources, and avoids the use of
indices and casting in favour of keeping track of these two units
explicitly (since they are always there anyways). The atomic, timing
and O3 model simply return references to their already existing ports.
|
|
This patch adds the necessary flags to the SConstruct and SConscript
files for compiling using clang 2.9 and later (on Ubuntu et al and OSX
XCode 4.2), and also cleans up a bunch of compiler warnings found by
clang. Most of the warnings are related to hidden virtual functions,
comparisons with unsigneds >= 0, and if-statements with empty
bodies. A number of mismatches between struct and class are also
fixed. clang 2.8 is not working as it has problems with class names
that occur in multiple namespaces (e.g. Statistics in
kernel_stats.hh).
clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which
causes confusion between the container std::set and the function
Packet::set, and this is currently addressed by not including the
entire namespace std, but rather selecting e.g. "using std::vector" in
the appropriate places.
|
|
|
|
segfault was caused by squashed multiply thats in the process of an event.
use isProcessing flag to handle this and cleanup the MDU code
|
|
|
|
implement a clean interface to handle branch misprediction and eventually all pipeline
flushing
|
|
|
|
|
|
formerly, to free up bandwidth in a resource, we could just change the pointer in that resource
but at the same time the pipeline stages had visibility to see what happened to a resource request.
Now that we are recycling these requests (to avoid too much dynamic allocation), we can't throw
away the request too early or the pipeline stage gets bad information. Instead, mark when a request
is done with the resource all together and then let the pipeline stage call back to the resource
that it's time to free up the bandwidth for more instructions
*** inteface notes ***
- When an instruction completes and is done in a resource for that cycle, call done()
- When an instruction fails and is done with a resource for that cycle, call done(false)
- When an instruction completes, but isnt finished with a resource, call completed()
- When an instruction fails, but isnt finished with a resource, call completed(false)
* * *
inorder: tlbmiss wakeup bug fix
|
|
take away all instances of reqMap in the code and make all references use the built-in
request vectors inside of each resource. The request map was dynamically allocating
a request per instruction. The request vector just allocates N number of requests
during instantiation and then the surrounding code is fixed up to reuse those N requests
***
setRequest() and clearRequest() are the new accessors needed to define a new
request in a resource
|
|
this will allow us to reuse resource requests within a resource instead
of always dynamically allocating
|
|
first change in an optimization that will stop InOrder from allocating new memory for every instruction's
request to a resource. This gets expensive since every instruction needs to access ~10 requests before
graduation. Instead, the plan is to allocate just enough resource request objects to satisfy each resource's
bandwidth (e.g. the execution unit would need to allocate 3 resource request objects for a 1-issue pipeline
since on any given cycle it could have 2 read requests and 1 write request) and then let the instructions
contend and reuse those allocated requests. The end result is a smaller memory footprint for the InOrder model
and increased simulation performance
|
|
Maintain all information about an instruction's fault in the DynInst object rather
than any cpu-request object. Also, if there is a fault during the execution stage
then just save the fault inside the instruction and trap once the instruction
tries to graduate
|
|
Events need to be scheduled on the queue assigned
to the SimObject, not on the global queue (which
should be going away).
Also cleaned up a number of redundant expressions
that made the code unnecessarily verbose.
|
|
|
|
This change modifies the way prefetches work. They are now like normal loads
that don't writeback a register. Previously prefetches were supposed to call
prefetch() on the exection context, so they executed with execute() methods
instead of initiateAcc() completeAcc(). The prefetch() methods for all the CPUs
are blank, meaning that they get executed, but don't actually do anything.
On Alpha dead cache copy code was removed and prefetches are now normal ops.
They count as executed operations, but still don't do anything and IsMemRef is
not longer set on them.
On ARM IsDataPrefetch or IsInstructionPreftech is now set on all prefetch
instructions. The timing simple CPU doesn't try to do anything special for
prefetches now and they execute with the normal memory code path.
|
|
m5 doesnt do stats specific to binary and this resource request stat is probably only
useful for people who really know the ins/outs of the model anyway
|
|
remove the annotation 'virtual' from function declaration that isnt being derived from
|
|
also, remove inst-req stats as default.good for debugging
but in terms of pure processor stats they aren't useful
|
|
|
|
this will be used for when a thread comes back from a cache miss, it needs to update the PCs
because the inst might of been a branch or delayslot in which the next PC isnt always
a straight addition
|
|
give resources their own specific
activity to do for a "suspend" event
instead of defaulting to deactivating the thread for a
suspend thread event. This really matters
for the fetch sequence unit which wants to remove the
thread from fetching while other units want to
ignore a thread suspension. If you deactivate a thread
in a resource then you may lose some of the allotted
bandwidth that the thread is taking up...
|
|
add code to recognize memory stalls in resources and the pipeline as well
as squash a thread if there is a stall and we are in the switch on cache miss
model
|
|
- cpuEventNum
- resReqCount
|
|
|
|
|
|
make sure unrecognized events in the resource pool are deleted and also delete resource events in destructor
|
|
TLBUnit no longer used and we also get rid of memAccSize and memAccFlags functions added to ISA and StaticInst
since TLB is not a separate resource to acquire. Instead, TLB access is done before any read/write to memory
and the result is checked before it's sent out to memory.
* * *
|
|
|
|
|
|
|
|
|
|
This model currently only works in MIPS_SE mode, so it will take some effort
to clean it up and make it generally useful. Hopefully people are willing to
help make that happen!
|