Age | Commit message (Collapse) | Author |
|
This patch addresses the comments and feedback on the preceding patch
that reworks the clocks and now more clearly shows where cycles
(relative cycle counts) are used to express time.
Instead of bumping the existing patch I chose to make this a separate
patch, merely to try and focus the discussion around a smaller set of
changes. The two patches will be pushed together though.
This changes done as part of this patch are mostly following directly
from the introduction of the wrapper class, and change enough code to
make things compile and run again. There are definitely more places
where int/uint/Tick is still used to represent cycles, and it will
take some time to chase them all down. Similarly, a lot of parameters
should be changed from Param.Tick and Param.Unsigned to
Param.Cycles.
In addition, the use of curTick is questionable as there should not be
an absolute cycle. Potential solutions can be built on top of this
patch. There is a similar situation in the o3 CPU where
lastRunningCycle is currently counting in Cycles, and is still an
absolute time. More discussion to be had in other words.
An additional change that would be appropriate in the future is to
perform a similar wrapping of Tick and probably also introduce a
Ticks class along with suitable operators for all these classes.
|
|
This patch introduces the notion of a clock update function that aims
to avoid costly divisions when turning the current tick into a
cycle. Each clocked object advances a private (hidden) cycle member
and a tick member and uses these to implement functions for getting
the tick of the next cycle, or the tick of a cycle some time in the
future.
In the different modules using the clocks, changes are made to avoid
counting in ticks only to later translate to cycles. There are a few
oddities in how the O3 and inorder CPU count idle cycles, as seen by a
few locations where a cycle is subtracted in the calculation. This is
done such that the regression does not change any stats, but should be
revisited in a future patch.
Another, much needed, change that is not done as part of this patch is
to introduce a new typedef uint64_t Cycle to be able to at least hint
at the unit of the variables counting Ticks vs Cycles. This will be
done as a follow-up patch.
As an additional follow up, the thread context still uses ticks for
the book keeping of last activate and last suspend and this should
probably also be changed into cycles as well.
|
|
This patch adds the necessary flags to the SConstruct and SConscript
files for compiling using clang 2.9 and later (on Ubuntu et al and OSX
XCode 4.2), and also cleans up a bunch of compiler warnings found by
clang. Most of the warnings are related to hidden virtual functions,
comparisons with unsigneds >= 0, and if-statements with empty
bodies. A number of mismatches between struct and class are also
fixed. clang 2.8 is not working as it has problems with class names
that occur in multiple namespaces (e.g. Statistics in
kernel_stats.hh).
clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which
causes confusion between the container std::set and the function
Packet::set, and this is currently addressed by not including the
entire namespace std, but rather selecting e.g. "using std::vector" in
the appropriate places.
|
|
handle them like we do in FS mode, by blocking the TLB until the fault
is handled by the fault->invoke()
|
|
|
|
|
|
including IPR accesses and store-conditionals. These class of instructions will not
execute correctly in a superscalar machine
|
|
use a dummy instruction to facilitate the squash after
the interrupts trap
|
|
|
|
segfault was caused by squashed multiply thats in the process of an event.
use isProcessing flag to handle this and cleanup the MDU code
|
|
once a ST is sent off, it's OK to keep processing, however it's a little more
complicated to handle the packet acknowledging the store is completed
|
|
|
|
implement a clean interface to handle branch misprediction and eventually all pipeline
flushing
|
|
clean up control flow to make it easier to understand
|
|
At the same time, rename the trace flags to debug flags since they
have broader usage than simply tracing. This means that
--trace-flags is now --debug-flags and --trace-help is now --debug-help
|
|
|
|
|
|
resources don't need to call getLatency because the latency is already a member
in the class. If there is some type of special case where different instructions
impose a different latency inside a resource then we can revisit this and
add getLatency() back in
|
|
cleanup hanging pointers and other cruft in the destructors
|
|
if a resource has a zero cycle latency (e.g. RegFile write), then dont allocate an event
for it to use
|
|
formerly, to free up bandwidth in a resource, we could just change the pointer in that resource
but at the same time the pipeline stages had visibility to see what happened to a resource request.
Now that we are recycling these requests (to avoid too much dynamic allocation), we can't throw
away the request too early or the pipeline stage gets bad information. Instead, mark when a request
is done with the resource all together and then let the pipeline stage call back to the resource
that it's time to free up the bandwidth for more instructions
*** inteface notes ***
- When an instruction completes and is done in a resource for that cycle, call done()
- When an instruction fails and is done with a resource for that cycle, call done(false)
- When an instruction completes, but isnt finished with a resource, call completed()
- When an instruction fails, but isnt finished with a resource, call completed(false)
* * *
inorder: tlbmiss wakeup bug fix
|
|
take away all instances of reqMap in the code and make all references use the built-in
request vectors inside of each resource. The request map was dynamically allocating
a request per instruction. The request vector just allocates N number of requests
during instantiation and then the surrounding code is fixed up to reuse those N requests
***
setRequest() and clearRequest() are the new accessors needed to define a new
request in a resource
|
|
this will allow us to reuse resource requests within a resource instead
of always dynamically allocating
|
|
we are going to be getting away from creating new resource requests for every
instruction so no more need to keep track of a reqRemoveList and clean it up
every tick
|
|
first change in an optimization that will stop InOrder from allocating new memory for every instruction's
request to a resource. This gets expensive since every instruction needs to access ~10 requests before
graduation. Instead, the plan is to allocate just enough resource request objects to satisfy each resource's
bandwidth (e.g. the execution unit would need to allocate 3 resource request objects for a 1-issue pipeline
since on any given cycle it could have 2 read requests and 1 write request) and then let the instructions
contend and reuse those allocated requests. The end result is a smaller memory footprint for the InOrder model
and increased simulation performance
|
|
allow the pipeline and resources to use the cached instruction schedule and resource
sked iterator
|
|
Maintain all information about an instruction's fault in the DynInst object rather
than any cpu-request object. Also, if there is a fault during the execution stage
then just save the fault inside the instruction and trap once the instruction
tries to graduate
|
|
This step makes it easy to replace the accessor functions
(which still access a global variable) with ones that access
per-thread curTick values.
|
|
There were several copies of similar functions that looked
like they all replicated reschedule(), so I replaced them
with direct calls. Keeping this separate from the previous
cset since there may be some subtle functional differences
if the code ever reschedules an event that is scheduled but
not squashed (though none were detected in the regressions).
|
|
Events need to be scheduled on the queue assigned
to the SimObject, not on the global queue (which
should be going away).
Also cleaned up a number of redundant expressions
that made the code unnecessarily verbose.
|
|
|
|
m5 doesnt do stats specific to binary and this resource request stat is probably only
useful for people who really know the ins/outs of the model anyway
|
|
|
|
also, remove inst-req stats as default.good for debugging
but in terms of pure processor stats they aren't useful
|
|
|
|
|
|
|
|
only show requests processed when the resource is actually in use
|
|
add code to recognize memory stalls in resources and the pipeline as well
as squash a thread if there is a stall and we are in the switch on cache miss
model
|
|
- cpuEventNum
- resReqCount
|
|
|
|
make sure unrecognized events in the resource pool are deleted and also delete resource events in destructor
|
|
TLB had a bug where if it was stalled and waiting , it would not squash all instructions older than squashed instruction correctly
* * *
|
|
|
|
This model currently only works in MIPS_SE mode, so it will take some effort
to clean it up and make it generally useful. Hopefully people are willing to
help make that happen!
|