gem5 - gem5

Age	Commit message	Author
2011-02-18	m5: merge inorder/release-notes/make_release changes	Korey Sewell

2011-02-18	inorder: regr-update: reduce dynamic mem. use to speedup sims	Korey Sewell
	previous changesets took a closer look at memory mgmt in the inorder model and sought to avoid dynamic memory mgmt (for access to pipeline resources) as much as possible. For the regressions that were run, the sims are about 2x faster than at changeset 7726, which is the last change before the recent commits in Feb. (note: these regressions now use 4-issue CPUs instead of just 1-issue)
2011-02-18	inorder: add names and slot #s to res. dprints	Korey Sewell

2011-02-18	inorder: ignore nops in execution unit	Korey Sewell

2011-02-18	inorder: update graduation unit	Korey Sewell
	make sure instructions are able to commit before writing back to the RF; do not commit more than 1 non-speculative instruction per cycle
2011-02-18	inorder: recognize isSerializeAfter flag	Korey Sewell
	keep track of when an instruction needs the execution behind it to be serialized. Without this, in SE Mode instructions can execute behind a system call exit().
2011-02-18	inorder: update default thread size(=1)	Korey Sewell
	a lot of structures get allocated based off that MaxThreads parameter so this is an effort to not abuse it
2011-02-18	inorder: don't overuse getLatency()	Korey Sewell
	resources don't need to call getLatency because the latency is already a member in the class. If there is some type of special case where different instructions impose a different latency inside a resource then we can revisit this and add getLatency() back in
2011-02-18	inorder: update max. resource bandwidths	Korey Sewell
	each resource has a certain # of requests it can take per cycle. update the #s here to be more realistic based off of the pipeline width and if the resource needs to be accessed on multiple cycles
2011-02-18	inorder: cleanup in destructors	Korey Sewell
	cleanup hanging pointers and other cruft in the destructors
2011-02-18	inorder: fix cache/fetch unit memory leaks	Korey Sewell
	- need to delete the cache request's data on clearRequest() now that we are recycling requests
	- fetch unit needs to deallocate the fetch buffer blocks when they are replaced or squashed.
2011-02-18	inorder: remove events for zero-cycle resources	Korey Sewell
	if a resource has a zero cycle latency (e.g. RegFile write), then don't allocate an event for it to use
2011-02-18	inorder: update pipeline interface for handling finished resource reqs	Korey Sewell
	formerly, to free up bandwidth in a resource, we could just change the pointer in that resource, but at the same time the pipeline stages had visibility to see what happened to a resource request. Now that we are recycling these requests (to avoid too much dynamic allocation), we can't throw away the request too early or the pipeline stage gets bad information. Instead, mark when a request is done with the resource altogether and then let the pipeline stage call back to the resource when it's time to free up the bandwidth for more instructions
	* interface notes *
	- When an instruction completes and is done in a resource for that cycle, call done()
	- When an instruction fails and is done with a resource for that cycle, call done(false)
	- When an instruction completes, but isn't finished with a resource, call completed()
	- When an instruction fails, but isn't finished with a resource, call completed(false)
	* * *
	inorder: tlbmiss wakeup bug fix
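The done()/completed() notes above can be sketched as follows. This is a minimal illustration under stated assumptions, not the InOrder model's code: `SlotRequest`, `ToyResource`, `stageRelease()` and the flag names are hypothetical, and only the `done(bool)` / `completed(bool)` call shapes come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical request slot; the flag names are illustrative, not gem5's.
struct SlotRequest {
    bool valid = false;      // slot is currently owned by an instruction
    bool succeeded = false;  // outcome of the most recent access
    bool finished = false;   // instruction is done with the resource this cycle

    // The access is over for this cycle, successfully (true) or not (false).
    void done(bool ok = true) { succeeded = ok; finished = true; }

    // The access was processed, but the instruction still needs the resource.
    void completed(bool ok = true) { succeeded = ok; finished = false; }
};

// Hypothetical resource with a fixed number of recyclable request slots.
struct ToyResource {
    std::vector<SlotRequest> slots;

    explicit ToyResource(std::size_t bandwidth) : slots(bandwidth) {}

    std::size_t freeSlots() const {
        std::size_t n = 0;
        for (const SlotRequest &s : slots) {
            if (!s.valid) ++n;
        }
        return n;
    }

    // Called by the pipeline stage after it has inspected the request.
    // The slot is recycled only here, so the stage never reads a request
    // that was already handed to another instruction.
    bool stageRelease(std::size_t idx) {
        assert(idx < slots.size());
        SlotRequest &s = slots[idx];
        if (!s.valid || !s.finished) return false;
        s = SlotRequest();  // reset the slot for reuse
        return true;
    }
};
```

In this sketch the resource only marks the request; bandwidth is returned when the stage calls back, which mirrors the ordering the message describes.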
2011-02-18	inorder: remove request map, use request vector	Korey Sewell
	take away all instances of reqMap in the code and make all references use the built-in request vectors inside of each resource. The request map was dynamically allocating a request per instruction. The request vector just allocates N number of requests during instantiation and then the surrounding code is fixed up to reuse those N requests *** setRequest() and clearRequest() are the new accessors needed to define a new request in a resource
2011-02-18	inorder: add valid bit for resource requests	Korey Sewell
	this will allow us to reuse resource requests within a resource instead of always dynamically allocating
2011-02-18	inorder: remove reqRemoveList	Korey Sewell
	we are going to be getting away from creating new resource requests for every instruction so no more need to keep track of a reqRemoveList and clean it up every tick
2011-02-18	inorder: initialize res. req. vectors based on resource bandwidth	Korey Sewell
	first change in an optimization that will stop InOrder from allocating new memory for every instruction's request to a resource. This gets expensive since every instruction needs to access ~10 requests before graduation. Instead, the plan is to allocate just enough resource request objects to satisfy each resource's bandwidth (e.g. the execution unit would need to allocate 3 resource request objects for a 1-issue pipeline since on any given cycle it could have 2 read requests and 1 write request) and then let the instructions contend and reuse those allocated requests. The end result is a smaller memory footprint for the InOrder model and increased simulation performance
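The pre-allocation plan described above (a fixed set of request objects per resource, marked with a valid bit and reused) might look roughly like this. It is a sketch only: `PooledRequest`, `RequestPool`, `acquire()` and `release()` are hypothetical names, whereas the InOrder model itself uses setRequest() and clearRequest() on its resources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a resource request; not the gem5 class.
struct PooledRequest {
    bool valid = false;   // true while an instruction owns this slot
    int instSeqNum = -1;  // owning instruction, for illustration only
};

// Hypothetical pool sized once, at construction, to the resource's
// per-cycle bandwidth. No allocation happens after that point.
class RequestPool {
  public:
    explicit RequestPool(std::size_t bandwidth) : reqs(bandwidth) {}

    // Claim a free slot for an instruction. Returns the slot index,
    // or -1 when the resource has no bandwidth left this cycle.
    int acquire(int instSeqNum) {
        for (std::size_t i = 0; i < reqs.size(); ++i) {
            if (!reqs[i].valid) {
                reqs[i].valid = true;
                reqs[i].instSeqNum = instSeqNum;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Return a slot to the pool so a later instruction can reuse it.
    void release(int slot) {
        assert(slot >= 0 && static_cast<std::size_t>(slot) < reqs.size());
        reqs[slot].valid = false;
        reqs[slot].instSeqNum = -1;
    }

  private:
    std::vector<PooledRequest> reqs;
};
```

With a pool of 3 (the execution-unit example for a 1-issue pipeline), a fourth concurrent request is refused until one of the three slots is released.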
2011-02-16	merge alpha system files into tree	Nathan Binkert

2011-02-15	Util: Get rid of the make_release.py script.	Gabe Black
	Since we're not doing releases any more we don't really need this script. If we need it in the future, we can resurrect it from the history.
2011-02-16	Cleanup system directory to fit into modern M5 tree	Nathan Binkert

2011-02-16	copyright: update copyright on alpha system files	Nathan Binkert

2011-02-15	X86: Get rid of "inline" on the MicroPanic constructor in decoder.cc.	Gabe Black
	This was making certain versions of gcc omit the function from the object file which would break the build.
2011-02-14	Info: Clean up some info files.	Gabe Black
	Get rid of RELEASE_NOTES since we no longer do releases, update some of the information in README, and update the date in LICENSE.
2011-02-14	Ruby: Improve PerfectSwitch's wakeup function	Nilay Vaish
	Currently the wakeup function for the PerfectSwitch contains three loops:
	- loop on number of virtual networks
	- loop on number of incoming links
	- loop till all messages for this (link, network) have been routed
	With an 8 processor mesh network and Hammer protocol, about 11-12% of the total execution time was observed to have been spent in this function, which is the highest amongst all the functions. It was found that the innermost loop is executed about 45 times per invocation of the wakeup function, when each invocation of the wakeup function processes just about one message. The patch tries to do away with the redundant executions of the innermost loop. Counters have been added for each virtual network that record the number of messages that need to be routed for that virtual network. The inner loops are only executed when the number of messages for that particular virtual network > 0. This does away with almost 80% of the executions of the innermost loop. The function now consumes about 5-6% of the total execution time.
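The counter idea can be illustrated with a toy model. This is a sketch under assumptions, not the Ruby PerfectSwitch: `ToySwitch`, its integer "messages" and the returned visit count are hypothetical, chosen only to show how a per-virtual-network pending counter lets wakeup() skip the link loop entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical switch: queues[vnet][link] holds pending messages (ints here).
class ToySwitch {
  public:
    ToySwitch(std::size_t vnets, std::size_t links)
        : queues(vnets, std::vector<std::deque<int>>(links)),
          pending(vnets, 0) {}

    void enqueue(std::size_t vnet, std::size_t link, int msg) {
        queues[vnet][link].push_back(msg);
        ++pending[vnet];  // one more message to route on this vnet
    }

    // Routes every pending message into 'routed' and returns how many
    // (vnet, link) queues were visited, as a rough measure of loop work.
    std::size_t wakeup(std::vector<int> &routed) {
        std::size_t visits = 0;
        for (std::size_t vnet = 0; vnet < queues.size(); ++vnet) {
            if (pending[vnet] == 0) continue;  // nothing to do: skip links
            for (auto &q : queues[vnet]) {
                ++visits;
                while (!q.empty()) {
                    assert(pending[vnet] > 0);
                    routed.push_back(q.front());
                    q.pop_front();
                    --pending[vnet];
                }
            }
        }
        return visits;
    }

  private:
    std::vector<std::vector<std::deque<int>>> queues;
    std::vector<std::size_t> pending;  // messages waiting per virtual network
};
```

With 4 virtual networks and 8 links, a single queued message costs 8 queue visits instead of 32, and an idle wakeup costs none.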
2011-02-13	X86: Update stats for the improved branch detection/prediction.	Gabe Black

2011-02-13	X86: Detect branches taking into account instruction size.	Gabe Black
	The size of the current instruction determines what the npc should be if there's no branching.
2011-02-13	X86: Update stats now that the dest reg isn't read unnecessarily to set flags.	Gabe Black

2011-02-13	X86: Put the result used for flags in an intermediate variable.	Gabe Black
	Using the destination register directly causes the ISA parser to treat it as a source even if none of the original bits are used.
2011-02-13	X86: Update stats for the reduced register reads.	Gabe Black

2011-02-13	X86: Don't read in dest regs if all bits are replaced.	Gabe Black
	In x86, 32 and 64 bit writes to registers in which registers appear to be 32 or 64 bits wide overwrite all bits of the destination register. This change removes false dependencies in these cases where the previous value of a register doesn't need to be read to write a new value. New versions of most microops are created that have a "Big" suffix which simply overwrite their destination, and the right version to use is selected during microop allocation based on the selected data size. This does not change the performance of the O3 CPU model significantly, I assume because there are other false dependencies from the condition code bits in the flags register.
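The false dependency can be seen in a small model of a 64-bit register write. This is only a sketch: `needsOldDest()` and `writeDest()` are hypothetical helpers, not the microop code, and the model ignores the high-byte registers (AH and friends).

```cpp
#include <cassert>
#include <cstdint>

// True when the previous destination value is a real input to the write,
// i.e. the write is narrower than 32 bits and must merge with old bits.
bool needsOldDest(unsigned dataSize) {
    return dataSize < 4;
}

// Model of writing 'result' into a 64-bit register with the given
// operand size in bytes (1, 2, 4 or 8).
std::uint64_t writeDest(std::uint64_t oldDest, std::uint64_t result,
                        unsigned dataSize) {
    assert(dataSize == 1 || dataSize == 2 || dataSize == 4 || dataSize == 8);
    if (!needsOldDest(dataSize)) {
        // "Big"-style write: every destination bit is replaced, so oldDest
        // is never read. 32-bit results are zero-extended to 64 bits.
        return dataSize == 4 ? (result & 0xFFFFFFFFull) : result;
    }
    // Narrow write: keep the upper bits of the old value.
    const std::uint64_t mask = (1ull << (dataSize * 8)) - 1;
    return (oldDest & ~mask) | (result & mask);
}
```

Selecting the variant from the data size at allocation time, as the message describes, means the 4- and 8-byte cases never list the destination as a source.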
2011-02-13	X86: On a bad micropc, return a microop that returns a fault that panics.	Gabe Black
	This way a bad micropc will have to get all the way to commit before killing the simulation. This accounts for misspeculated branches.
2011-02-13	X86: Define fault objects to carry debug messages.	Gabe Black
	These faults can panic/warn/warn_once, etc., instead of instructions doing that themselves directly. That way, instructions can be speculatively executed, and only if they're actually going to commit will their fault be invoked and the panic, etc., happen.
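A rough sketch of the deferral described above: the instruction returns a fault object during speculative execution, and the message only fires if the instruction reaches commit. All names here (`DebugFault`, `PanicFault`, `executeSpeculatively()`, `retire()`) are hypothetical, and the panic is modelled as a thrown exception rather than a simulator abort.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical fault base class; invoke() runs only at commit.
struct DebugFault {
    virtual ~DebugFault() = default;
    virtual void invoke() const = 0;
};

// Hypothetical fault that carries a panic message.
struct PanicFault : DebugFault {
    std::string msg;
    explicit PanicFault(std::string m) : msg(std::move(m)) {}
    void invoke() const override {
        throw std::runtime_error("panic: " + msg);
    }
};

using FaultPtr = std::shared_ptr<const DebugFault>;

// Speculative execution never panics directly; it only reports a fault.
FaultPtr executeSpeculatively(bool badMicroPC) {
    if (badMicroPC) {
        return std::make_shared<PanicFault>("bad micropc");
    }
    return nullptr;
}

// Commit stage: a squashed (misspeculated) instruction drops its fault
// without invoking it. Returns true if the instruction committed.
bool retire(const FaultPtr &fault, bool squashed) {
    if (squashed) return false;
    if (fault) fault->invoke();
    return true;
}
```

A bad micropc on a mispredicted path is therefore harmless in this model, while the same fault on the committed path still stops the run.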
2011-02-13	X86: Only reset npc to reflect instruction length once.	Gabe Black
	When redirecting fetch to handle branches, the npc of the current pc state needs to be left alone. This change makes the pc state record whether or not the npc already reflects a real value by making it keep track of the current instruction size, or if no size has been set.
2011-02-13	O3: Fetch from the microcode ROM when needed.	Gabe Black

2011-02-13	O3: Fix GCC 4.2.4 complaint	Ali Saidi

2011-02-12	Ruby: Reorder Cache Lookup in Protocol Files	Nilay Vaish
	The patch changes the order in which L1 dcache and icache are looked up when a request comes in. Earlier, if a request came in for instruction fetch, the dcache was looked up before the icache, to correctly handle self-modifying code. But, in the common case, dcache is going to report a miss and the subsequent icache lookup is going to report a hit. Given the invariant - caches under the same controller keep track of disjoint sets of cache blocks, we can move the icache lookup before the dcache lookup. In case of a hit in the icache, using our invariant, we know that the dcache would have reported a miss. In case of a miss in the icache, we know that icache would have missed even if the dcache was looked up before looking up the icache. Effectively, we are doing the same thing as before, though in the common case, we expect reduction in the number of lookups. This was empirically confirmed for MOESI hammer. The ratio of lookups to access requests is now about 1.1 to 1.
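The reordering argument can be checked with a toy model that counts lookups. This is an assumption-laden sketch, not SLICC protocol code: `ToyL1`, `ifetchLookup()` and the use of hash sets as caches are hypothetical, and the disjointness invariant is simply assumed to hold.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical L1 controller with two caches holding disjoint block sets.
struct ToyL1 {
    enum class Where { ICache, DCache, Miss };

    std::unordered_set<std::uint64_t> icache;
    std::unordered_set<std::uint64_t> dcache;
    unsigned lookups = 0;  // total cache lookups performed

    // Instruction fetch: probe the icache first. Because the two sets are
    // disjoint, an icache hit implies the dcache would have missed, so the
    // result matches the old dcache-first order with fewer lookups.
    Where ifetchLookup(std::uint64_t block) {
        ++lookups;
        if (icache.count(block)) return Where::ICache;
        ++lookups;
        if (dcache.count(block)) return Where::DCache;
        return Where::Miss;
    }
};
```

In the common case (an icache hit) the fetch costs one lookup, which is the direction of the roughly 1.1 to 1 ratio reported above.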
2011-02-12	inorder:regress: host-inst-rate improved ~58%	Korey Sewell
	there are still only a few inorder benchmarks, but for the lengthier benchmarks (twolf and vortex) the latest changes to instruction scheduling (how instructions figure out what they want to do on each pipeline stage in the inorder model) improved performance by a nice amount. The latest results for the inorder model process about 100k insts/second (note: the 58% is relative to the last time these were run on the 64-bit pool machines at UM)
2011-02-12	inorder: clean up the old way of inst. scheduling	Korey Sewell
	remove remnants of old way of instruction scheduling which dynamically allocated a new resource schedule for every instruction
2011-02-12	inorder: utilize cached skeds in pipeline	Korey Sewell
	allow the pipeline and resources to use the cached instruction schedule and resource sked iterator
2011-02-12	inorder: define iterator for resource schedules	Korey Sewell
	resource skeds are divided into two parts: front end (all insts) and back end (inst. specific). Each of those is implemented as a separate list, so this iterator wraps around the traditional list iterator so that an instruction can walk its schedule but seamlessly transfer from front end to back end when necessary
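An iterator that hops from one list to the other could be sketched like this. The class below is hypothetical (`SkedWalker` and `walkAll()` are illustrative names, and schedule entries are plain ints); it only demonstrates the front-end to back-end transfer described above.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Hypothetical walker over a shared front-end list followed by an
// instruction-specific back-end list. Both lists must outlive the walker.
class SkedWalker {
  public:
    SkedWalker(const std::list<int> &fe, const std::list<int> &be)
        : front(&fe), back(&be), it(fe.begin()), inBack(false) {
        hop();  // handles an empty front end
    }

    bool atEnd() const { return inBack && it == back->end(); }

    int operator*() const {
        assert(!atEnd());
        return *it;
    }

    SkedWalker &operator++() {
        assert(!atEnd());
        ++it;
        hop();
        return *this;
    }

  private:
    // Switch to the back-end list once the front end is exhausted.
    void hop() {
        if (!inBack && it == front->end()) {
            it = back->begin();
            inBack = true;
        }
    }

    const std::list<int> *front;
    const std::list<int> *back;
    std::list<int>::const_iterator it;
    bool inBack;
};

// Collect the full schedule in walk order.
std::vector<int> walkAll(const std::list<int> &fe, const std::list<int> &be) {
    std::vector<int> seen;
    for (SkedWalker w(fe, be); !w.atEnd(); ++w) {
        seen.push_back(*w);
    }
    return seen;
}
```

Because the front-end list is shared by all instructions, only the back-end list differs per instruction type, which is what makes caching the schedules worthwhile.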
2011-02-12	inorder: stage scheduler for front/back end schedule creation	Korey Sewell
	add a stage scheduler class to replace InstStage in pipeline_traits.cc. Use that class to define a default front-end resource schedule that all instructions will follow. This will also replace the back end schedule in pipeline_traits.cc. The reason for adding this is so that we can cache instruction schedules in the future instead of calling the same function over and over again and constantly dynamically allocating memory on every instruction to figure out its schedule
2011-02-12	inorder: cache instruction schedules	Korey Sewell
	first step in an optimization to not dynamically allocate an instruction schedule for every instruction but rather use cached schedules
2011-02-12	inorder: comments for resource sked class	Korey Sewell

2011-02-12	inorder: remove unused file	Korey Sewell
	inst_buffer file isn't used, so remove it
2011-02-12	inorder: remove unused isa ops	Korey Sewell
	pass/fail ops were used for testing but aren't part of the ISA
2011-02-11	Stats: Update the statistics for vnc patch.	Ali Saidi

2011-02-11	VNC/ARM: Use VNC server and add support to boot into X11	Ali Saidi

2011-02-11	VNC: Add VNC server to M5	Ali Saidi

2011-02-11	Serialization: Allow serialization of stl lists	Ali Saidi

2011-02-11	O3: Fix pipeline restart when a table walk completes in the fetch stage.	Giacomo Gabrielli
	When a table walk is initiated by the fetch stage, the CPU can potentially move to the idle state and never wake up. The fetch stage must call cpu->wakeCPU() when a translation completes (in finishTranslation()).