gem5 - gem5

Age	Commit message	Author
2011-02-15	X86: Get rid of "inline" on the MicroPanic constructor in decoder.cc.	Gabe Black
	This was making certain versions of gcc omit the function from the object file which would break the build.
2011-02-14	Info: Clean up some info files.	Gabe Black
	Get rid of RELEASE_NOTES since we no longer do releases, update some of the information in README, and update the date in LICENSE.
2011-02-14	Ruby: Improve PerfectSwitch's wakeup function	Nilay Vaish
	Currently the wakeup function for the PerfectSwitch contains three nested loops: a loop over the virtual networks, a loop over the incoming links, and a loop that runs until all messages for this (link, network) pair have been routed. With an 8 processor mesh network and the Hammer protocol, about 11-12% of the total execution time was observed to be spent in this function, the highest among all functions. It was found that the innermost loop is executed about 45 times per invocation of the wakeup function, even though each invocation processes just about one message. The patch tries to do away with the redundant executions of the innermost loop. Counters have been added for each virtual network that record the number of messages that need to be routed for that virtual network. The inner loops are only executed when the number of messages for that particular virtual network is greater than 0. This does away with almost 80% of the executions of the innermost loop. The function now consumes about 5-6% of the total execution time.
2011-02-13	X86: Update stats for the improved branch detection/prediction.	Gabe Black

2011-02-13	X86: Detect branches taking into account instruction size.	Gabe Black
	The size of the current instruction determines what the npc should be if there's no branching.
2011-02-13	X86: Update stats now that the dest reg isn't read unnecessarily to set flags.	Gabe Black

2011-02-13	X86: Put the result used for flags in an intermediate variable.	Gabe Black
	Using the destination register directly causes the ISA parser to treat it as a source even if none of the original bits are used.
2011-02-13	X86: Update stats for the reduced register reads.	Gabe Black

2011-02-13	X86: Don't read in dest regs if all bits are replaced.	Gabe Black
	In x86, 32 and 64 bit writes to registers, when the registers appear to be 32 or 64 bits wide, overwrite all bits of the destination register. This change removes false dependencies in these cases, where the previous value of a register doesn't need to be read to write a new value. New versions of most microops are created that have a "Big" suffix and simply overwrite their destination, and the right version to use is selected during microop allocation based on the selected data size. This does not change the performance of the O3 CPU model significantly, I assume because there are other false dependencies from the condition code bits in the flags register.
2011-02-13	X86: On a bad micropc, return a microop that returns a fault that panics.	Gabe Black
	This way a bad micropc will have to get all the way to commit before killing the simulation. This accounts for misspeculated branches.
2011-02-13	X86: Define fault objects to carry debug messages.	Gabe Black
	These faults can panic/warn/warn_once, etc., instead of instructions doing that themselves directly. That way, instructions can be speculatively executed, and only if they're actually going to commit will their fault be invoked and the panic, etc., happen.
2011-02-13	X86: Only reset npc to reflect instruction length once.	Gabe Black
	When redirecting fetch to handle branches, the npc of the current pc state needs to be left alone. This change makes the pc state record whether or not the npc already reflects a real value by making it keep track of the current instruction size, or if no size has been set.
2011-02-13	O3: Fetch from the microcode ROM when needed.	Gabe Black

2011-02-13	O3: Fix GCC 4.2.4 complaint	Ali Saidi

2011-02-12	Ruby: Reorder Cache Lookup in Protocol Files	Nilay Vaish
	The patch changes the order in which the L1 dcache and icache are looked up when a request comes in. Earlier, if a request came in for an instruction fetch, the dcache was looked up before the icache, to correctly handle self-modifying code. But in the common case, the dcache is going to report a miss and the subsequent icache lookup is going to report a hit. Given the invariant that caches under the same controller keep track of disjoint sets of cache blocks, we can move the icache lookup before the dcache lookup. In case of a hit in the icache, using our invariant, we know that the dcache would have reported a miss. In case of a miss in the icache, we know that the icache would have missed even if the dcache was looked up first. Effectively, we are doing the same thing as before, though in the common case we expect a reduction in the number of lookups. This was empirically confirmed for MOESI hammer. The ratio of lookups to access requests is now about 1.1 to 1.
2011-02-12	inorder:regress: host-inst-rate improved ~58%	Korey Sewell
	there are still only a few inorder benchmarks, but for the lengthier benchmarks (twolf and vortex) the latest changes to instruction scheduling (how instructions figure out what they want to do on each pipeline stage in the inorder model) were able to improve performance by a nice amount. The latest results for the inorder model process about 100k insts/second (note: the 58% is relative to the last run on the 64-bit pool machines at UM)
2011-02-12	inorder: clean up the old way of inst. scheduling	Korey Sewell
	remove remnants of old way of instruction scheduling which dynamically allocated a new resource schedule for every instruction
2011-02-12	inorder: utilize cached skeds in pipeline	Korey Sewell
	allow the pipeline and resources to use the cached instruction schedule and resource sked iterator
2011-02-12	inorder: define iterator for resource schedules	Korey Sewell
	resource skeds are divided into two parts: front end (all insts) and back end (inst. specific). each of those is implemented as a separate list, so this iterator wraps around the traditional list iterator so that an instruction can walk its schedule but seamlessly transfer from front end to back end when necessary
2011-02-12	inorder: stage scheduler for front/back end schedule creation	Korey Sewell
	add a stage scheduler class to replace InstStage in pipeline_traits.cc. use that class to define a default front-end resource schedule that all instructions will follow. This will also replace the back end schedule in pipeline_traits.cc. The reason for adding this is so that we can cache instruction schedules in the future instead of calling the same function over and over again, as well as constantly dynamically allocating memory on every instruction to try to figure out its schedule
2011-02-12	inorder: cache instruction schedules	Korey Sewell
	first step in an optimization to not dynamically allocate an instruction schedule for every instruction but rather use cached schedules
2011-02-12	inorder: comments for resource sked class	Korey Sewell

2011-02-12	inorder: remove unused file	Korey Sewell
	inst_buffer file isn't used, so remove it
2011-02-12	inorder: remove unused isa ops	Korey Sewell
	pass/fail ops were used for testing but aren't part of the isa
2011-02-11	Stats: Update the statistics for vnc patch.	Ali Saidi

2011-02-11	VNC/ARM: Use VNC server and add support to boot into X11	Ali Saidi

2011-02-11	VNC: Add VNC server to M5	Ali Saidi

2011-02-11	Serialization: Allow serialization of stl lists	Ali Saidi

2011-02-11	O3: Fix pipeline restart when a table walk completes in the fetch stage.	Giacomo Gabrielli
	When a table walk is initiated by the fetch stage, the CPU can potentially move to the idle state and never wake up. The fetch stage must call cpu->wakeCPU() when a translation completes (in finishTranslation()).
2011-02-11	O3: Fix a few bugs in the TableWalker object.	Giacomo Gabrielli
	Uncacheable requests were set as such only in atomic mode. currState->delayed is checked in place of currState->timing for resetting currState in atomic mode.
2011-02-11	SimpleCPU: Fix a case where a DTLB fault redirects fetch and an I-side walk occurs.	Ali Saidi
	This change fixes an issue where a DTLB fault occurs and redirects fetch to handle the fault, and the ITLB requires a walk which delays translation. In this case the status of the cpu isn't updated appropriately, and an additional instruction fetch occurs. Eventually this hits an assert, as multiple instruction fetches are occurring in the system and when the second one returns the processor is in the wrong state. Some asserts below are removed because one was always true (typo) and because, after the initiateAcc(), the processor could be in any valid state when a d-side fault occurs.
2011-02-11	O3: Enhance data address translation by supporting hardware page table walkers.	Giacomo Gabrielli
	Some ISAs (like ARM) rely on hardware page table walkers. For those ISAs, when a TLB miss occurs, initiateTranslation() can return with NoFault but with the translation unfinished. Instructions experiencing a delayed translation due to a hardware page table walk are deferred until the translation completes and kept in the IQ. In order to keep track of them, the IQ has been augmented with a queue of the outstanding delayed memory instructions. When their translation completes, instructions are re-executed (only their initiateAccess() was already executed; their DTB translation is now skipped). The IEW stage has been modified to support such a 2-pass execution.
2011-02-11	ARM: Fix timer calculations.	Ali Saidi
	The timer calculations were a bit off, so time would run faster than it should.
2011-02-11	Timesync: Make sure timesync event is setup after curTick is unserialized	Ali Saidi
	Setup initial timesync event in initState or loadState so that curTick has been updated to the new value, otherwise the event is scheduled in the past.
2011-02-09	Ext: Add X11 keysym header files to ext directory.	Ali Saidi

2011-02-09	ruby: removed duplicate make response call	Brad Beckmann

2011-02-08	regress: protocol regression tester updates	Brad Beckmann

2011-02-08	memtest: due to contention increase, increased deadlock threshold	Brad Beckmann

2011-02-08	config: fixed minor bug connecting dma devices to ruby	Brad Beckmann

2011-02-08	MESI CMP: Unset TBE pointer in L2 cache controller	Nilay Vaish
	The TBE pointer in the MESI CMP implementation was not being set to NULL when the TBE is deallocated. This resulted in a segmentation fault when testing the protocol with the ProtocolTrace switched on.
2011-02-07	Stats: Re update stats.	Gabe Black

2011-02-07	Stats: Back out broken update.	Gabe Black

2011-02-07	X86: Obey the wp bit of CR0.	Tim Harris
	If cr0.wp ("write protect" bit) is clear then do not generate page faults when writing to write-protected pages in kernel mode.
2011-02-07	X86: Use all 64 bits of the lstar register in the SYSCALL_64 macroop.	Tim Harris
	During SYSCALL_64, use dataSize=8 when handling new rip (ref http://www.intel.com/Assets/PDF/manual/253668.pdf 5.8.8 IA32_LSTAR is a 64-bit address)
2011-02-07	X86: Fix JMP_FAR_I to unpack a far pointer correctly.	Tim Harris
	JMP_FAR_I was unpacking its far pointer operand using sll instead of srl like it should, and also putting the components in the wrong registers for use by other microcode.
2011-02-07	X86: Read the LDT/GDT at CPL0 when executing an iret.	Tim Harris
	During iret, access the LDT/GDT at CPL0 rather than after the transition to user mode (if I'm reading the Intel 64 architecture spec correctly, the contents of the descriptor table are read before the CPL is updated).
2011-02-07	Orion: Replace printf() with fatal()	Nilay Vaish
	The code for Orion 2.0 makes use of printf() in several places where there is an error in the configuration of the model. These have been replaced with fatal().
2011-02-07	ruby: add stdio header in SRAM.hh	Korey Sewell
	missing header file caused RUBY_FS to not compile
2011-02-07	X86: Add stats for the new x86 fs regressions.	Gabe Black

2011-02-07	X86: Add scripts to support X86 FS configurations in the regressions.	Gabe Black
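
The counter idea described in the 2011-02-14 PerfectSwitch entry above can be illustrated with a small sketch. This is not the gem5 code: the class name, method names, and the choice of Python are all assumptions made for illustration, and only the idea of a per-virtual-network pending-message counter comes from the commit message.

```python
from collections import deque


class PerfectSwitchSketch:
    """Toy model (hypothetical, not gem5's PerfectSwitch) of per-vnet routing."""

    def __init__(self, num_vnets, num_links):
        # One FIFO per (virtual network, incoming link) pair.
        self.buffers = [[deque() for _ in range(num_links)]
                        for _ in range(num_vnets)]
        # Assumed counter: messages waiting to be routed in each virtual network.
        self.pending = [0] * num_vnets
        self.routed = []
        # Counts how many (vnet, link) queues wakeup() had to inspect.
        self.inner_loop_visits = 0

    def enqueue(self, vnet, link, msg):
        self.buffers[vnet][link].append(msg)
        self.pending[vnet] += 1

    def wakeup(self):
        for vnet, links in enumerate(self.buffers):
            # Skip the inner loops entirely when this vnet has nothing to route.
            if self.pending[vnet] == 0:
                continue
            for queue in links:
                self.inner_loop_visits += 1
                while queue:
                    self.routed.append(queue.popleft())
                    self.pending[vnet] -= 1


sw = PerfectSwitchSketch(num_vnets=5, num_links=9)
sw.enqueue(2, 4, "msg")
sw.wakeup()
```

With these illustrative sizes, a single queued message causes 9 queue inspections (one virtual network times 9 links) instead of the 45 that an unconditional scan of every (vnet, link) pair would need.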