gem5 - gem5

Age	Commit message (Collapse)	Author
2012-06-05	O3: Clean up the O3 structures and try to pack them a bit better.	Ali Saidi
	DynInst is extremely large the hope is that this re-organization will put the most used members close to each other.
2012-03-09	O3/Ozone: Eliminate dead code counting software prefetch insts	Geoffrey Blake
	Eliminates dead code in the O3 and Ozone CPU models that counted software prefetch instructions separately for the ALPHA ISA only.
2012-03-09	CheckerCPU: Make CheckerCPU runtime selectable instead of compile selectable	Geoffrey Blake
	Enables the CheckerCPU to be selected at runtime with the --checker option from the configs/example/fs.py and configs/example/se.py configuration files. Also merges with the SE/FS changes.
2012-02-24	CPU: Round-two unifying instr/data CPU ports across models	Andreas Hansson
	This patch continues the unification of how the different CPU models create and share their instruction and data ports. Most importantly, it forces every CPU to have an instruction and a data port, and gives these ports explicit getters in the BaseCPU (getDataPort and getInstPort). The patch helps in simplifying the code, make assumptions more explicit, andfurther ease future patches related to the CPU ports. The biggest changes are in the in-order model (that was not modified in the previous unification patch), which now moves the ports from the CacheUnit to the CPU. It also distinguishes the instruction fetch and load-store unit from the rest of the resources, and avoids the use of indices and casting in favour of keeping track of these two units explicitly (since they are always there anyways). The atomic, timing and O3 model simply return references to their already existing ports.
2012-01-31	clang: Enable compiling gem5 using clang 2.9 and 3.0	Koan-Sin Tan
	This patch adds the necessary flags to the SConstruct and SConscript files for compiling using clang 2.9 and later (on Ubuntu et al and OSX XCode 4.2), and also cleans up a bunch of compiler warnings found by clang. Most of the warnings are related to hidden virtual functions, comparisons with unsigneds >= 0, and if-statements with empty bodies. A number of mismatches between struct and class are also fixed. clang 2.8 is not working as it has problems with class names that occur in multiple namespaces (e.g. Statistics in kernel_stats.hh). clang has a bug (http://llvm.org/bugs/show_bug.cgi?id=7247) which causes confusion between the container std::set and the function Packet::set, and this is currently addressed by not including the entire namespace std, but rather selecting e.g. "using std::vector" in the appropriate places.
2012-01-31	CheckerCPU: Re-factor CheckerCPU to be compatible with current gem5	Geoffrey Blake
	Brings the CheckerCPU back to life to allow FS and SE checking of the O3CPU. These changes have only been tested with the ARM ISA. Other ISAs potentially require modification.
2012-01-10	DPRINTF: Improve some dprintf messages.	Nilay Vaish

2011-08-19	O3: Squash the violator and younger instructions instead not all insts.	Giacomo Gabrielli
	Change the way instructions are squashed on memory ordering violations to squash the violator and younger instructions, not all instructions that are younger than the instruction they violated (no reason to throw away valid work).
2011-07-15	O3: Create a pipeline activity viewer for the O3 CPU model.	Giacomo Gabrielli
	Implemented a pipeline activity viewer as a python script (util/o3-pipeview.py) and modified O3 code base to support an extra trace flag (O3PipeView) for generating traces to be used as inputs by the tool.
2011-05-23	O3: Fix issue w/wbOutstading being decremented multiple times on blocked cache.	Geoffrey Blake
	If a split load fails on a blocked cache wbOutstanding can be decremented twice if the first part of the split load succeeds and the second part fails. Condition the decrementing on not having completed the first part of the load.
2011-05-13	O3: Fix an issue with a load & branch instruction and mem dep squashing	Geoffrey Blake
	Instructions that load an address and are control instructions can execute down the wrong path if they were predicted correctly and then instructions following them are squashed. If an instruction is a memory and control op use the predicted address for the next PC instead of just advancing the PC. Without this change NPC is used for the next instruction, but predPC is used to verify that the branch was successful so the wrong path is silently executed.
2011-04-19	stats: rename stats so they can be used as python expressions	Nathan Binkert

2011-04-15	trace: reimplement the DTRACE function so it doesn't use a vector	Nathan Binkert
	At the same time, rename the trace flags to debug flags since they have broader usage than simply tracing. This means that --trace-flags is now --debug-flags and --trace-help is now --debug-help
2011-04-15	includes: fix up code after sorting	Nathan Binkert

2011-04-15	includes: sort all includes	Nathan Binkert

2011-03-17	O3: Cleanup the commitInfo comm struct.	Ali Saidi
	Get rid of unused members and use base types rather than derrived values where possible to limit amount of state.
2011-02-23	O3: When a prefetch causes a fault, don't record it in the inst	Ali Saidi

2011-02-11	O3: Enhance data address translation by supporting hardware page table walkers.	Giacomo Gabrielli
	Some ISAs (like ARM) relies on hardware page table walkers. For those ISAs, when a TLB miss occurs, initiateTranslation() can return with NoFault but with the translation unfinished. Instructions experiencing a delayed translation due to a hardware page table walk are deferred until the translation completes and kept into the IQ. In order to keep track of them, the IQ has been augmented with a queue of the outstanding delayed memory instructions. When their translation completes, instructions are re-executed (only their initiateAccess() was already executed; their DTB translation is now skipped). The IEW stage has been modified to support such a 2-pass execution.
2011-02-06	mcpat: Adds McPAT performance counters	Joel Hestness
	Updated patches from Rick Strong's set that modify performance counters for McPAT
2011-01-18	O3: Don't test misprediction on load instructions until executed.	Matt Horsnell

2011-01-18	O3: Fix corner cases where multiple squashes/fetch redirects overwrite timebuf.	Matt Horsnell

2011-01-18	O3: Fix mispredicts from non control instructions.	Matt Horsnell
	The squash inside the fetch unit should not attempt to remove them from the branch predictor as non-control instructions are not pushed into the predictor.
2011-01-18	O3: Fixes the way prefetches are handled inside the iew unit.	Matt Horsnell
	This patch prevents the prefetch being added to the instCommit queue twice.
2011-01-18	ARM: Add support for moving predicated false dest operands from sources.	Ali Saidi

2011-01-18	O3: Fixes fetch deadlock when the interrupt clears before CPU handles it.	Min Kyu Jeong
	When this condition occurs the cpu should restart the fetch stage to fetch from the original execution path. Fault handling in the commit stage is cleaned up a little bit so the control flow is simplier. Finally, if an instruction is being used to carry a fault it isn't executed, so the fault propagates appropriately.
2011-01-03	Move sched_list.hh and timebuf.hh from src/base to src/cpu.	Steve Reinhardt
	These files really aren't general enough to belong in src/base. This patch doesn't reorder include lines, leaving them unsorted in many cases, but Nate's magic script will fix that up shortly. --HG-- rename : src/base/sched_list.hh => src/cpu/sched_list.hh rename : src/base/timebuf.hh => src/cpu/timebuf.hh
2010-12-07	O3: Support SWAP and predicated loads/store in ARM.	Min Kyu Jeong

2010-10-31	ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.	Gabe Black
	This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-08-23	O3: Skipping mem-order violation check for uncachable loads.	Min Kyu Jeong
	Uncachable load is not executed until it reaches the head of the ROB, hence cannot cause one.
2010-08-23	ARM: mark msr/mrs instructions as SerializeBefore/After	Min Kyu Jeong
	Since miscellaneous registers bypass wakeup logic, force serialization to resolve data dependencies through them * * * ARM: adding non-speculative/serialize flags for instructions change CPSR
2010-08-23	O3: Handle loads when the destination is the PC.	Min Kyu Jeong
	For loads that PC is the destination, check if the load was mispredicted again when the value being loaded returns from memory
2009-09-23	arch: nuke arch/isa_specific.hh and move stuff to generated config/the_isa.hh	Nathan Binkert

2009-05-26	types: add a type for thread IDs and try to use it everywhere	Nathan Binkert

2009-04-18	o3-delay-slot-bpred: fix decode stage handling of uncdtl. branches.\n decode ↵	Korey Sewell
	stage was not setting the predicted PC correctly or passing that information back to fetch correctly
2008-08-11	params: Convert the CPU objects to use the auto generated param structs.	Nathan Binkert
	A whole bunch of stuff has been converted to use the new params stuff, but the CPU wasn't one of them. While we're at it, make some things a bit more stylish. Most of the work was done by Gabe, I just cleaned stuff up a bit more at the end.
2007-11-06	O3: Remove unneeded variable.	Gabe Black
	--HG-- extra : convert_revision : 4624ccd3f08818f4632881d6aca6d1cc343bbdcf
2007-04-14	Add support for microcode and pull out the special branch delay slot ↵	Gabe Black
	handling. Branch delay slots need to be squash on a mispredict as well because the nnpc they saw was incorrect. --HG-- extra : convert_revision : 8b9c603616bcad254417a7a3fa3edfb4c8728719
2007-04-13	Remove most of the special handling for delay slots since they have to be ↵	Gabe Black
	squashed anyway on a mispredict. This is because the NNPC value they saw when executing was incorrect. --HG-- extra : convert_revision : b42c4eb28b4fbba66c65cbd0a5033bf886c1532d
2007-04-04	Pass ISA-specific O3 CPU as a constructor parameter instead of using setCPU ↵	Kevin Lim
	functions. src/cpu/o3/alpha/cpu_impl.hh: Pass ISA-specific O3 CPU to FullO3CPU as a constructor parameter instead of using setCPU functions. --HG-- extra : convert_revision : 74f4b1f5fb6f95a56081f367cce7ff44acb5688a
2007-04-02	Remove/comment out DPRINTFs that were causing a segfault.	Kevin Lim
	The removed ones were unnecessary. The commented out ones could be useful in the future, should this problem get fixed. See flyspray task #243. src/cpu/o3/commit_impl.hh: src/cpu/o3/decode_impl.hh: src/cpu/o3/fetch_impl.hh: src/cpu/o3/iew_impl.hh: src/cpu/o3/inst_queue_impl.hh: src/cpu/o3/lsq_impl.hh: src/cpu/o3/lsq_unit_impl.hh: src/cpu/o3/rename_impl.hh: src/cpu/o3/rob_impl.hh: Remove/comment out DPRINTFs that were causing a segfault. --HG-- extra : convert_revision : b5aeda1c6300dfde5e0a3e9b8c4c5f6fa00b9862
2007-03-23	Handle status bits a little better, as well as non-speculative instructions.	Kevin Lim
	src/cpu/o3/iew_impl.hh: Allow for slightly more flexible handling of non-speculative instructions. They can be other classes now, such as loads or stores. Also be sure to clear the state associated with squashes that are not used. i.e. if a squash due to a memory ordering violation happens on the same cycle as an older branch squashing, clear the state associated with the memory ordering violation. Lastly don't consider uncached loads to officially be "at commit" until IEW receives the signal back from commit about the load. src/cpu/o3/inst_queue_impl.hh: Don't consider non-speculative instructions to be "at commit" until the IQ has received a signal from commit about the instruction. This prevents non-speculative instructions from being issued too early. src/cpu/o3/mem_dep_unit_impl.hh: Clear instruction's ability to issue if it's replayed. --HG-- extra : convert_revision : d69dae878a30821222885485f4dee87170d56eb3
2007-01-03	Merge zizzer:/bk/newmem	Gabe Black
	into zower.eecs.umich.edu:/eecshome/m5/newmem --HG-- extra : convert_revision : f4a05accb8fa24d425dd818b1b7f268378180e99
2006-12-28	Fixes to get non-delay slot ISAs (Alpha) working again, and pulling some ↵	Gabe Black
	debug output out of ifdefs. --HG-- extra : convert_revision : 29d0969e2d3e809aac32262ba20907e6e4ef1a42
2006-12-26	Remove some #if FULL_SYSTEMs so MP stuff works even in SE mode.	Kevin Lim
	--HG-- extra : convert_revision : 5c334ec806305451b3883c7fd0ed9cd695c038bc
2006-12-20	don't use (*activeThreads).begin(), use activeThreads->blah().	Nathan Binkert
	Also don't call (*activeThreads).end() over and over. Just call activeThreads->end() once and save the result. Make sure we always check that there are elements in the list before we grab the first one. --HG-- extra : convert_revision : d769d8ed52da99532d57a9bbc93e92ddf22b7e58
2006-12-18	Fix a place where the wrong width parameter was used, and set the nextNPC ↵	Gabe Black
	correctly on memory squashes. --HG-- extra : convert_revision : 7914a48ea953607c48f93984e3b043098f0d7c62
2006-12-16	Merge zizzer:/bk/newmem	Gabe Black
	into zower.eecs.umich.edu:/eecshome/m5/newmem src/arch/isa_parser.py: src/arch/sparc/isa/formats/mem/basicmem.isa: src/arch/sparc/isa/formats/mem/blockmem.isa: src/arch/sparc/isa/formats/mem/util.isa: src/arch/sparc/miscregfile.cc: src/arch/sparc/miscregfile.hh: src/cpu/o3/iew_impl.hh: Hand Merge --HG-- extra : convert_revision : ae1b25cde85ab8ec275a09d554acd372887d4d47
2006-12-16	Made branch delay slots get squashed, and passed back an NPC and NNPC to ↵	Gabe Black
	start fetching from. --HG-- extra : convert_revision : a2e4845fedf113b5a2fd92d3d28ce5b006278103
2006-12-12	Allow for multiple redirects to happen on a single cycle (only the one for ↵	Kevin Lim
	the oldest instruction is passed on to commit). This fixes a minor bug when multiple FU completions come back out of order (due to the order in which the FUs are freed up), and the oldest redirect isn't recorded properly. The eon benchmark should run now. src/cpu/o3/iew_impl.hh: Allow for multiple redirects to happen on a single cycle (only the one for the oldest instruction is passed on to commit). --HG-- extra : convert_revision : b7d202dee1754539ed814f0fac59adb8c6328ee1
2006-12-06	Change how optional delay slot instructions are detected and squashed.	Gabe Black
	--HG-- extra : convert_revision : ffd019d4adc2fbbc0a663d8dc6ef73edce12511b