gem5 - gem5

Age	Commit message (Collapse)	Author
2012-06-04	X86: Ensure that the CPUID instruction always writes its outputs.	Gabe Black
	The CPUID instruction was implemented so that it would only write its results if the instruction was successful. This works fine on the simple CPU where unwritten registers retain their old values, but on a CPU like O3 with renaming this is broken. The instruction needs to write the old values back into the registers explicitly if they aren't being changed.
2012-05-26	CPU: Merge the predecoder and decoder.	Gabe Black
	These classes are always used together, and merging them will give the ISAs more flexibility in how they cache things and manage the process. --HG-- rename : src/arch/x86/predecoder_tables.cc => src/arch/x86/decoder_tables.cc
2012-05-25	ISA: Make the decode function part of the ISA's decoder.	Gabe Black

2012-05-22	X86: Split Condition Code register	Nilay Vaish
	This patch moves the ECF and EZF bits to individual registers (ecfBit and ezfBit) and the CF and OF bits to cfofFlag registers. This is being done so as to lower the read after write dependencies on the the condition code register. Ultimately we will have the following registers [ZAPS], [OF], [CF], [ECF], [EZF] and [DF]. Note that this is only one part of the solution for lowering the dependencies. The other part will check whether or not the condition code register needs to be actually read. This would be done through a separate patch.
2012-05-19	x86 ISA: Implement the sse3 haddps instruction.	Marc Orr
	Shuffle the 32 bit values into position, and then add in parallel.
2012-04-29	X86: Fix the IMUL_R_P_I macroop.	Gabe Black
	The disp displacement was left off the load microop so the wrong value was used.
2012-04-14	clang/gcc: Fix compilation issues with clang 3.0 and gcc 4.6	Andreas Hansson
	This patch addresses a number of minor issues that cause problems when compiling with clang >= 3.0 and gcc >= 4.6. Most importantly, it avoids using the deprecated ext/hash_map and instead uses unordered_map (and similarly so for the hash_set). To make use of the new STL containers, g++ and clang has to be invoked with "-std=c++0x", and this is now added for all gcc versions >= 4.6, and for clang >= 3.0. For gcc >= 4.3 and <= 4.5 and clang <= 3.0 we use the tr1 unordered_map to avoid the deprecation warning. The addition of c++0x in turn causes a few problems, as the compiler is more stringent and adds a number of new warnings. Below, the most important issues are enumerated: 1) the use of namespaces is more strict, e.g. for isnan, and all headers opening the entire namespace std are now fixed. 2) another other issue caused by the more stringent compiler is the narrowing of the embedded python, which used to be a char array, and is now unsigned char since there were values larger than 128. 3) a particularly odd issue that arose with the new c++0x behaviour is found in range.hh, where the operator< causes gcc to complain about the template type parsing (the "<" is interpreted as the beginning of a template argument), and the problem seems to be related to the begin/end members introduced for the range-type iteration, which is a new feature in c++11. As a minor update, this patch also fixes the build flags for the clang debug target that used to be shared with gcc and incorrectly use "-ggdb".
2012-03-31	X86: Fix address size handling so real mode works properly.	Gabe Black
	Virtual (pre-segmentation) addresses are truncated based on address size, and any non-64 bit linear address is truncated to 32 bits. This means that real mode addresses aren't truncated down to 16 bits after their segment bases are added in.
2012-03-19	gcc: Clean-up of non-C++0x compliant code, first steps	Andreas Hansson
	This patch cleans up a number of minor issues aiming to get closer to compliance with the C++0x standard as interpreted by gcc and clang (compile with std=c++0x and -pedantic-errors). In particular, the patch cleans up enums where the last item was succeded by a comma, namespaces closed by a curcly brace followed by a semi-colon, and the use of the GNU-extension typeof (replaced by templated functions). It does not address variable-length arrays, zero-size arrays, anonymous structs, range expressions in switch statements, and the use of long long. The generated CPU code also has a large number of issues that remain to be fixed, mainly related to overflows in implicit constant conversion (due to shifts).
2012-03-19	clang: Fix recently introduced clang compilation errors	Andreas Hansson
	This patch makes the code compile with clang 2.9 and 3.0 again by making two very minor changes. Firt, it maintains a strict typing in the forward declaration of the BaseCPUParams. Second, it adds a FullSystemInt flag of the type unsigned int next to the boolean FullSystem flag. The FullSystemInt variable can be used in decode-statements (expands to switch statements) in the instruction decoder.
2012-02-26	X86: Use the M5PanicFault fault in execute methods instead of calling panic.	Gabe Black
	If an instruction is executed speculatively and hits a situation where it wants to panic, it should return a fault instead. If the instruction was misspeculated, the fault can be thrown away. If the instruction wasn't misspeculated, the fault will be invoked and the panic will still happen.
2012-01-16	Merge yet again with the main repository.	Gabe Black

2012-01-09	X86: Add memory fence to I/O instructions	Nilay Vaish

2012-01-07	Merge with the main repository again.	Gabe Black

2012-01-07	Merge with main repository.	Gabe Black

2011-12-01	X86: Fix a bad segmentation check for the stack segment.	Gabe Black
	--HG-- extra : rebase_source : 755f4f6eae52f88ed516a1f1ac9e2565725d89c1
2011-11-03	x86: Add microop for fence	Nilay Vaish
	This patch adds a new microop for memory barrier. The microop itself does nothing, but since it is marked as a memory barrier, the O3 CPU should flush all the pending loads and stores before the fence to the memory system.
2011-10-31	GCC: Get everything working with gcc 4.6.1.	Gabe Black
	And by "everything" I mean all the quick regressions.
2011-09-30	SE/FS: Use the new FullSystem constant where possible.	Gabe Black

2011-09-26	ISA parser: Use '_' instead of '.' to delimit type modifiers on operands.	Gabe Black
	By using an underscore, the "." is still available and can unambiguously be used to refer to members of a structure if an operand is a structure, class, etc. This change mostly just replaces the appropriate "."s with "_"s, but there were also a few places where the ISA descriptions where handling the extensions themselves and had their own regular expressions to update. The regular expressions in the isa parser were updated as well. It also now looks for one of the defined type extensions specifically after connecting "_" where before it would look for any sequence of characters after a "." following an operand name and try to use it as the extension. This helps to disambiguate cases where a "_" may legitimately be part of an operand name but not separate the name from the type suffix. Because leaving the "_" and suffix on the variable name still leaves a valid C++ identifier and all extensions need to be consistent in a given context, I considered leaving them on as a breadcrumb that would show what the intended type was for that operand. Unfortunately the operands can be referred to in code templates, the Mem operand in particular, and since the exact type of Mem can be different for different uses of the same template, that broke things.
2011-09-19	X86: Don't use "#if FULL_SYSTEM" in the X86 ISA description.	Gabe Black
	The decoder now checks the value of FULL_SYSTEM in a switch statement to decide whether to return a real syscall instruction or one that triggers syscall emulation (or a panic in FS mode). The switch statement should devolve into an if, and also should be optimized out since it's based on constant input.
2011-09-19	PseudoInst: Remove the now unnecessary #if FULL_SYSTEMs around pseudoinsts.	Gabe Black

2011-09-18	Pseudoinst: Add an initParam pseudo inst function.	Gabe Black

2011-08-13	X86: Use IsSquashAfter if an instruction could affect fetch translation.	Gabe Black
	Control register operands are set up so that writing to them is serialize after, serialize before, and non-speculative. These are probably overboard, but they should usually be safe. Unfortunately there are times when even these aren't enough. If an instruction modifies state that affects fetch, later serialized instructions which come after it might have already gone through fetch and decode by the time it commits. These instructions may have been translated incorrectly or interpretted incorrectly and need to be destroyed. This change modifies instructions which will or may have this behavior so that they use the IsSquashAfter flag when necessary.
2011-07-05	ISA parser: Define operand types with a ctype directly.	Gabe Black

2011-07-02	ISA: Use readBytes/writeBytes for all instruction level memory operations.	Gabe Black

2011-07-02	X86: Fix store microops so they don't drop faults in timing mode.	Gabe Black
	If a fault was returned by the CPU when a store initiated it's write, the store instruction would ignore the fault. This change fixes that.
2011-06-21	X86: Eliminate an unused argument for building store microops.	Gabe Black

2011-06-02	copyright: clean up copyright blocks	Nathan Binkert

2011-05-06	X86: Fix the Lldt instructions so they load the ldtr and not the tr.	Gabe Black

2011-04-23	X86: When decoding a memory only inst, fault on reg encodings, don't assert.	Gabe Black
	This change makes the decoder figure out if an instruction that only supports memory is using a register encoding and decodes directly to "Unknown" which will behave appropriately. This prevents other parts of the instruction creation process from seeing the mismatch and asserting.
2011-04-15	trace: reimplement the DTRACE function so it doesn't use a vector	Nathan Binkert
	At the same time, rename the trace flags to debug flags since they have broader usage than simply tracing. This means that --trace-flags is now --debug-flags and --trace-help is now --debug-help
2011-04-15	includes: sort all includes	Nathan Binkert

2011-03-02	X86: Decode the mysterious and elusive ffreep x87 instruction.	Gabe Black
	The internet says this instruction was created by accident when an Intel CPU failed to decode x87 instructions properly. It's been documented on a few rare occasions and has generally worked to ensure backwards compatability. One source claims that the gcc toolchain is basically the only thing that emits it, and that emulators/binary translators like qemu and bochs implement it. We won't actually implement it here since we're hardly implementing any other x87 instructions either. If we were to implement it, it would behave the same as ffree but then also pop the register stack. http://www.pagetable.com/?p=16
2011-03-01	X86: Mark IO reads and writes as non-speculative.	Gabe Black

2011-03-01	X86: Mark prefetches as such in their instruction and request flags.	Gabe Black

2011-02-15	X86: Get rid of "inline" on the MicroPanic constructor in decoder.cc.	Gabe Black
	This was making certain versions of gcc omit the function from the object file which would break the build.
2011-02-13	X86: Put the result used for flags in an intermediate variable.	Gabe Black
	Using the destination register directly causes the ISA parser to treat it as a source even if none of the original bits are used.
2011-02-13	X86: Don't read in dest regs if all bits are replaced.	Gabe Black
	In x86, 32 and 64 bit writes to registers in which registers appear to be 32 or 64 bits wide overwrite all bits of the destination register. This change removes false dependencies in these cases where the previous value of a register doesn't need to be read to write a new value. New versions of most microops are created that have a "Big" suffix which simply overwrite their destination, and the right version to use is selected during microop allocation based on the selected data size. This does not change the performance of the O3 CPU model significantly, I assume because there are other false dependencies from the condition code bits in the flags register.
2011-02-13	X86: Define fault objects to carry debug messages.	Gabe Black
	These faults can panic/warn/warn_once, etc., instead of instructions doing that themselves directly. That way, instructions can be speculatively executed, and only if they're actually going to commit will their fault be invoked and the panic, etc., happen.
2011-02-07	X86: Use all 64 bits of the lstar register in the SYSCALL_64 macroop.	Tim Harris
	During SYSCALL_64, use dataSize=8 when handling new rip (ref http://www.intel.com/Assets/PDF/manual/253668.pdf 5.8.8 IA32_LSTAR is a 64-bit address)
2011-02-07	X86: Fix JMP_FAR_I to unpack a far pointer correctly.	Tim Harris
	JMP_FAR_I was unpacking its far pointer operand using sll instead of srl like it should, and also putting the components in the wrong registers for use by other microcode.
2011-02-07	X86: Read the LDT/GDT at CPL0 when executing an iret.	Tim Harris
	During iret access LDT/GDT at CPL0 rather than after transition to user mode (if I'm reading the Intel IA-64 architecture spec correctly, the contents of the descriptor table are read before the CPL is updated).
2011-02-06	m5: added work completed monitoring support	Brad Beckmann

2011-02-06	x86: set IsCondControl flag for the appropriate microops	Brad Beckmann

2011-02-02	X86: Get rid of the stupd microop.	Gabe Black

2011-02-02	X86: Replace the stupd microop with a store/update sequence.	Gabe Black

2010-12-08	X86: Take advantage of new PCState syntax.	Gabe Black

2010-10-31	ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.	Gabe Black
	This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-10-29	X86: Fault on divide by zero instead of panicing.	Gabe Black