gem5 - gem5

Age	Commit message (Collapse)	Author
2016-02-06	x86: revamp cmpxchg8b/cmpxchg16b implementation	Alexandru Dutu
	The previous implementation did a pair of nested RMW operations, which isn't compatible with the way that locked RMW operations are implemented in the cache models. It was convenient though in that it didn't require any new micro-ops, and supported cmpxchg16b using 64-bit memory ops. It also worked in AtomicSimpleCPU where atomicity was guaranteed by the core and not by the memory system. It did not work with timing CPU models though. This new implementation defines new 'split' load and store micro-ops which allow a single memory operation to use a pair of registers as the source or destination, then uses a single ldsplit/stsplit RMW pair to implement cmpxchg. This patch requires support for 128-bit memory accesses in the ISA (added via a separate patch) to support cmpxchg16b.
2016-02-06	arch, x86: add support for arrays as memory operands	Steve Reinhardt
	Although the cache models support wider accesses, the ISA descriptions assume that (for the most part) memory operands are integer types, which makes it difficult to define instructions that do memory accesses larger than 64 bits. This patch adds some generic support for memory operands that are arrays of uint64_t, and specifically a 'u2qw' operand type for x86 that is an array of 2 uint64_ts (128 bits). This support is unused at this point, but will be needed shortly for cmpxchg16b. Ideally the 128-bit SSE memory accesses will also be rewritten to use this support. Support for 128-bit accesses could also have been added using the gcc __int128_t extension, which would have been less disruptive. However, although clang also supports __int128_t, it's still non-standard. Also, more importantly, this approach creates a path to defining 256- and 512-byte operands as well, which will be useful for eventual AVX support.
2013-10-15	arch/x86: add support for explicit CC register file	Yasuko Eckert
	Convert condition code registers from being specialized ("pseudo") integer registers to using the recently added CC register class. Nilay Vaish also contributed to this patch.
2013-03-11	x86: implement some of the x87 instructions	Nilay Vaish
	This patch implements ftan, fprem, fyl2x, fld* floating-point instructions.
2013-01-15	x86: implements emms instruction	Nilay Vaish

2013-01-15	x86: implement fabs, fchs instructions	Nilay Vaish

2012-09-11	X86: make use of register predication	Nilay Vaish
	The patch introduces two predicates for condition code registers -- one tests if a register needs to be read, the other tests whether a register needs to be written to. These predicates are evaluated twice -- during construction of the microop and during its execution. Register reads and writes are elided depending on how the predicates evaluate.
2012-09-11	x86: Add a separate register for D flag bit	Nilay Vaish
	The D flag bit is part of the cc flag bit register currently. But since it is not being used any where in the implementation, it creates an unnecessary dependency. Hence, it is being moved to a separate register.
2012-05-22	X86: Split Condition Code register	Nilay Vaish
	This patch moves the ECF and EZF bits to individual registers (ecfBit and ezfBit) and the CF and OF bits to cfofFlag registers. This is being done so as to lower the read after write dependencies on the the condition code register. Ultimately we will have the following registers [ZAPS], [OF], [CF], [ECF], [EZF] and [DF]. Note that this is only one part of the solution for lowering the dependencies. The other part will check whether or not the condition code register needs to be actually read. This would be done through a separate patch.
2011-08-13	X86: Use IsSquashAfter if an instruction could affect fetch translation.	Gabe Black
	Control register operands are set up so that writing to them is serialize after, serialize before, and non-speculative. These are probably overboard, but they should usually be safe. Unfortunately there are times when even these aren't enough. If an instruction modifies state that affects fetch, later serialized instructions which come after it might have already gone through fetch and decode by the time it commits. These instructions may have been translated incorrectly or interpretted incorrectly and need to be destroyed. This change modifies instructions which will or may have this behavior so that they use the IsSquashAfter flag when necessary.
2011-07-05	ISA parser: Define operand types with a ctype directly.	Gabe Black

2010-12-08	X86: Take advantage of new PCState syntax.	Gabe Black

2010-10-31	ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.	Gabe Black
	This change is a low level and pervasive reorganization of how PCs are managed in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about, the PC and the NPC, and the lsb of the PC signaled whether or not you were in PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next micropc, x86 and ARM introduced variable length instruction sets, and ARM started to keep track of mode bits in the PC. Each CPU model handled PCs in its own custom way that needed to be updated individually to handle the new dimensions of variability, or, in the case of ARMs mode-bit-in-the-pc hack, the complexity could be hidden in the ISA at the ISA implementation's expense. Areas like the branch predictor hadn't been updated to handle branch delay slots or micropcs, and it turns out that had introduced a significant (10s of percent) performance bug in SPARC and to a lesser extend MIPS. Rather than perpetuate the problem by reworking O3 again to handle the PC features needed by x86, this change was introduced to rework PC handling in a more modular, transparent, and hopefully efficient way. PC type: Rather than having the superset of all possible elements of PC state declared in each of the CPU models, each ISA defines its own PCState type which has exactly the elements it needs. A cross product of canned PCState classes are defined in the new "generic" ISA directory for ISAs with/without delay slots and microcode. These are either typedef-ed or subclassed by each ISA. To read or write this structure through a Context, you use the new pcState() accessor which reads or writes depending on whether it has an argument. If you just want the address of the current or next instruction or the current micro PC, you can get those through read-only accessors on either the PCState type or the Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the move away from readPC. That name is ambiguous since it's not clear whether or not it should be the actual address to fetch from, or if it should have extra bits in it like the PAL mode bit. Each class is free to define its own functions to get at whatever values it needs however it needs to to be used in ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the PC and into a separate field like ARM. These types can be reset to a particular pc (where npc = pc + sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as appropriate), printed, serialized, and compared. There is a branching() function which encapsulates code in the CPU models that checked if an instruction branched or not. Exactly what that means in the context of branch delay slots which can skip an instruction when not taken is ambiguous, and ideally this function and its uses can be eliminated. PCStates also generally know how to advance themselves in various ways depending on if they point at an instruction, a microop, or the last microop of a macroop. More on that later. Ideally, accessing all the PCs at once when setting them will improve performance of M5 even though more data needs to be moved around. This is because often all the PCs need to be manipulated together, and by getting them all at once you avoid multiple function calls. Also, the PCs of a particular thread will have spatial locality in the cache. Previously they were grouped by element in arrays which spread out accesses. Advancing the PC: The PCs were previously managed entirely by the CPU which had to know about PC semantics, try to figure out which dimension to increment the PC in, what to set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction with the PC type itself. Because most of the information about how to increment the PC (mainly what type of instruction it refers to) is contained in the instruction object, a new advancePC virtual function was added to the StaticInst class. Subclasses provide an implementation that moves around the right element of the PC with a minimal amount of decision making. In ISAs like Alpha, the instructions always simply assign NPC to PC without having to worry about micropcs, nnpcs, etc. The added cost of a virtual function call should be outweighed by not having to figure out as much about what to do with the PCs and mucking around with the extra elements. One drawback of making the StaticInsts advance the PC is that you have to actually have one to advance the PC. This would, superficially, seem to require decoding an instruction before fetch could advance. This is, as far as I can tell, realistic. fetch would advance through memory addresses, not PCs, perhaps predicting new memory addresses using existing ones. More sophisticated decisions about control flow would be made later on, after the instruction was decoded, and handed back to fetch. If branching needs to happen, some amount of decoding needs to happen to see that it's a branch, what the target is, etc. This could get a little more complicated if that gets done by the predecoder, but I'm choosing to ignore that for now. Variable length instructions: To handle variable length instructions in x86 and ARM, the predecoder now takes in the current PC by reference to the getExtMachInst function. It can modify the PC however it needs to (by setting NPC to be the PC + instruction length, for instance). This could be improved since the CPU doesn't know if the PC was modified and always has to write it back. ISA parser: To support the new API, all PC related operand types were removed from the parser and replaced with a PCState type. There are two warts on this implementation. First, as with all the other operand types, the PCState still has to have a valid operand type even though it doesn't use it. Second, using syntax like PCS.npc(target) doesn't work for two reasons, this looks like the syntax for operand type overriding, and the parser can't figure out if you're reading or writing. Instructions that use the PCS operand (which I've consistently called it) need to first read it into a local variable, manipulate it, and then write it back out. Return address stack: The return address stack needed a little extra help because, in the presence of branch delay slots, it has to merge together elements of the return PC and the call PC. To handle that, a buildRetPC utility function was added. There are basically only two versions in all the ISAs, but it didn't seem short enough to put into the generic ISA directory. Also, the branch predictor code in O3 and InOrder were adjusted so that they always store the PC of the actual call instruction in the RAS, not the next PC. If the call instruction is a microop, the next PC refers to the next microop in the same macroop which is probably not desirable. The buildRetPC function advances the PC intelligently to the next macroop (in an ISA specific way) so that that case works. Change in stats: There were no change in stats except in MIPS and SPARC in the O3 model. MIPS runs in about 9% fewer ticks. SPARC runs with 30%-50% fewer ticks, which could likely be improved further by setting call/return instruction flags and taking advantage of the RAS. TODO: Add != operators to the PCState classes, defined trivially to be !(a==b). Smooth out places where PCs are split apart, passed around, and put back together later. I think this might happen in SPARC's fault code. Add ISA specific constructors that allow setting PC elements without calling a bunch of accessors. Try to eliminate the need for the branching() function. Factor out Alpha's PAL mode pc bit into a separate flag field, and eliminate places where it's blindly masked out or tested in the PC.
2010-05-23	copyright: Change HP copyright on x86 code to be more friendly	Nathan Binkert

2009-08-07	X86: Implement shift right/left double microops.	Gabe Black
	This is my best guess as far as what these should do. Other existing microops use implicit registers, mul1s and mul1u for instance, so this should be ok. The microop that loads the implicit DoubleBits register would fall into one of the microop slots for moving to/from special registers.
2009-07-17	X86: Tame the wilds of def operands.	Gabe Black

2009-02-25	X86: Add microops for reading/writing debug registers.	Gabe Black

2009-02-01	X86: Fix some incorrect register widths.	Gabe Black

2009-01-06	X86: Hook in the M5 pseudo insts.	Gabe Black

2008-10-12	X86: Add wrval/rdval microops for reading significant miscregs.	Gabe Black

2008-10-12	X86: Implement CPUID with a magical function instead of microcode.	Gabe Black

2008-06-12	X86: Keep handy values like the operating mode in one register.	Gabe Black

2008-06-12	X86: Change what the microop chks does.	Gabe Black
	Instead of computing the segment descriptor address, this now checks if a selector value/descriptor are legal for a particular purpose.
2008-06-12	X86: Add microops and supporting code to manipulate the whole rflags register.	Gabe Black

2008-06-12	X86: Add in some support for the tsc register.	Gabe Black

2007-12-01	X86: Reorganize segmentation and implement segment selector movs.	Gabe Black
	--HG-- extra : convert_revision : 553c3ffeda1f5312cf02493f602e7d4ba2fe66e8
2007-12-01	X86: Implement the lgdt instruction.	Gabe Black
	--HG-- extra : convert_revision : d1698a82df3c57cc9bbf8d5d190f271bfc7cb2e4
2007-12-01	X86: Implement wrbase and wrlimit for loading pseudo descriptors.	Gabe Black
	--HG-- extra : convert_revision : fe03c4aed95ef12773e80cdb3d9cff68a2b20f02
2007-12-01	X86: Separate the effective seg base and the "hidden" seg base.	Gabe Black
	--HG-- extra : convert_revision : 5fcb8d94dbab7a7d6fe797277a5856903c885ad4
2007-11-13	X86: Make microcode use presegmentation RIPs and the rest of m5 use post ↵	Gabe Black
	segmentation RIPS. --HG-- extra : convert_revision : d8cda7c8b9a2afb8a9d601b6d61529a96c5f87fe
2007-11-12	X86: Implement the wrcr microop which writes a control register, and some ↵	Gabe Black
	control register work. --HG-- extra : convert_revision : 3e9daef9cdd0665c033420e5b4f981649e9908ab
2007-09-19	X86: Move the fp microops to their own file with their own base classes in ↵	Gabe Black
	C++ and python. --HG-- extra : convert_revision : 9cd223f2005adb36fea2bb56fa39793a58ec958c
2007-09-19	X86: Put in the foundation for x87 stack based fp registers.	Gabe Black
	--HG-- extra : convert_revision : 940f92efd4a9dc59106e991cc6d9836861ab69de
2007-09-13	X86: Total overhaul of the division instructions and microops.	Gabe Black
	--HG-- extra : convert_revision : 303ea45f69f7805361ad877fe6bb43fbc3dfd7a6
2007-09-06	X86: Rework the multiplication microops so that they work like they would in ↵	Gabe Black
	the patent. --HG-- extra : convert_revision : 6fcf5dee440288d8bf92f6c5c2f97ef019975536
2007-08-29	X86: Add operands to handle floating point registers.	Gabe Black
	--HG-- extra : convert_revision : 2e8289dbd3f5dda1221014d4ed0e9450f60de0cf
2007-08-29	X86: Flesh out register indexing constants.	Gabe Black
	--HG-- extra : convert_revision : 56eedc076bbb7962c3976599a15ed93c7cb154c0
2007-08-07	X86: Make a microcode branch microop.	Gabe Black
	Also some touch up for ruflag. --HG-- extra : convert_revision : 829947169af25ca6573f53b9430707101c75cc23
2007-08-04	X86: Start implementing segmentation support.	Gabe Black
	Make instructions observe segment prefixes, default segment rules, segment base addresses. Also fix some microcode and add sib and riprel "keywords" to the x86 specialization of the microassembler. --HG-- extra : convert_revision : be5a3b33d33f243ed6e1ad63faea8495e46d0ac9
2007-07-30	X86: Take into account the regular registers and the microcode registers ↵	Gabe Black
	when decided whether or not to fold. --HG-- extra : convert_revision : 26feec984dec61799c4afb03a4503a53c35872c5
2007-07-30	Make the register indices use the appropriate "fold" bit.	Gabe Black
	--HG-- extra : convert_revision : 89e15e2ef1f709f2c09238b78f94505ce8ef146d
2007-07-19	x86 fixes	Gabe Black
	Make the emulation environment consider the rex prefix. Implement and hook in forms of j, jmp, cmp, syscall, movzx Added a format for an instruction to carry a call to the SE mode syscalls system Made memory instructions which refer to the rip do so directly Made the operand size overridable in the microassembly Made the "ext" field of register operations 16 bits to hold a sparse encoding of flags to set or conditions to predicate on Added an explicit "rax" operand for the syscall format Implemented syscall returns. --HG-- extra : convert_revision : ae84bd8c6a1d400906e17e8b8c4185f2ebd4c5f2
2007-07-17	Add in operand which holds the condition code bits of the flag register.	Gabe Black
	--HG-- extra : convert_revision : 416052f41fccc8286b3bdbe8d559512a761224f2
2007-06-19	Get rid of the immediate and displacement components of the EmulEnv struct ↵	Gabe Black
	and use them directly out of the instruction. The extra copies are conceptually realistic but are just innefficient as implemented. Also don't use the zeroeth microcode register for general storage since it's now the zero register, and implement a load and a store microops. --HG-- extra : convert_revision : 0686296ca8b72940d961ecc6051063bfda1e932d
2007-06-04	Reworking x86's microcode system. This is a work in progress, and X86 ↵	Gabe Black
	doesn't compile. src/arch/x86/isa/decoder/one_byte_opcodes.isa: src/arch/x86/isa/macroop.isa: src/arch/x86/isa/main.isa: src/arch/x86/isa/microasm.isa: src/arch/x86/isa/microops/base.isa: src/arch/x86/isa/microops/microops.isa: src/arch/x86/isa/operands.isa: src/arch/x86/isa/microops/regop.isa: src/arch/x86/isa/microops/specop.isa: Reworking x86's microcode system --HG-- extra : convert_revision : cab66be59ed758b192226af17eddd5a86aa190f3
2007-04-04	The process of going from an instruction definition to an instruction to be ↵	Gabe Black
	returned by the decoder has been fleshed out more. The following steps describe how an instruction implementation becomes a StaticInst. 1. Microops are created. These are StaticInsts use templates to provide a basic form of polymorphism without having to make the microassembler smarter. 2. An instruction class is created which has a "templated" microcode program as it's docstring. The template parameters are refernced with ^ following by a number. 3. An instruction in the decoder references an instruction template using it's mnemonic. The parameters to it's format end up replacing the placeholders. These parameters describe a source for an operand which could be memory, a register, or an immediate. It it's a register, the register index is used. If it's memory, eventually a load/store will be pre/postpended to the instruction template and it's destination register will be used in place of the ^. If it's an immediate, the immediate is used. Some operand types, specifically those that come from the ModRM byte, need to be decoded further into memory vs. register versions. This is accomplished by making the decode_block text for these instructions another case statement based off ModRM. 4. Once all of the template parameters have been handled, the instruction goes throw the microcode assembler which resolves labels and creates a list of python op objects. If an operand is a register, it uses a % prefix, an immediate uses $, and a label uses @. If the operand is just letters, numbers, and underscores, it can appear immediately after the prefix. If it's not, it can be encolsed in non nested {}s. 5. If there is a single "op" object (which corresponds to a single microop) the decoder is set up to return it directly. If not, a macroop wrapper is created around it. In the future, I'm considering seperating the operand type specialization from the template substitution step. A problem this introduces is that either the template arguments need to be kept around for the specialization step, or they need to be re-extracted. Re-extraction might be the way to go so that the operand formats can be coded directly into the micro assembler template without having to pass them in as parameters. I don't know if that's actually useful, though. src/arch/x86/isa/decoder/one_byte_opcodes.isa: src/arch/x86/isa/microasm.isa: src/arch/x86/isa/microops/microops.isa: src/arch/x86/isa/operands.isa: src/arch/x86/isa/microops/base.isa: Implemented polymorphic microops and changed around the microcode assembler syntax. --HG-- extra : convert_revision : e341f7b8ea9350a31e586a3d33250137e5954f43
2007-03-29	Add code to generate register and immediate based integer op microop classes.	Gabe Black
	--HG-- extra : convert_revision : 718f941da74dd3b4557cd21e1772879ac21aa9c6
2007-03-21	Add a junk operand. With no operands, the parser breaks.	Gabe Black
	--HG-- extra : convert_revision : 7410fd3681ed3d9b1293d982ed5f3553a6c75f3f
2007-03-05	Stub decoder. This is probably even farther from finished than it looks...	Gabe Black
	--HG-- extra : convert_revision : a39a158fec4560f6eb7a6987592c473677c0b1ba