gem5 - gem5

Age	Commit message (Collapse)	Author
2018-05-02	x86: Add a ld/st microop flag for marking an access uncacheable.	Gabe Black
	This percolates down to the memory request object which will have its "UNCACHEABLE" flag set. Change-Id: Ie73f4249bfcd57f45a473f220d0988856715a9ce Reviewed-on: https://gem5-review.googlesource.com/9881 Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
2018-01-23	arch-x86: Adding clflush, clflushopt, clwb instructions	Swapnil Haria
	This patch adds support for cache flushing instructions in x86. It piggybacks on support for similar instructions in arm ISA added by Nikos Nikoleris. I have tested each instruction using microbenchmarks. Change-Id: I72b6b8dc30c236a21eff7958fa231f0663532d7d Reviewed-on: https://gem5-review.googlesource.com/7401 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com>
2017-12-13	x86: Rework how "split" loads/stores are handled.	Gabe Black
	Explicitly separate the way the data is represented in the underlying representation from how it's represented in the instruction. In order to make the ISA parser happy, the Mem operand needs to have a single, particular type. To handle that with scalar types, we just used uint64_ts and then worked with values that were smaller than the maximum we could hold. To work with these new array values, we also use an underlying uint64_t for each element. To make accessing the underlying memory system more natural, when we go to actually read or write values, we translate the access into an array of the actual, correct underlying type. That way we don't have non-exact asserts which confuse gcc, or weird endianness conversion which assumes that the data should be flipped 8 bytes at a time. Because the functions involved are generally inline, the syntactic niceness should all boil off, and the final implementation in the binary should be simple and efficient for the given data types. Change-Id: I14ce7a2fe0dc2cbaf6ad4a0d19f743c45ee78e26 Reviewed-on: https://gem5-review.googlesource.com/6582 Reviewed-by: Gabe Black <gabeblack@google.com> Maintainer: Gabe Black <gabeblack@google.com>
2017-11-07	alpha,arm,mips,power,riscv,sparc,x86: Merge exec decl templates.	Gabe Black
	In the ISA instruction definitions, some classes were declared with execute, etc., functions outside of the main template because they had CPU specific signatures and would need to be duplicated with each CPU plugged into them. Now that the instructions always just use an ExecContext, there's no reason for those templates to be separate. This change folds those templates together. Change-Id: I13bda247d3d1cc07c0ea06968e48aa5b4aace7fa Reviewed-on: https://gem5-review.googlesource.com/5401 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Alec Roelke <ar4jc@virginia.edu> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
2017-11-02	alpha,arm,mips,power,riscv,sparc,x86,isa: De-specialize ExecContexts.	Gabe Black
	The ISA parser used to generate different copies of exec functions for each exec context class a particular CPU wanted to use. That's since been changed so that those functions take a pointer to the base ExecContext, so the code which would generate those extra functions can be removed, and some functions which used to be templated on an ExecContext subclass can be untemplated, or minimally less templated. Now that some functions aren't going to be instantiated multiple times with different signatures, there are also opportunities to collapse templates and make many instruction definitions simpler within the parser. Since those changes will be less mechanical, they're left for later changes and will probably be done in smaller increments. Change-Id: I0015307bb02dfb9c60380b56d2a820f12169ebea Reviewed-on: https://gem5-review.googlesource.com/5381 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
2017-02-10	x86: Fix implicit stack addressing in 64-bit mode	Jason Lowe-Power
	When in 64-bit mode, if the stack is accessed implicitly by an instruction the alternate address prefix should be ignored if present. This patch adds an extra flag to the ldstop which signifies when the address override should be ignored. Then, for all of the affected instructions, this patch adds two options to the ld and st opcode to use the current stack addressing mode for all addresses and to ignore the AddressSizeFlagBit. Finally, this patch updates the x86 TLB to not truncate the address if it is in 64-bit mode and the IgnoreAddrSizeFlagBit is set. This fixes a problem when calling __libc_start_main with a binary that is linked with a recent version of ld. This version of ld uses the address override prefix (0x67) on the call instruction instead of a nop. Note: This has not been tested in compatibility mode and only the call instruction with the address override prefix has been tested. See [1] page 9 (pdf page 45) For instructions that are affected see [1] page 519 (pdf page 555). [1] http://support.amd.com/TechDocs/24594.pdf Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-02-06	x86: revamp cmpxchg8b/cmpxchg16b implementation	Alexandru Dutu
	The previous implementation did a pair of nested RMW operations, which isn't compatible with the way that locked RMW operations are implemented in the cache models. It was convenient though in that it didn't require any new micro-ops, and supported cmpxchg16b using 64-bit memory ops. It also worked in AtomicSimpleCPU where atomicity was guaranteed by the core and not by the memory system. It did not work with timing CPU models though. This new implementation defines new 'split' load and store micro-ops which allow a single memory operation to use a pair of registers as the source or destination, then uses a single ldsplit/stsplit RMW pair to implement cmpxchg. This patch requires support for 128-bit memory accesses in the ISA (added via a separate patch) to support cmpxchg16b.
2016-02-06	arch, x86: add support for arrays as memory operands	Steve Reinhardt
	Although the cache models support wider accesses, the ISA descriptions assume that (for the most part) memory operands are integer types, which makes it difficult to define instructions that do memory accesses larger than 64 bits. This patch adds some generic support for memory operands that are arrays of uint64_t, and specifically a 'u2qw' operand type for x86 that is an array of 2 uint64_ts (128 bits). This support is unused at this point, but will be needed shortly for cmpxchg16b. Ideally the 128-bit SSE memory accesses will also be rewritten to use this support. Support for 128-bit accesses could also have been added using the gcc __int128_t extension, which would have been less disruptive. However, although clang also supports __int128_t, it's still non-standard. Also, more importantly, this approach creates a path to defining 256- and 512-byte operands as well, which will be useful for eventual AVX support.
2016-01-17	cpu. arch: add initiateMemRead() to ExecContext interface	Steve Reinhardt
	For historical reasons, the ExecContext interface had a single function, readMem(), that did two different things depending on whether the ExecContext supported atomic memory mode (i.e., AtomicSimpleCPU) or timing memory mode (all the other models). In the former case, it actually performed a memory read; in the latter case, it merely initiated a read access, and the read completion did not happen until later when a response packet arrived from the memory system. This led to some confusing things, including timing accesses being required to provide a pointer for the return data even though that pointer was only used in atomic mode. This patch splits this interface, adding a new initiateMemRead() function to the ExecContext interface to replace the timing-mode use of readMem(). For consistency and clarity, the readMemTiming() helper function in the ISA definitions is renamed to initiateMemRead() as well. For x86, where the access size is passed in explicitly, we can also get rid of the data parameter at this level. For other ISAs, where the access size is determined from the type of the data parameter, we have to keep the parameter for that purpose.
2015-10-06	x86: implement fild, fucomi, and fucomip x87 insts	Steve Reinhardt
	fild loads an integer value into the x87 top of stack register. fucomi/fucomip compare two x87 register values (the latter also doing a stack pop). These instructions are used by some versions of GNU libstdc++.
2015-03-23	mem: rename Locked/LOCKED to LockedRMW/LOCKED_RMW	Steve Reinhardt
	Makes x86-style locked operations even more distinct from LLSC operations. Using "locked" by itself should be obviously ambiguous now.
2014-05-09	arch: teach ISA parser how to split code across files	Curtis Dunham
	This patch encompasses several interrelated and interdependent changes to the ISA generation step. The end goal is to reduce the size of the generated compilation units for instruction execution and decoding so that batch compilation can proceed with all CPUs active without exhausting physical memory. The ISA parser (src/arch/isa_parser.py) has been improved so that it can accept 'split [output_type];' directives at the top level of the grammar and 'split(output_type)' python calls within 'exec {{ ... }}' blocks. This has the effect of "splitting" the files into smaller compilation units. I use air-quotes around "splitting" because the files themselves are not split, but preprocessing directives are inserted to have the same effect. Architecturally, the ISA parser has had some changes in how it works. In general, it emits code sooner. It doesn't generate per-CPU files, and instead defers to the C preprocessor to create the duplicate copies for each CPU type. Likewise there are more files emitted and the C preprocessor does more substitution that used to be done by the ISA parser. Finally, the build system (SCons) needs to be able to cope with a dynamic list of source files coming out of the ISA parser. The changes to the SCons{cript,truct} files support this. In broad strokes, the targets requested on the command line are hidden from SCons until all the build dependencies are determined, otherwise it would try, realize it can't reach the goal, and terminate in failure. Since build steps (i.e. running the ISA parser) must be taken to determine the file list, several new build stages have been inserted at the very start of the build. First, the build dependencies from the ISA parser will be emitted to arch/$ISA/generated/inc.d, which is then read by a new SCons builder to finalize the dependencies. (Once inc.d exists, the ISA parser will not need to be run to complete this step.) Once the dependencies are known, the 'Environments' are made by the makeEnv() function. This function used to be called before the build began but now happens during the build. It is easy to see that this step is quite slow; this is a known issue and it's important to realize that it was already slow, but there was no obvious cause to attribute it to since nothing was displayed to the terminal. Since new steps that used to be performed serially are now in a potentially-parallel build phase, the pathname handling in the SCons scripts has been tightened up to deal with chdir() race conditions. In general, pathnames are computed earlier and more likely to be stored, passed around, and processed as absolute paths rather than relative paths. In the end, some of these issues had to be fixed by inserting serializing dependencies in the build. Minor note: For the null ISA, we just provide a dummy inc.d so SCons is never compelled to try to generate it. While it seems slightly wrong to have anything in src/arch/*/generated (i.e. a non-generated 'generated' file), it's by far the simplest solution.
2014-05-09	arch: remove inline specifiers on all inst constrs, all ISAs	Curtis Dunham
	With (upcoming) separate compilation, they are useless. Only link-time optimization could re-inline them, but ideally feedback-directed optimization would choose to do so only for profitable (i.e. common) instructions.
2013-09-30	x86: Add support for loading 32-bit and 80-bit floats in the x87	Andreas Sandberg
	The x87 FPU supports three floating point formats: 32-bit, 64-bit, and 80-bit floats. The current gem5 implementation supports 32-bit and 64-bit floats, but only works correctly for 64-bit floats. This changeset fixes the 32-bit float handling by correctly loading and rounding (using truncation) 32-bit floats instead of simply truncating the bit pattern. 80-bit floats are loaded by first loading the 80-bits of the float to two temporary integer registers. A micro-op (cvtint_fp80) then converts the contents of the two integer registers to the internal FP representation (double). Similarly, when storing an 80-bit float, there are two conversion routines (ctvfp80h_int and cvtfp80l_int) that convert an internal FP register to 80-bit and stores the upper 64-bits or lower 32-bits to an integer register, which is the written to memory using normal integer stores.
2012-03-31	X86: Fix address size handling so real mode works properly.	Gabe Black
	Virtual (pre-segmentation) addresses are truncated based on address size, and any non-64 bit linear address is truncated to 32 bits. This means that real mode addresses aren't truncated down to 16 bits after their segment bases are added in.
2011-09-26	ISA parser: Use '_' instead of '.' to delimit type modifiers on operands.	Gabe Black
	By using an underscore, the "." is still available and can unambiguously be used to refer to members of a structure if an operand is a structure, class, etc. This change mostly just replaces the appropriate "."s with "_"s, but there were also a few places where the ISA descriptions where handling the extensions themselves and had their own regular expressions to update. The regular expressions in the isa parser were updated as well. It also now looks for one of the defined type extensions specifically after connecting "_" where before it would look for any sequence of characters after a "." following an operand name and try to use it as the extension. This helps to disambiguate cases where a "_" may legitimately be part of an operand name but not separate the name from the type suffix. Because leaving the "_" and suffix on the variable name still leaves a valid C++ identifier and all extensions need to be consistent in a given context, I considered leaving them on as a breadcrumb that would show what the intended type was for that operand. Unfortunately the operands can be referred to in code templates, the Mem operand in particular, and since the exact type of Mem can be different for different uses of the same template, that broke things.
2011-07-02	ISA: Use readBytes/writeBytes for all instruction level memory operations.	Gabe Black

2011-07-02	X86: Fix store microops so they don't drop faults in timing mode.	Gabe Black
	If a fault was returned by the CPU when a store initiated it's write, the store instruction would ignore the fault. This change fixes that.
2011-06-21	X86: Eliminate an unused argument for building store microops.	Gabe Black

2011-03-01	X86: Mark IO reads and writes as non-speculative.	Gabe Black

2011-03-01	X86: Mark prefetches as such in their instruction and request flags.	Gabe Black

2011-02-13	X86: Don't read in dest regs if all bits are replaced.	Gabe Black
	In x86, 32 and 64 bit writes to registers in which registers appear to be 32 or 64 bits wide overwrite all bits of the destination register. This change removes false dependencies in these cases where the previous value of a register doesn't need to be read to write a new value. New versions of most microops are created that have a "Big" suffix which simply overwrite their destination, and the right version to use is selected during microop allocation based on the selected data size. This does not change the performance of the O3 CPU model significantly, I assume because there are other false dependencies from the condition code bits in the flags register.
2011-02-02	X86: Get rid of the stupd microop.	Gabe Black

2010-08-23	X86: Get rid of the flagless microop constructor.	Gabe Black
	This will reduce clutter in the source and hopefully speed up compilation.
2010-08-23	X86: Consolidate extra microop flags into one parameter.	Gabe Black
	This single parameter replaces the collection of bools that set up various flavors of microops. A flag parameter also allows other flags to be set like the serialize before/after flags, etc., without having to change the constructor.
2010-05-23	copyright: Change HP copyright on x86 code to be more friendly	Nathan Binkert

2009-11-08	X86: Make x86 use PREFETCH instead of PF_EXCLUSIVE.	Gabe Black

2009-08-23	X86: Preserve the NO_ACCESS flag when giving CDA a specialized interface.	Gabe Black

2009-07-16	X86: Take limitted advantage of the compilers type checking for microop ↵	Gabe Black
	operands.
2009-04-23	X86: Put the StoreCheck flag with the others, and don't collide with other ↵	Gabe Black
	flags.
2009-04-19	X86: Implement the stul microop.	Gabe Black
	This microop does a store and unlocks the requested address. The RISC86 microop ISA doesn't seem to have an equivalent to this, so I'm guessing that the store following an ldstl is automatically unlocking. We don't do it this way for performance reasons since the behavior is the same.
2009-04-19	X86: Implement the ldstl microop.	Gabe Black
	This microop does a load, checks that a store would succeed, and locks the requested address.
2009-04-19	X86: LEA calculates an address before segmentation.	Gabe Black

2009-02-27	X86: Take address size into account when computing an effective address.	Gabe Black

2009-02-27	X86: Fix segment limit checks.	Gabe Black

2009-02-25	X86: Implement a basic prefetch instruction.	Gabe Black

2009-02-25	X86: Use the right portion of a register for stores.	Gabe Black

2009-02-25	X86: Add a flag to force memory accesses to happen at CPL 0.	Gabe Black

2009-02-25	X86: Make the stupd microop not update registers in initiateAcc.	Gabe Black

2009-02-25	CPU: Get rid of translate... functions from various interface classes.	Gabe Black

2009-01-06	X86: Autogenerate macroop generateDisassemble function.	Gabe Black

2008-11-09	X86: Fix completeAcc get call.	Gabe Black

2008-02-26	X86: Implement the INVLPG instruction and the TIA microop.	Gabe Black
	--HG-- extra : convert_revision : 31db1ee082f6c3ca5443cba1eb335e408661ead2
2007-11-12	X86: Various fixes to indexing segmentation related registers	Gabe Black
	--HG-- extra : convert_revision : 3d45da3a3fb38327582cfdfb72cfc4ce1b1d31af
2007-10-22	X86: Implement the cda microop which checks if an address is legal to write to.	Gabe Black
	--HG-- extra : convert_revision : afe20649180dd59ad0702b98f7293be6c9226359
2007-10-21	X86: Implement the stupd microop ("store with update", not "stupid") and use ↵	Gabe Black
	it in ENTER. --HG-- extra : convert_revision : 9151f701162d31ef26298497467c42b7b0ed85d5
2007-10-12	X86: Implement MSR reads and writes and the wrsmr and rdmsr instructions.	Gabe Black
	There are no priviledge checks, so these instructions will all work in all modes. --HG-- extra : convert_revision : ff893eb569313d8aecbfffb47bcbd1c2d65cd393
2007-10-02	X86: Implement the ldst microop and put it in existing microcode where ↵	Gabe Black
	appropriate. --HG-- extra : convert_revision : f08bd725d07a501bb7a0ce91590b5d37db99c6f3
2007-08-29	X86: Add load and store microops that use the fp registers.	Gabe Black
	--HG-- extra : convert_revision : 153a055e888d8c47d59758a599dbd38f63008137
2007-08-26	X86: Remove x86 code that attempted to fix misaligned accesses.	Gabe Black
	--HG-- extra : convert_revision : 42f68010e6498aceb7ed25da278093e99150e4df