gem5 - gem5

Age	Commit message (Collapse)	Author
2016-02-06	style: remove trailing whitespace	Steve Reinhardt
	Result of running 'hg m5style --skip-all --fix-white -a'.
2016-01-17	cpu. arch: add initiateMemRead() to ExecContext interface	Steve Reinhardt
	For historical reasons, the ExecContext interface had a single function, readMem(), that did two different things depending on whether the ExecContext supported atomic memory mode (i.e., AtomicSimpleCPU) or timing memory mode (all the other models). In the former case, it actually performed a memory read; in the latter case, it merely initiated a read access, and the read completion did not happen until later when a response packet arrived from the memory system. This led to some confusing things, including timing accesses being required to provide a pointer for the return data even though that pointer was only used in atomic mode. This patch splits this interface, adding a new initiateMemRead() function to the ExecContext interface to replace the timing-mode use of readMem(). For consistency and clarity, the readMemTiming() helper function in the ISA definitions is renamed to initiateMemRead() as well. For x86, where the access size is passed in explicitly, we can also get rid of the data parameter at this level. For other ISAs, where the access size is determined from the type of the data parameter, we have to keep the parameter for that purpose.
2016-01-17	arch: don't call Timing functions from Atomic versions	Steve Reinhardt
	The readMemAtomic/writeMemAtomic helper functions were calling readMemTiming/writeMemTiming respectively. This is functionally correct, since the Timing functions are doing the same access initiation operation as the Atomic functions (just that the *Atomic versions also complete the access in line). It also provides for some (very minimal) code reuse. Unfortunately, it's potentially pretty confusing, since it makes it look like the atomic accesses are somehow being converted to timing accesses. It also gets in the way of specializing the timing interface (as will be done in a future patch).
2016-01-17	arch: get rid of unused LargestRead typedef	Steve Reinhardt

2016-01-07	pseudo inst,util: Add optional key to initparam pseudo instruction	Gabor Dozsa
	The key parameter can be used to read out various config parameters from within the simulated software.
2015-12-18	arm: remote GDB: rationalize structure of register offsets	Boris Shingarov
	Currently, the wire format of register values in g- and G-packets is modelled using a union of uint8/16/32/64 arrays. The offset positions of each register are expressed as a "register count" scaled according to the width of the register in question. This results in counter- intuitive and error-prone "register count arithmetic", and some formats would even be altogether unrepresentable in such model, e.g. a 64-bit register following a 32-bit one would have a fractional index in the regs64 array. Another difficulty is that the array is allocated before the actual architecture of the workload is known (and therefore before the correct size for the array can be calculated). With this patch I propose a simpler mechanism for expressing the register set structure. In the new code, GdbRegCache is an abstract class; its subclasses contain straightforward structs reflecting the register representation. The determination whether to use e.g. the AArch32 vs. AArch64 register set (or SPARCv8 vs SPARCv9, etc.) is made by polymorphically dispatching getregs() to the concrete subclass. The subclass is not instantiated until it is needed for actual g-/G-packet processing, when the mode is already known. This patch is not meant to be merged in on its own, because it changes the contract between src/base/remote_gdb.* and src/arch//remote_gdb., so as it stands right now, it would break the other architectures. In this patch only the base and the ARM code are provided for review; once we agree on the structure, I will provide src/arch//remote_gdb. for the other architectures; those patches could then be merged in together. Review Request: http://reviews.gem5.org/r/3207/ Pushed by Joel Hestness <jthestness@gmail.com>
2015-11-16	x86: Invalidating TLB entry on page fault	Swapnil Haria
	As per the x86 architecture specification, matching TLB entries need to be invalidated on a page fault. For instance, after a page fault due to inadequate protection bits on a TLB hit, the TLB entry needs to be invalidated. This behavior is clearly specified in the x86 architecture manuals from both AMD and Intel. This invalidation is missing currently in gem5, due to which linux kernel versions 3.8 and up cannot be simulated efficiently. This is exposed by a linux optimisation in commit e4a1cc56e4d728eb87072c71c07581524e5160b1, which removes a tlb flush on updating page table entries in x86. Testing: Linux kernel versions 3.8 onwards were booting very slowly in FS mode, due to repeated page faults (~300000 before the first print statement in a bash file). Ensured that page fault rate drops drastically and observed reduction in boot time from order of hours to minutes for linux kernel v3.8 and v3.11
2015-11-16	x86: cpuid: add family to warn() message	Bjoern A. Zeeb
	doCpuid() has to identical warn messages about unimplemented functions. Add the family to the log message to make them distinguishable. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-11-16	x86: pagetable walker: fix typo in comment	Bjoern A. Zeeb

2015-10-23	x86: Add missing explicit overrides for X86 devices	Andreas Hansson
	Make clang >= 3.5 happy when compiling build/X86/gem5.opt on OSX.
2015-10-12	misc: Remove redundant compiler-specific defines	Andreas Hansson
	This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap (and similar) abstractions, as these are no longer needed with gcc 4.7 and clang 3.1 as minimum compiler versions.
2015-10-09	isa: Add parameter to pick different decoder inside ISA	Rekai Gonzalez Alberquilla
	The decoder is responsible for splitting instructions in micro operations (uops). Given that different micro architectures may split operations differently, this patch allows to specify which micro architecture each isa implements, so different cores in the system can split instructions differently, also decoupling uop splitting (microArch) from ISA (Arch). This is done making the decodification calls templates that receive a type 'DecoderFlavour' that maps the name of the operation to the class that implements it. This way there is only one selection point (converting the command line enum to the appropriate DecodeFeatures object). In addition, there is no explicit code replication: template instantiation hides that, and the compiler should be able to resolve a number of things at compile-time.
2015-10-06	x86: implement rcpps and rcpss SSE insts	Steve Reinhardt
	These are packed single-precision approximate reciprocal operations, vector and scalar versions, respectively. This code was basically developed by copying the code for sqrtps and sqrtss. The mrcp micro-op was simplified relative to msqrt since there are no double-precision versions of this operation.
2015-10-06	x86: implement fild, fucomi, and fucomip x87 insts	Steve Reinhardt
	fild loads an integer value into the x87 top of stack register. fucomi/fucomip compare two x87 register values (the latter also doing a stack pop). These instructions are used by some versions of GNU libstdc++.
2015-09-30	cpu,isa,mem: Add per-thread wakeup logic	Mitch Hayenga
	Changes wakeup functionality so that only specific threads on SMT capable cpus are woken.
2015-09-30	isa,cpu: Add support for FS SMT Interrupts	Mitch Hayenga
	Adds per-thread interrupt controllers and thread/context logic so that interrupts properly get routed in SMT systems.
2015-07-20	x86: x86 instruction-implementation bug fixes	David Hashe
	Added explicit data sizes and an opcode type for correct execution.
2015-07-20	syscall: Add readlink to x86 with special case /proc/self/exe	David Hashe
	This patch implements the correct behavior.
2015-07-28	revert 5af8f40d8f2c	Nilay Vaish

2015-07-26	cpu: implements vector registers	Nilay Vaish
	This adds a vector register type. The type is defined as a std::array of a fixed number of uint64_ts. The isa_parser.py has been modified to parse vector register operands and generate the required code. Different cpus have vector register files now.
2015-07-17	x86: decode instructions with vex prefix	Nilay Vaish
	This patch updates the x86 decoder so that it can decode instructions with vex prefix. It also updates the isa with opcodes from vex opcode maps 1, 2 and 3. Note that none of the instructions have been implemented yet. The implementations would be provided in due course of time.
2015-07-07	sim: Refactor the serialization base class	Andreas Sandberg
	Objects that are can be serialized are supposed to inherit from the Serializable class. This class is meant to provide a unified API for such objects. However, so far it has mainly been used by SimObjects due to some fundamental design limitations. This changeset redesigns to the serialization interface to make it more generic and hide the underlying checkpoint storage. Specifically: * Add a set of APIs to serialize into a subsection of the current object. Previously, objects that needed this functionality would use ad-hoc solutions using nameOut() and section name generation. In the new world, an object that implements the interface has the methods serializeSection() and unserializeSection() that serialize into a named /subsection/ of the current object. Calling serialize() serializes an object into the current section. * Move the name() method from Serializable to SimObject as it is no longer needed for serialization. The fully qualified section name is generated by the main serialization code on the fly as objects serialize sub-objects. * Add a scoped ScopedCheckpointSection helper class. Some objects need to serialize data structures, that are not deriving from Serializable, into subsections. Previously, this was done using nameOut() and manual section name generation. To simplify this, this changeset introduces a ScopedCheckpointSection() helper class. When this class is instantiated, it adds a new /subsection/ and subsequent serialization calls during the lifetime of this helper class happen inside this section (or a subsection in case of nested sections). * The serialize() call is now const which prevents accidental state manipulation during serialization. Objects that rely on modifying state can use the serializeOld() call instead. The default implementation simply calls serialize(). Note: The old-style calls need to be explicitly called using the serializeOld()/serializeSectionOld() style APIs. These are used by default when serializing SimObjects. * Both the input and output checkpoints now use their own named types. This hides underlying checkpoint implementation from objects that need checkpointing and makes it easier to change the underlying checkpoint storage code.
2015-07-04	x86: Adjust the size of the values written to the x87 misc registers	Nikos Nikoleris
	All x87 misc registers are implemented in an array of 64 bit values but in real hardware the size of some of these registers is smaller. Previsouly all 64 bits where incorrectly set and then later read. To ensure correctness we mask the value in setMiscRegNoEffect to write only the valid bits. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-05-15	misc: Appease gcc 5.1	Andreas Hansson
	Three minor issues are resolved: 1. Apparently gcc 5.1 does not like negation of booleans followed by bitwise AND. 2. Somehow the compiler also gets confused and warns about NoopMachInst being unused (removing it causes compilation errors though). Most likely a compiler bug. 3. There seems to be a number of instances where loop unrolling causes false positives for the array-bounds check. For now, switch to std::array. Potentially we could disable the warning for newer gcc versions, but switching to std::array is probably a good move in any case.
2015-05-05	syscall_emul: fix warn_once behavior	Steve Reinhardt
	The current ignoreWarnOnceFunc doesn't really work as expected, since it will only generate one warning total, for whichever "warn-once" syscall is invoked first. This patch fixes that behavior by keeping a "warned" flag in the SyscallDesc object, allowing suitably flagged syscalls to warn exactly once per syscall.
2015-05-05	mem, cpu: Add a separate flag for strictly ordered memory	Andreas Sandberg
	The Request::UNCACHEABLE flag currently has two different functions. The first, and obvious, function is to prevent the memory system from caching data in the request. The second function is to prevent reordering and speculation in CPU models. This changeset gives the order/speculation requirement a separate flag (Request::STRICT_ORDER). This flag prevents CPU models from doing the following optimizations: * Speculation: CPU models are not allowed to issue speculative loads. * Write combining: CPU models and caches are not allowed to merge writes to the same cache line. Note: The memory system may still reorder accesses unless the UNCACHEABLE flag is set. It is therefore expected that the STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent this behavior.
2015-05-05	arch, cpu: Do not forward snoops to table walker	Andreas Hansson
	This patch simplifies the overall CPU by changing the TLB caches such that they do not forward snoops to the table walker port(s). Note that only ARM and X86 are affected. There is no reason for the ports to snoop as they do not actually take any action, and from a performance point of view we are better of not snooping more than we have to. Should it at a later point be required to snoop for a particular TLB design it is easy enough to add it back.
2015-04-29	x86: change divide-by-zero fault to divide-error	Nilay Vaish
	Same exception is raised whether division with zero is performed or the quotient is greater than the maximum value that the provided space can hold. Divide-by-Zero is the AMD terminology, while Divide-Error is Intel's.
2015-04-24	misc: Appease gcc 5.1 without moving GDB_REG_BYTES	Andreas Hansson
	This patch rolls back the move of the GDB_REG_BYTES constant, and instead adds M5_VAR_USED.
2015-04-23	misc: Appease gcc 5.1	Andreas Hansson
	This patch fixes a few small issues to ensure gem5 compiles when using gcc 5.1. First, the GDB_REG_BYTES in the RemoteGDB header are, rather surprisingly, flagged as unused for both ARM and X86. Removing them, however, causes compilation errors as they are actually used in the source file. Moving the constant into the class definition fixes the issue. Possibly a gcc bug. Second, we have an unused EthPktData constructor using auto_ptr, and the latter is deprecated. Since the code is never used it is simply removed.
2015-04-22	syscall_emul: implement clock_gettime system call	Brandon Potter

2015-04-22	syscall_emul: update x86 syscall table	Monir Mozumder
	Update table with additional definitions through Linux 3.13.
2015-04-13	x86: implements x87 mult/div instructions	Nilay Vaish

2015-04-03	x86: fix debug trace output for mwait	Lena Olson
	When running with the Exec flag, the mwait instruction attempted to print out its source registers, which were never actually initialized. This led to sporadic assertion failures when the value stored there was invalid. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-03-23	mem: rename Locked/LOCKED to LockedRMW/LOCKED_RMW	Steve Reinhardt
	Makes x86-style locked operations even more distinct from LLSC operations. Using "locked" by itself should be obviously ambiguous now.
2015-03-02	mem: Split port retry for all different packet classes	Andreas Hansson
	This patch fixes a long-standing isue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios. The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting. The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fixes the previously seen deadlocks.
2015-02-16	arch: Make readMiscRegNoEffect const throughout	Andreas Hansson
	Finally took the plunge and made this apply to all ISAs, not just ARM.
2015-02-11	mem: Clarification of packet crossbar timings	Marco Balboni
	This patch clarifies the packet timings annotated when going through a crossbar. The old 'firstWordDelay' is replaced by 'headerDelay' that represents the delay associated to the delivery of the header of the packet. The old 'lastWordDelay' is replaced by 'payloadDelay' that represents the delay needed to processing the payload of the packet. For now the uses and values remain identical. However, going forward the payloadDelay will be additive, and not include the headerDelay. Follow-on patches will make the headerDelay capture the pipeline latency incurred in the crossbar, whereas the payloadDelay will capture the additional serialisation delay.
2015-02-11	sim: Move the BaseTLB to src/arch/generic/	Andreas Sandberg
	The TLB-related code is generally architecture dependent and should live in the arch directory to signify that. --HG-- rename : src/sim/BaseTLB.py => src/arch/generic/BaseTLB.py rename : src/sim/tlb.cc => src/arch/generic/tlb.cc rename : src/sim/tlb.hh => src/arch/generic/tlb.hh
2015-01-25	cpu: Put all CPU instruction tracers in a single file	Ali Saidi

2015-01-22	mem: Remove unused Packet src and dest fields	Andreas Hansson
	This patch takes the final step in removing the src and dest fields in the packet. These fields were rather confusing in that they only remember a single multiplexing component, and pushed the responsibility to the bridge and caches to store the fields in a senderstate, thus effectively creating a stack. With the recent changes to the crossbar response routing the crossbar is now responsible without relying on the packet fields. Thus, these variables are now unused and can be removed.
2015-01-22	x86: Delay X86 table walk on receiving walker response	Andreas Hansson
	This patch fixes a minor issue in the X86 page table walker where it ended up sending new request packets to the crossbar before the response processing was finished (recvTimingResp is directly calling sendTimingReq). Under certain conditions this caused the crossbar to see illegal combinations of request/response overlap, in turn causing problems with a slightly modified crossbar implementation.
2015-01-10	x86 : fxsave and fxrestore missing template code	Emilio Castillo
	This patch corrects the FXSAVE and FXRSTOR Macroops. The actual code used for saving/restore the FP registers is in the file but it was not used. The FXSAVE and FXRSTOR instructions are used in the kernel for saving and loading the state of the mmx,xmm and fpu registers. This operation is triggered in FS by issuing a Device Not Available Fault. The cr0 register has a TS flag that is set upon each context change. Every time a task access any FP related register (SIMD as well) if the TS flag is set to one, the device not available fault is issued. The kernel saves the current state of the registers, and restore the previous state of the currently running task. Right now Gem5 lacks of this capability. the Device Not Available Fault is never issued, leading to several problems when different threads share the same CPU and SMT is not used. The PARSEC Ferret benchmark is an example of this behavior. In order to test this a hack in the atomic cpu code was done to detect if a static instruction has any FP operands and the cr0 reg TS bit is set. This check must be done in the ISA dependent code. But it seems to be tricky to access the cr0 register while executing an instruction. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-01-06	x86: Enable three bits in the FamilyModelStepping ECX CPUID bitfield.	Gabe Black
	These are for the monitor/mwait instructions, SSSE3, and XSAVE.
2015-01-06	cpuid, x86: Revert "Enabling more features in CPUid"	Gabe Black
	That change enables CPUID bits for features that aren't implemented in gem5. If a simulated system tries to use those features because it was told it could, bad things can happen.
2015-01-03	x86: implements the simd128 ADDSUBPD instruction	Maxime Martinasso
	This patch implements the simd128 ADDSUBPD instruction for the x86 architecture. Tested with a simple program in assembly language which executes the instruction. Checked that different versions of the instruction are executed by using the execution tracing option. Committed by: Nilay Vaish <nilay@cs.wisc.edu
2014-12-05	misc: Generalize GDB single stepping.	Gabe Black
	The new single stepping implementation for x86 doesn't rely on any ISA specific properties or functionality. This change pulls out the per ISA implementation of those functions and promotes the X86 implementation to the base class. One drawback of that implementation is that the CPU might stop on an instruction twice if it's affected by both breakpoints and single stepping. While that might be a little surprising, it's harmless and would only happen under somewhat unlikely circumstances.
2014-12-05	x86: Implement a remote GDB stub.	Gabe Black
	This stub should allow remote debugging of 32 bit and 64 bit targets. Single stepping seems to work, as do breakpoints. If both breakpoints and single stepping affect an instruction, gdb will stop at the instruction twice before continuing. That's a little surprising, but is generally harmless.
2014-12-04	x86: Rework opcode parsing to support 3 byte opcodes properly.	Gabe Black
	Instead of counting the number of opcode bytes in an instruction and recording each byte before the actual opcode, we can represent the path we took to get to the actual opcode byte by using a type code. That has a couple of advantages. First, we can disambiguate the properties of opcodes of the same length which have different properties. Second, it reduces the amount of data stored in an ExtMachInst, making them slightly easier/faster to create and process. This also adds some flexibility as far as how different types of opcodes are handled, which might come in handy if we decide to support VEX or XOP instructions. This change also adds tables to support properly decoding 3 byte opcodes. Before we would fall off the end of some arrays, on top of the ambiguity described above. This change doesn't measureably affect performance on the twolf benchmark. --HG-- rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f38_opcodes.isa rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f3a_opcodes.isa
2014-12-02	x86: Clean up style in process.cc.	Gabe Black