gem5 - gem5

Age	Commit message	Author
2015-07-20	syscall_emul: [patch 13/22] add system call retry capability	Brandon Potter
	This changeset adds functionality that allows system calls to retry without affecting thread context state such as the program counter or register values for the associated thread context (when system calls return with a retry fault).
	This functionality is needed to solve problems with blocking system calls in multi-process or multi-threaded simulations where information is passed between processes/threads. Blocking system calls can cause deadlock because the simulator itself is single-threaded. There is only a single thread servicing the event queue, which can cause deadlock if the thread hits a blocking system call instruction.
	To illustrate the problem, consider two processes using the producer/consumer sharing model. The processes can use file descriptors and the read and write calls to pass information to one another. If the consumer calls the blocking read system call before the producer has produced anything, the call will block the event queue (while executing the system call instruction) and deadlock the simulation.
	The solution implemented in this changeset is to recognize that the system calls will block and then generate a special retry fault. The fault will be sent back up through the function call chain until it is exposed to the cpu model's pipeline where the fault becomes visible. The fault will trigger the cpu model to replay the instruction at a future tick where the call has a chance to succeed without actually going into a blocking state.
	In subsequent patches, we recognize that a syscall will block by calling a non-blocking poll (from inside the system call implementation) and checking for events. When events show up during the poll, it signifies that the call would not have blocked and the syscall is allowed to proceed (calling an underlying host system call if necessary). If no events are returned from the poll, we generate the fault and retry the instruction for the thread context at a distant tick. Note that retrying every tick is not efficient.
	As an aside, the simulator has some multi-threading support for the event queue, but it is not used by default and needs work. Even if the event queue were completely multi-threaded, meaning that there is a hardware thread on the host servicing a single simulator thread context with a 1:1 mapping between them, it's still possible to run into deadlock due to the event queue barriers on quantum boundaries. The solution of replaying at a later tick is the simplest solution and solves the problem generally.
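The poll-then-retry idea described in this message can be sketched as follows. This is only an illustration under stated assumptions, not the code from the changeset: `SyscallOutcome`, `ReadResult`, `probeBlockingRead()` and `emulatedRead()` are hypothetical names, and the retry fault is stood in for by an enum value. The only real APIs used are the POSIX `poll()` and `read()` calls.

```cpp
#include <cassert>
#include <cstddef>
#include <poll.h>
#include <unistd.h>

// Hypothetical stand-in for "proceed" versus "raise a retry fault".
enum class SyscallOutcome { Proceed, Retry };

// Hypothetical result type: the outcome plus the byte count when the
// host read() was actually issued.
struct ReadResult {
    SyscallOutcome outcome;
    ssize_t bytes;
};

// Probe the host descriptor with a zero-timeout poll, so the single
// simulator thread never blocks inside the host kernel.
inline SyscallOutcome
probeBlockingRead(int host_fd)
{
    assert(host_fd >= 0);
    pollfd pfd{};
    pfd.fd = host_fd;
    pfd.events = POLLIN;
    const int ready = poll(&pfd, 1, 0);  // timeout 0: return immediately
    // No events: a real read() would block, so ask for a replay later.
    // Events (or a poll error): let the real call run and report it.
    return ready == 0 ? SyscallOutcome::Retry : SyscallOutcome::Proceed;
}

// Emulated read: only issue the host read() when it cannot block.
inline ReadResult
emulatedRead(int host_fd, void *buf, std::size_t len)
{
    if (probeBlockingRead(host_fd) == SyscallOutcome::Retry)
        return {SyscallOutcome::Retry, 0};
    return {SyscallOutcome::Proceed, read(host_fd, buf, len)};
}
```

In this sketch a caller that receives `Retry` would leave the program counter and registers untouched and reschedule the same instruction at a later tick, which is the behaviour the message describes.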
2016-08-15	cpu: Add missing override in Minor's exec context	Andreas Sandberg
	Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-08-15	cpu: Fixed clang errors. Added 'override' keyword for virtual functions.	Reiley Jeapaul
	Change-Id: Ic37311443ca11ee6d95bceffea599e054e7aa110 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-08-15	cpu, arch: fix the type used for the request flags	Nikos Nikoleris
	Change-Id: I183b9942929c873c3272ce6d1abd4ebc472c7132 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-07-21	cpu: Add SMT support to MinorCPU	Mitch Hayenga
	This patch adds SMT support to the MinorCPU. Currently RoundRobin or Random thread scheduling are supported. Change-Id: I91faf39ff881af5918cca05051829fc6261f20e3
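The two scheduling policies named above can be illustrated with a small sketch. It is an assumption-laden illustration, not MinorCPU code: `ThreadID` as an `int`, `roundRobinPriority()` and `randomPriority()` are hypothetical names, and each function simply returns the order in which threads would be considered in one cycle.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

using ThreadID = int;  // assumption: thread IDs are small integers

// Round robin: start with the thread after the one served last and
// wrap around, so the last-served thread gets the lowest priority.
inline std::vector<ThreadID>
roundRobinPriority(int num_threads, ThreadID last_served)
{
    assert(num_threads > 0);
    assert(last_served >= 0 && last_served < num_threads);
    std::vector<ThreadID> order;
    order.reserve(num_threads);
    for (int i = 1; i <= num_threads; ++i)
        order.push_back((last_served + i) % num_threads);
    return order;
}

// Random: a fresh random permutation of all thread IDs.
inline std::vector<ThreadID>
randomPriority(int num_threads, std::mt19937 &rng)
{
    assert(num_threads > 0);
    std::vector<ThreadID> order(num_threads);
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}
```

A pipeline stage could walk the returned list and pick the first thread that is able to make progress; that selection step is left out here.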
2016-01-17	cpu, arch: add initiateMemRead() to ExecContext interface	Steve Reinhardt
	For historical reasons, the ExecContext interface had a single function, readMem(), that did two different things depending on whether the ExecContext supported atomic memory mode (i.e., AtomicSimpleCPU) or timing memory mode (all the other models). In the former case, it actually performed a memory read; in the latter case, it merely initiated a read access, and the read completion did not happen until later when a response packet arrived from the memory system. This led to some confusing things, including timing accesses being required to provide a pointer for the return data even though that pointer was only used in atomic mode. This patch splits this interface, adding a new initiateMemRead() function to the ExecContext interface to replace the timing-mode use of readMem(). For consistency and clarity, the readMemTiming() helper function in the ISA definitions is renamed to initiateMemRead() as well. For x86, where the access size is passed in explicitly, we can also get rid of the data parameter at this level. For other ISAs, where the access size is determined from the type of the data parameter, we have to keep the parameter for that purpose.
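The split between an atomic-mode read and a timing-mode "start the read" call might look roughly like the sketch below. The signatures are simplified assumptions (gem5 returns a fault object and takes request flags, both omitted here), and `MemExecContextSketch`, `AtomicSketch` and `TimingSketch` are hypothetical names. The point it shows is that the timing-mode call no longer needs a data pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

using Addr = std::uint64_t;

// Hypothetical, simplified interface. Returning false stands in for
// "this model does not support that call".
struct MemExecContextSketch {
    virtual ~MemExecContextSketch() = default;
    // Atomic mode: perform the read now and copy the bytes into data.
    virtual bool readMem(Addr, std::uint8_t *, unsigned) { return false; }
    // Timing mode: only start the access; the data arrives later.
    virtual bool initiateMemRead(Addr, unsigned) { return false; }
};

struct AtomicSketch : MemExecContextSketch {
    std::vector<std::uint8_t> mem = std::vector<std::uint8_t>(64, 0);

    bool readMem(Addr addr, std::uint8_t *data, unsigned size) override
    {
        assert(data != nullptr);
        if (addr > mem.size() || size > mem.size() - addr)
            return false;  // out of range in this toy memory
        std::memcpy(data, mem.data() + addr, size);
        return true;
    }
};

struct TimingSketch : MemExecContextSketch {
    struct Pending { Addr addr; unsigned size; };
    std::vector<Pending> pending;  // accesses awaiting a response

    bool initiateMemRead(Addr addr, unsigned size) override
    {
        pending.push_back({addr, size});
        return true;
    }
};
```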
2015-09-30	cpu: Add per-thread monitors	Mitch Hayenga
	Adds per-thread address monitors to support FullSystem SMT.
2015-08-07	base: Declare a type for context IDs	Andreas Sandberg
	Context IDs used to be declared ad hoc (usually as int). This changeset introduces a typedef for ContextIDs and a constant for invalid context IDs.
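A typedef plus a sentinel constant of the kind described could be sketched as below. The names `ContextID` and `InvalidContextID` follow the wording of the message, but the exact definitions in gem5 may differ, and `findIdleContext()` is a hypothetical helper added only to show the sentinel in use.

```cpp
#include <cstddef>
#include <vector>

// Assumed definitions: an integer typedef and an "invalid" sentinel.
typedef int ContextID;
const ContextID InvalidContextID = static_cast<ContextID>(-1);

// Hypothetical helper: return the ID of the first idle context, or the
// sentinel when every context is busy.
inline ContextID
findIdleContext(const std::vector<bool> &busy)
{
    for (std::size_t i = 0; i < busy.size(); ++i) {
        if (!busy[i])
            return static_cast<ContextID>(i);
    }
    return InvalidContextID;
}
```

Callers compare against `InvalidContextID` rather than a bare `-1`, which is the readability gain a named type and constant offer over an ad hoc `int`.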
2015-07-28	revert 5af8f40d8f2c	Nilay Vaish

2015-07-26	cpu: implements vector registers	Nilay Vaish
	This adds a vector register type. The type is defined as a std::array of a fixed number of uint64_ts. The isa_parser.py has been modified to parse vector register operands and generate the required code. Different cpus have vector register files now.
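A vector register built on `std::array` of `uint64_t`, with a per-CPU register file around it, might be sketched like this. The width `NumVectorRegWords` and the names `VectorRegSketch` and `VectorRegFileSketch` are assumptions for illustration; the message does not state the actual element count or class names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed width: four 64-bit words per vector register.
constexpr std::size_t NumVectorRegWords = 4;
using VectorRegSketch = std::array<std::uint64_t, NumVectorRegWords>;

// Hypothetical register file holding a fixed number of vector registers.
class VectorRegFileSketch
{
  public:
    explicit VectorRegFileSketch(std::size_t num_regs)
        : regs(num_regs, VectorRegSketch{})  // all words start at zero
    {}

    const VectorRegSketch &
    read(std::size_t idx) const
    {
        assert(idx < regs.size());
        return regs[idx];
    }

    void
    write(std::size_t idx, const VectorRegSketch &val)
    {
        assert(idx < regs.size());
        regs[idx] = val;
    }

  private:
    std::vector<VectorRegSketch> regs;
};
```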
2015-02-16	arch: Make readMiscRegNoEffect const throughout	Andreas Hansson
	Finally took the plunge and made this apply to all ISAs, not just ARM.
2014-11-06	x86 isa: This patch attempts an implementation at mwait.	Marc Orr
	Mwait works as follows:
	1. A cpu monitors an address of interest (monitor instruction)
	2. A cpu calls mwait - this loads the cache line into that cpu's cache.
	3. The cpu goes to sleep.
	4. When another processor requests write permission for the line, it is evicted from the sleeping cpu's cache. This eviction is forwarded to the sleeping cpu, which then wakes up.
	Committed by: Nilay Vaish <nilay@cs.wisc.edu>
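The monitor/mwait sequence described in this message can be modelled as a tiny state machine. This is a sketch under assumptions, not the x86 implementation from the patch: the 64-byte line size, `MonitorSketch` and its method names are hypothetical, and the cache eviction is reduced to a single `observeWriteRequest()` call.

```cpp
#include <cstdint>

using Addr = std::uint64_t;

constexpr Addr CacheLineBytes = 64;  // assumed cache line size
static_assert((CacheLineBytes & (CacheLineBytes - 1)) == 0,
              "line size must be a power of two");

// Align an address down to the start of its cache line.
inline Addr lineOf(Addr a) { return a & ~(CacheLineBytes - 1); }

// Hypothetical per-cpu (or per-thread) monitor state.
class MonitorSketch
{
  public:
    // Step 1: arm the monitor on the line containing addr.
    void monitor(Addr addr) { line = lineOf(addr); armed = true; }

    // Steps 2-3: go to sleep, but only if a monitor is armed.
    bool mwait()
    {
        if (!armed)
            return false;
        asleep = true;
        return true;
    }

    // Step 4: another processor asks for write permission. Returns
    // true when this request wakes the sleeping cpu.
    bool observeWriteRequest(Addr addr)
    {
        if (!armed || lineOf(addr) != line)
            return false;
        armed = false;
        const bool woke = asleep;
        asleep = false;
        return woke;
    }

    bool sleeping() const { return asleep; }

  private:
    bool armed = false;
    bool asleep = false;
    Addr line = 0;
};
```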
2014-09-03	arch, cpu: Factor out the ExecContext into a proper base class	Andreas Sandberg
	We currently generate and compile one version of the ISA code per CPU model. This is obviously wasting a lot of resources at compile time. This changeset factors out the interface into a separate ExecContext class, which also serves as documentation for the interface between CPUs and the ISA code. While doing so, this changeset also fixes up interface inconsistencies between the different CPU models. The main argument for using one set of ISA code per CPU model has always been performance, as this avoids indirect branches in the generated code. However, this argument does not hold water. Booting Linux on a simulated ARM system running in atomic mode (opt/10.linux-boot/realview-simple-atomic) is actually 2% faster (compiled using clang 3.4) after applying this patch. Additionally, compilation time is decreased by 35%.
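The design change, ISA code compiled once against an abstract interface instead of once per CPU model, can be sketched as follows. All names here (`IsaExecContextSketch`, `executeAdd()`, `SimpleModelSketch`, `CountingModelSketch`) are hypothetical and the interface is cut down to two register accessors; the real ExecContext is much larger.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical, minimal version of the CPU-to-ISA interface.
class IsaExecContextSketch
{
  public:
    virtual ~IsaExecContextSketch() = default;
    virtual std::uint64_t readIntReg(int idx) = 0;
    virtual void setIntReg(int idx, std::uint64_t val) = 0;
};

// "ISA code": written and compiled once, used by every CPU model
// through virtual calls rather than per-model template instantiation.
inline void
executeAdd(IsaExecContextSketch &xc, int dst, int src1, int src2)
{
    xc.setIntReg(dst, xc.readIntReg(src1) + xc.readIntReg(src2));
}

// One CPU model: a plain register array.
class SimpleModelSketch : public IsaExecContextSketch
{
  public:
    std::uint64_t readIntReg(int idx) override
    { return regs.at(static_cast<std::size_t>(idx)); }
    void setIntReg(int idx, std::uint64_t val) override
    { regs.at(static_cast<std::size_t>(idx)) = val; }

  private:
    std::array<std::uint64_t, 8> regs{};
};

// Another CPU model: same registers, but it also counts writes.
class CountingModelSketch : public IsaExecContextSketch
{
  public:
    unsigned writes = 0;

    std::uint64_t readIntReg(int idx) override
    { return regs.at(static_cast<std::size_t>(idx)); }
    void setIntReg(int idx, std::uint64_t val) override
    {
        ++writes;
        regs.at(static_cast<std::size_t>(idx)) = val;
    }

  private:
    std::array<std::uint64_t, 8> regs{};
};
```

The trade-off is one indirect call per accessor in exchange for a single compiled copy of the instruction code, which matches the compile-time saving the message reports.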
2014-07-23	cpu: `Minor' in-order CPU model	Andrew Bardsley
	This patch contains a new CPU model named `Minor'. Minor models a four stage in-order execution pipeline (fetch lines, decompose into macroops, decompose macroops into microops, execute). The model was developed to support the ARM ISA but should be fixable to support all the remaining gem5 ISAs. It currently also works for Alpha, and regressions are included for ARM and Alpha (including Linux boot). Documentation for the model can be found in src/doc/inside-minor.doxygen and its internal operations can be visualised using the Minorview tool utils/minorview.py. Minor was designed to be fairly simple and not to engage in a lot of instruction annotation. As such, it currently has very few gathered stats and may lack other gem5 features. Minor is faster than the o3 model. Sample results:
	Benchmark      |   Stat host_seconds (s)
	---------------+--------v--------v--------
	 (on ARM, opt) | simple | o3     | minor
	               | timing | timing | timing
	---------------+--------+--------+--------
	10.linux-boot  |   169  |  1883  |  1075
	10.mcf         |   117  |   967  |   491
	20.parser      |   668  |  6315  |  3146
	30.eon         |   542  |  3413  |  2414
	40.perlbmk     |  2339  | 20905  | 11532
	50.vortex      |   122  |  1094  |   588
	60.bzip2       |  2045  | 18061  |  9662
	70.twolf       |   207  |  2736  |  1036