gem5 - gem5

Age	Commit message (Collapse)	Author
2016-11-17	alpha: Remove ALPHA tru64 support and associated tests	Andreas Hansson
	No one appears to be using it, and it is causing build issues and increases the development and maintenance effort.
2016-10-26	hsail,gpu-compute: fixes to appease clang++	Tony Gutierrez
	fixes to appease clang++. tested on: Ubuntu clang version 3.5.0-4ubuntu2~trusty2 (tags/RELEASE_350/final) (based on LLVM 3.5.0) Ubuntu clang version 3.6.0-2ubuntu1~trusty1 (tags/RELEASE_360/final) (based on LLVM 3.6.0) the fixes address the following five issues: 1) the exec continuations in gpu_static_inst.hh were marked as protected when they should be public. here we mark them as public 2) the Abs instruction uses std::abs() in its execute method. because Abs is templated, it can also operate on U32 and U64, types, which cause Abs::execute() to pass uint32_t and uint64_t types to std::abs() respectively. this triggers a warning because std::abs() has no effect in this case. to rememdy this we add template specialization for the execute() method of Abs when its template paramter is U32 or U64. 3) Some potocols that utilize the code in cprintf.hh were missing includes to BoolVec.hh, which defines operator<< for the BoolVec type. This would cause issues when the generated code would try to pass a BoolVec type to a method in cprintf.hh that used operator<< on an instance of a BoolVec. 4) Surprise, clang doesn't like it when you clobber all the bits in a newly allocated object. I.e., this code: tlb = new GpuTlbEntry\[size\]; std::memset(tlb, 0, sizeof(GpuTlbEntry) \* size); Let's use std::vector to track the TLB entries in the GpuTlb now... 5) There were a few variables used only in DPRINTFs, so we mark them with M5_VAR_USED.
2016-10-26	dev: Add m5 op to toggle synchronization for dist-gem5.	Michael LeBeane
	This patch adds the ability for an application to request dist-gem5 to begin/ end synchronization using an m5 op. When toggling on sync, all nodes agree on the next sync point based on the maximum of all nodes' ticks. CPUs are suspended until the sync point to avoid sending network messages until sync has been enabled. Toggling off sync acts like a global execution barrier, where all CPUs are disabled until every node reaches the toggle off point. This avoids tricky situations such as one node hitting a toggle off followed by a toggle on before the other nodes hit the first toggle off.
2016-10-26	ruby: Allow multiple outstanding DMA requests	Michael LeBeane
	DMA sequencers and protocols can currently only issue one DMA access at a time. This patch implements the necessary functionality to support multiple outstanding DMA requests in Ruby.
2016-10-26	dev: Add 'simLength' parameter in EthPacketData	mlebeane
	Currently, all the network devices create a 16K buffer for the 'data' field in EthPacketData, and use 'length' to keep track of the size of the packet in the buffer. This patch introduces the 'simLength' parameter to EthPacketData, which is used to hold the effective length of the packet used for all timing calulations in the simulator. Serialization is performed using only the useful data in the packet ('length') and not necessarily the entire original buffer.
2016-10-26	gpu-compute: support in-order data delivery in GM pipe	Tony Gutierrez
	this patch adds an ordered response buffer to the GM pipeline to ensure in-order data delivery. the buffer is implemented as a stl ordered map, which sorts the request in program order by using their sequence ID. when requests return to the GM pipeline they are marked as done. only the oldest request may be serviced from the ordered buffer, and only if is marked as done. the FIFO response buffers are kept and used in OoO delivery mode
2016-10-26	gpu-compute, hsail: pass GPUDynInstPtr to getRegisterIndex()	Tony Gutierrez
	for HSAIL an operand's indices into the register files may be calculated trivially, because the operands are always read from a register file, or are an immediate. for machine ISA, however, an op selector may specify special registers, or may specify special SGPRs with an alias op selector value. the location of some of the special registers values are dependent on the size of the RF in some cases. here we add a way for the underlying getRegisterIndex() method to know about the size of the RFs, so that it may find the relative positions of the special register values.
2016-10-26	gpu-compute: use System cache line size in the GPU	Tony Gutierrez

2016-10-26	gpu-compute, hsail: make the PC a byte address, not an instruction index	Tony Gutierrez
	currently the PC is incremented on an instruction granularity, and not as an instruction's byte address. machine ISA instructions assume the PC is a byte address, and is incremented accordingly. here we make the GPU model, and the HSAIL instructions treat the PC as a byte address as well.
2016-10-26	gpu-compute: add gpu_isa.hh to switch hdrs, add GPUISA to WF	Tony Gutierrez
	the GPUISA class is meant to encapsulate any ISA-specific behavior - special register accesses, isa-specific WF/kernel state, etc. - in a generic enough way so that it may be used in ISA-agnostic code. gpu-compute: use the GPUISA object to advance the PC the GPU model treats the PC as a pointer to individual instruction objects - which are store in a contiguous array - and not a byte address to be fetched from the real memory system. this is ok for HSAIL because all instructions are considered by the model to be the same size. in machine ISA, however, instructions may be 32b or 64b, and branches are calculated by advancing the PC by the number of words (4 byte chunks) it needs to advance in the real instruction stream. because of this there is a mismatch between the PC we use to index into the instruction array, and the actual byte address PC the ISA expects. here we move the PC advance calculation to the ISA so that differences in the instrucion sizes may be accounted for in generic way.
2016-10-26	gpu-compute: add instruction mix stats for the gpu	Tony Gutierrez

2016-10-26	gpu-compute, hsail: call discardFetch() from the WF	Tony Gutierrez
	because every taken branch causes fetch to be discarded, we move the call to the WF to avoid to have to call it from each and every branch instruction type.
2016-10-26	hsail, gpu-compute: remove doGm/SmReturn add completeAcc	Tony Gutierrez
	we are removing doGmReturn from the GM pipe, and adding completeAcc() implementations for the HSAIL mem ops. the behavior in doGmReturn is dependent on HSAIL and HSAIL mem ops, however the completion phase of memory ops in machine ISA can be very different, even amongst individual machine ISA mem ops. so we remove this functionality from the pipeline and allow it to be implemented by the individual instructions.
2016-10-26	gpu-compute: remove inst enums and use bit flag for attributes	Tony Gutierrez
	this patch removes the GPUStaticInst enums that were defined in GPU.py. instead, a simple set of attribute flags that can be set in the base instruction class are used. this will help unify the attributes of HSAIL and machine ISA instructions within the model itself. because the static instrution now carries the attributes, a GPUDynInst must carry a pointer to a valid GPUStaticInst so a new static kernel launch instruction is added, which carries the attributes needed to perform a the kernel launch.
2016-10-26	gpu-compute: move disassemle() implementation to GPUStaticInst	Tony Gutierrez

2016-10-26	gpu-compute, arch: add some methods to the base inst classes for ISA support	Tony Gutierrez

2016-10-26	ruby: make a RequestDesc class instead of std::pair	Tony Gutierrez
	the RequestDesc was previously implemented as a std::pair, which made the implementation overly complex and error prone. here we encapsulate the packet, primary, and secondary types all in a single data structure with all members properly intialized in a ctor
2016-10-26	config: Break out base options for usage with NULL ISA	Andreas Hansson
	This patch breaks out the most basic configuration options into a set of base options, to allow them to be used also by scripts that do not involve any ISA, and thus no actual CPUs or devices. The patch also fixes a few modules so that they can be imported in a NULL build, and avoid dragging in FSConfig every time Options is imported.
2016-10-19	stats: Update stats to reflect recent changes to floats	Andreas Hansson
	Mostly just splitting out the floats ops and corresponding reads/writes.
2016-10-15	arm: Fix for ARM's Streamline conversion script	Shawn Rosti
	tracked down issue with ARM's version of gem5 using the "cluster" name. The public/github version of ARM Gem5 does not use the "cluster" naming mechanism. Signed-off-by: Dam Sunwoo <dam.sunwoo@arm.com> Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-10-15	arm, dev: pl011 console interactivity	Bjoern A. Zeeb
	Improve PL011 console interactivity Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-10-15	syscall: read() should not write anything if reading EOF.	Nicolas Derumigny
	Read() should not write anything when returning 0 (EOF). This patch does not correct the same bug occuring for : nbr_read=read(file, buf, nbytes) When nbr_read<nbytes, nbytes bytes are copied into the virtual RAM instead of nbr_read. If buf is smaller than nbytes, a page fault occurs, even if buf is in fact bigger than nbr_read. Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-10-15	cpu, arm: Distinguish Float* and SimdFloat, create FloatMem opClass	Fernando Endo
	Modify the opClass assigned to AArch64 FP instructions from SimdFloat* to Float*. Also create the FloatMemRead and FloatMemWrite opClasses, which distinguishes writes to the INT and FP register banks. Change the latency of (Simd)FloatMultAcc to 5, based on the Cortex-A72, where the "latency" of FMADD is 3 if the next instruction is a FMADD and has only the augend to destination dependency, otherwise it's 7 cycles. Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-10-14	config: Make configs/common a Python package	Andreas Hansson
	Continue along the same line as the recent patch that made the Ruby-related config scripts Python packages and make also the configs/common directory a package. All affected config scripts are updated (hopefully). Note that this change makes it apparent that the current organisation and naming of the config directory and its subdirectories is rather chaotic. We mix scripts that are directly invoked with scripts that merely contain convenience functions. While it is not addressed in this patch we should follow up with a re-organisation of the config structure, and renaming of some of the packages.
2016-10-14	stats: Add more information to uninitialized error	Jason Lowe-Power
	ClockedObject was changed to require its regStats() to be called from every child class. If you forget to do this, the error was indecipherable. This patch makes the error more clear.
2016-10-13	stats: update references	Curtis Dunham

2016-10-13	mem: add DRAM powerdown current	Omar Naji
	Change-Id: I763cffe0c69f5ebbbf6a6eb12bec5c13d5d0161d Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13	mem: Add DRAM low-power functionality	Wendy Elsasser
	Added power-down state transitions to the DRAM controller model. Added per rank parameter, outstandingEvents, which tracks the number of outstanding command events and is used to determine when the controller should transition to a low power state. The controller will only transition when there are no outstanding events scheduled and the number of command entries for the given rank is 0. The outstandingEvents parameter is incremented for every RD/WR burst, PRE, and REF event scheduled. ACT is implicitly covered by RD/WR since burst will always issue and complete after a required ACT. The parameter is decremented when the event is serviced (completed). The controller will automatically transition to ACT power down, PRE power down, or SREF. Transition to ACT power down state scheduled from: 1) The RespondEvent, where read data is received from the memory. ACT power-down entry will be scheduled when one or more banks is open, all commands for the rank have completed (no more commands scheduled), and there are no commands in queue for the rank Transition to PRE power down scheduled from: 1) respondEvent, when all banks are closed, all commands have completed, and there are no commands in queue for the rank 2) prechargeEvent when all banks are closed, all commands have completed, and there are no commands in queue for the rank 3) refreshEvent, after the refresh is complete when the previous state was ACT power-down 4) refreshEvent, after the refresh is complete when the previous state was PRE power-down and there are commands in the queue. Transition to SREF will be scheduled from: 1) refreshEvent, after the refresh is completes when the previous state was PRE power-down with no commands in queue Power-down exit commands are scheduled from: 1) The refreshEvent, prior to issuing a refresh 2) doDRAMAccess, to wake-up the rank for RD/WR command issue. Self-refresh exit commands are scheduled from: 1) The next request event, when the queue has commands for the rank in the readQueue or there are commands for the rank in the writeQueue and the bus state is WRITE. Change-Id: I6103f660776e36c686655e71d92ec7b5b752050a Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13	mem: Add callback to compute stats prior to dump event	Wendy Elsasser
	The per rank statistics are periodically updated based on state transition and refresh events. Add a method to update these when a dump event occurs to ensure they reflect accurate values. Specifically, need to ensure that the low-power state durations, power, and energy are logged correctly. Change-Id: Ib642a6668340de8f494a608bb34982e58ba7f1eb Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13	mem: Modify drain to ensure banks and power are idled	Wendy Elsasser
	Add constraint that all ranks have to be in PWR_IDLE before signaling drain complete This will ensure that the banks are all closed and the rank has exited any low-power states. On suspend, update the power stats to sync the DRAM power logic The logic maintains the location of the signalDrainDone method, which is still triggered from either: 1) Read response event 2) Next request event This ensures that the drain will complete in the READ bus state and minimizes the changes required. Change-Id: If1476e631ea7d5999fe50a0c9379c5967a90e3d1 Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13	mem: Sort memory commands and update DRAMPower	Wendy Elsasser
	Add local variable to stores commands to be issued. These commands are in order within a single bank but will be out of order across banks & ranks. A new procedure, flushCmdList, sorts commands across banks / ranks, and flushes the sorted list, up to curTick() to DRAMPower. This is currently called in refresh, once all previous commands are guaranteed to have completed. Could be called in other events like the powerEvent as well. By only flushing commands up to curTick(), will not get out of sync when flushed at a periodic stats dump (done in subsequent patch). Change-Id: I4ac65a52407f64270db1e16a1fb04cfe7f638851 Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13	mem: update DDR3 die revision	Omar Naji
	Change-Id: I8992ddc1664c3ed4b2d36d8a34e4ce8be113b9de Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13	mem: add DRAM powerdown timing	Omar Naji

2016-10-13	mem: make DDR4 x16	Omar Naji

2016-10-13	isa,arm: Add missing AArch32 FP instructions	Mitch Hayenga
	This commit adds missing non-predicated, scalar floating point instructions. Specifically VRINT* floating point integer rounding instructions and VSEL* floating point conditional selects. Change-Id: I23cbd1389f151389ac8beb28a7d18d5f93d000e7 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Nathanael Premillieu <nathanael.premillieu@arm.com>
2016-10-13	ruby: Fix regressions and make Ruby configs Python packages	Andreas Hansson
	This patch moves the addition of network options into the Ruby module to avoid the regressions all having to add it explicitly. Doing this exposes an issue in our current config system though, namely the fact that addtoPath is relative to the Python script being executed. Since both example and regression scripts use the Ruby module we would end up with two different (relative) paths being added. Instead we take a first step at turning the config modules into Python packages, simply by adding a __init__.py in the configs/ruby, configs/topologies and configs/network subdirectories. As a result, we can now add the top-level configs directory to the Python search path, and then use the package names in the various modules. The example scripts are also updated, and the messy path-deducing variations in the scripts are unified.
2016-10-07	config: fix typo in cluster topology.	Tushar Krishna

2016-10-07	dev, arm: Make GenericTimer param handling more robust	Andreas Sandberg
	The generic timer needs a pointer to an ArmSystem to wire itself to the system register handler. This was previously specified as an instance of System that was later cast to ArmSystem. Make this more robust by specifying it as an ArmSystem in the Python interface and add a check to make sure that it is non-NULL. Change-Id: I989455e666f4ea324df28124edbbadfd094b0d02 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
2016-10-06	ruby: Add M5_VAR_USED before variables used only inside assert in garnet2.0.	Tushar Krishna
	This removes errors when building gem5.fast
2016-10-06	ruby: garnet2.0	Tushar Krishna
	Revamped version of garnet with more optimized single-cycle routers, more configurability, and cleaner code.
2016-10-06	ruby: remove the original garnet code.	Tushar Krishna
	Only garnet2.0 will be supported henceforth.
2016-10-06	config: add port directions and per-router delay in topology.	Tushar Krishna
	This patch adds port direction names to the links during topology creation, which can be used for better printed names for the links or for users to code up their own adaptive routing algorithms. It also adds support for every router to have an independent latency value to support heterogeneous topologies with the subsequent garnet2.0 patch.
2016-10-06	config: make internal links in network topology unidirectional.	Tushar Krishna
	This patch makes the internal links within the network topology unidirectional, thus allowing any deadlock-free routing algorithms to be specified from the topology itself using weights. This patch also renames Mesh.py and MeshDirCorners.py to Mesh_XY.py and MeshDirCorners_XY.py (Mesh with XY routing). It also adds a Mesh_westfirst.py and CrossbarGarnet.py topologies.
2016-10-06	config: add a separate config file for the network.	Tushar Krishna
	This patch adds a new file configs/network/Network.py to setup the network, instead of doing that within Ruby.py.
2016-10-06	ruby: rename networktest to garnet_synthetic_traffic.	Tushar Krishna
	networktest is essentially a collection of synthetic traffic patterns for the network. The protocol name and the tester having the same name led to multiple python configuration files with the same name, adding confusion. This patch renames networktest to garnet_synthetic_traffic, and also adds more synthetic traffic patterns.
2016-10-06	ruby: rename ALPHA_Network_test protocol to Garnet_standalone.	Tushar Krishna
	Over the past 6 years, we realized that the protocol is essentially used to run the garnet network in a standalone manner, and feed standard synthetic traffic patterns through it.
2016-10-04	kvm: Adding details to kvm page fault in x86	Alexandru Dutu
	Adding details, e.g. rip, rsp etc. to the kvm pagefault exit when in SE mode.
2016-10-04	misc: Adds a warning in case gdb is attached multiple times	Alexandru Dutu
	Instead of scheduling another event, this patch adds a warning in case gdb is attached multiple times and the first attachement event has not been processed yet.
2016-10-04	gpu-compute: Added method to compute the actual workgroup size	Alexandru Dutu
	This patch adds a method to the Wavefront class to compute the actual workgroup size. This can be different from the maximum workgroup size specified when launching the kernel through the NDRange object. Current solution is still not optimal, as we are computing these for each wavefront and the dispatcher also needs to have this information and can't actually call Wavefront::computeActuallWgSz before the wavefronts are being created. A long term solution would be to have a Workgroup class that deals with all these details.
2016-10-04	config: Fix lat_mem_rd example script	Andreas Hansson
	Adjust the traffic generator time-out so that the script works out of the box Change-Id: I6b3b6b11f98b094ae3acdbe09488c26e4aeb0ab4 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>