gem5 - gem5

Age	Commit message	Author
2016-11-09	style: [patch 3/22] reduce include dependencies in some headers	Brandon Potter
	Used cppclean to help identify useless includes and removed them. This covered headers that were included erroneously, but also cases where forward declarations could have been used rather than a full include.
2016-12-02	hsail: remove the panic guarding function directives	Brandon Potter
	HSA function calls are still not supported properly with HSAIL, but the recent AMP runtime modifications rely on being able to parse the BRIG/HSAIL files that are extracted from the application binaries. We need to parse the HSAIL definitions of these functions, but we do not actually need to make the function calls. The definitions are present because HCC appends a set of routines to every HSAIL binary that it creates. These extra, unnecessary routines exist in the HCC source as a file, which is concatenated (cat'd) onto everything the compiler outputs before it is assembled into the application's binary. HCC does this because it might call these helper functions. However, it does not appear to do so in the AMP codes, so we just parse these functions with the HSAIL parser and then ignore them.
2016-11-21	gpu-compute: fix segfault when constructing GPUExecContext	Tony Gutierrez
	The GPUExecContext currently stores a reference to its parent WF's GPUISA object; however, some special instructions do not have an associated WF. When these objects are constructed they set their WF pointer to null, which causes the GPUExecContext to segfault when it dereferences the WF pointer to get at the WF's GPUISA object. Here we change the GPUISA reference in the GPUExecContext class to a pointer so that it may be set to null.
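The following is a minimal sketch of why a pointer member can represent "no parent wavefront" when a reference member cannot. ExecContextSketch, GPUISA and Wavefront here are simplified, hypothetical stand-ins and not gem5's actual declarations.

```cpp
#include <cassert>

// Hypothetical stand-ins for gem5's GPUISA and Wavefront classes.
struct GPUISA { int id = 0; };
struct Wavefront { GPUISA isa; };

// Storing a pointer (rather than a reference) lets the context be built
// for special instructions that have no parent wavefront.
class ExecContextSketch {
  public:
    explicit ExecContextSketch(Wavefront *wf)
        : wf_(wf), isa_(wf ? &wf->isa : nullptr) {}

    Wavefront *wavefront() const { return wf_; }
    // Callers must check for null before using the ISA object.
    GPUISA *isa() const { return isa_; }

  private:
    Wavefront *wf_;
    GPUISA *isa_;
};
```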
2016-11-21	gpu-compute: init valid field of GpuTlbEntry in default ctor	Tony Gutierrez
	The valid field of GpuTlbEntry is not set in the default ctor, which can lead to strange behavior and is also flagged by UBSAN.
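The sketch below shows one way to guarantee that a default-constructed entry starts out invalid, using a C++11 default member initializer. TlbEntrySketch and its address fields are hypothetical simplifications of GpuTlbEntry. makeTlb() is a hypothetical helper that builds the TLB as a std::vector of default-constructed entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for GpuTlbEntry; fields other than `valid`
// are illustrative assumptions.
struct TlbEntrySketch {
    std::uint64_t vaddr = 0;
    std::uint64_t paddr = 0;
    // Default member initializer: every default-constructed entry starts
    // invalid instead of holding an indeterminate value.
    bool valid = false;
};

// Hypothetical helper: a TLB of `size` default-constructed (invalid) entries.
std::vector<TlbEntrySketch> makeTlb(std::size_t size) {
    return std::vector<TlbEntrySketch>(size);
}
```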
2016-10-26	hsail,gpu-compute: fixes to appease clang++	Tony Gutierrez
	Fixes to appease clang++. Tested on: Ubuntu clang version 3.5.0-4ubuntu2~trusty2 (tags/RELEASE_350/final) (based on LLVM 3.5.0) and Ubuntu clang version 3.6.0-2ubuntu1~trusty1 (tags/RELEASE_360/final) (based on LLVM 3.6.0). The fixes address the following five issues: 1) The exec continuations in gpu_static_inst.hh were marked as protected when they should be public; here we mark them as public. 2) The Abs instruction uses std::abs() in its execute method. Because Abs is templated, it can also operate on U32 and U64 types, which causes Abs::execute() to pass uint32_t and uint64_t values, respectively, to std::abs(). This triggers a warning because std::abs() has no effect on unsigned values. To remedy this, we add template specializations of Abs's execute() method for when its template parameter is U32 or U64. 3) Some protocols that use the code in cprintf.hh were missing includes of BoolVec.hh, which defines operator<< for the BoolVec type. This caused errors when the generated code passed a BoolVec to a method in cprintf.hh that applies operator<< to it. 4) Surprise, clang doesn't like it when you clobber all the bits in a newly allocated object, i.e., this code: tlb = new GpuTlbEntry[size]; std::memset(tlb, 0, sizeof(GpuTlbEntry) * size); Let's use std::vector to track the TLB entries in the GpuTlb now. 5) There were a few variables used only in DPRINTFs, so we mark them with M5_VAR_USED.
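The sketch below illustrates the specialization approach from item 2. It uses a hypothetical free function, absValue(), instead of gem5's actual Abs::execute(). The generic template forwards to std::abs(), and the unsigned specializations return the value unchanged. Unsigned operands therefore never reach std::abs().

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Generic version for signed integer and floating-point operand types.
template <typename T>
T absValue(T x) {
    return std::abs(x);
}

// Specializations for unsigned types: the value is already non-negative,
// so std::abs() is not called (and no warning is triggered).
template <>
inline std::uint32_t absValue<std::uint32_t>(std::uint32_t x) { return x; }

template <>
inline std::uint64_t absValue<std::uint64_t>(std::uint64_t x) { return x; }
```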
2016-10-26	gpu-compute: support in-order data delivery in GM pipe	Tony Gutierrez
	This patch adds an ordered response buffer to the GM pipeline to ensure in-order data delivery. The buffer is implemented as an STL ordered map, which sorts the requests in program order by their sequence ID. When requests return to the GM pipeline, they are marked as done. Only the oldest request may be serviced from the ordered buffer, and only if it is marked as done. The FIFO response buffers are kept and used in out-of-order (OoO) delivery mode.
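Here is a minimal sketch of the ordered-buffer idea using std::map keyed by sequence ID. std::map keeps its keys sorted, so begin() is always the oldest request. The class name is hypothetical, and a std::string stands in for the real request or instruction object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Sketch of an in-order response buffer keyed by sequence ID.
// Requests are inserted when issued and marked done when their
// response returns; only the oldest request may leave the buffer.
class OrderedRespBuffer {
  public:
    void insert(std::uint64_t seqNum, std::string req) {
        buf_[seqNum] = Entry{std::move(req), false};
    }

    void markDone(std::uint64_t seqNum) {
        auto it = buf_.find(seqNum);
        if (it != buf_.end())
            it->second.done = true;
    }

    // Pops the oldest request if, and only if, it has completed.
    bool popOldest(std::string &out) {
        if (buf_.empty() || !buf_.begin()->second.done)
            return false;
        out = std::move(buf_.begin()->second.req);
        buf_.erase(buf_.begin());
        return true;
    }

    std::size_t size() const { return buf_.size(); }

  private:
    struct Entry {
        std::string req;  // placeholder for the real request/instruction
        bool done;
    };
    std::map<std::uint64_t, Entry> buf_;  // sorted by sequence ID
};
```

A younger request that completes early stays in the buffer until every older request has completed and been popped.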
2016-10-26	gpu-compute, hsail: pass GPUDynInstPtr to getRegisterIndex()	Tony Gutierrez
	For HSAIL, an operand's indices into the register files may be calculated trivially, because the operands are always read from a register file or are immediates. For machine ISA, however, an op selector may specify special registers, or may specify special SGPRs with an alias op selector value. In some cases, the locations of the special register values depend on the size of the RF. Here we add a way for the underlying getRegisterIndex() method to know the size of the RFs, so that it may find the relative positions of the special register values.
2016-10-26	gpu-compute: use System cache line size in the GPU	Tony Gutierrez

2016-10-26	gpu-compute, hsail: make the PC a byte address, not an instruction index	Tony Gutierrez
	Currently the PC is incremented at instruction granularity rather than as an instruction's byte address. Machine ISA instructions assume the PC is a byte address and increment it accordingly. Here we make the GPU model and the HSAIL instructions treat the PC as a byte address as well.
2016-10-26	gpu-compute: add gpu_isa.hh to switch hdrs, add GPUISA to WF	Tony Gutierrez
	The GPUISA class is meant to encapsulate any ISA-specific behavior (special register accesses, ISA-specific WF/kernel state, etc.) in a generic enough way that it may be used in ISA-agnostic code. gpu-compute: use the GPUISA object to advance the PC. The GPU model treats the PC as a pointer to individual instruction objects, which are stored in a contiguous array, and not as a byte address to be fetched from the real memory system. This is OK for HSAIL because the model considers all instructions to be the same size. In machine ISA, however, instructions may be 32b or 64b, and branches are calculated by advancing the PC by the number of words (4-byte chunks) it needs to advance in the real instruction stream. Because of this, there is a mismatch between the PC we use to index into the instruction array and the actual byte-address PC the ISA expects. Here we move the PC advance calculation into the ISA so that differences in instruction sizes may be accounted for in a generic way.
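The sketch below shows byte-addressed PC arithmetic for an ISA with 4- or 8-byte instructions. advancePC() and branchTarget() are hypothetical helpers, not gem5's GPUISA interface. The branch sketch assumes a signed offset counted in 4-byte words from the next instruction, which is an assumption made here for illustration.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;

// Advance past the current instruction, whose size is given in bytes
// (e.g. 4 for a 32b instruction, 8 for a 64b one).
Addr advancePC(Addr pc, unsigned instSizeBytes) {
    return pc + instSizeBytes;
}

// Assumption for this sketch: the branch offset is a signed count of
// 4-byte words, relative to the address of the next instruction.
Addr branchTarget(Addr pc, unsigned instSizeBytes, std::int32_t wordOffset) {
    Addr next = advancePC(pc, instSizeBytes);
    // Unsigned wrap-around makes negative offsets work as expected.
    return next + static_cast<Addr>(static_cast<std::int64_t>(wordOffset) * 4);
}
```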
2016-10-26	gpu-compute: add instruction mix stats for the gpu	Tony Gutierrez

2016-10-26	gpu-compute, hsail: call discardFetch() from the WF	Tony Gutierrez
	Because every taken branch causes fetch to be discarded, we move the call into the WF so that we do not have to call it from each and every branch instruction type.
2016-10-26	hsail, gpu-compute: remove doGm/SmReturn add completeAcc	Tony Gutierrez
	We are removing doGmReturn from the GM pipe and adding completeAcc() implementations for the HSAIL mem ops. The behavior in doGmReturn depends on HSAIL and HSAIL mem ops; however, the completion phase of memory ops in machine ISA can be very different, even among individual machine ISA mem ops. So we remove this functionality from the pipeline and allow it to be implemented by the individual instructions.
2016-10-26	gpu-compute: remove inst enums and use bit flag for attributes	Tony Gutierrez
	This patch removes the GPUStaticInst enums that were defined in GPU.py. Instead, a simple set of attribute flags that can be set in the base instruction class is used. This will help unify the attributes of HSAIL and machine ISA instructions within the model itself. Because the static instruction now carries the attributes, a GPUDynInst must carry a pointer to a valid GPUStaticInst, so a new static kernel launch instruction is added, which carries the attributes needed to perform the kernel launch.
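A minimal sketch of per-instruction attribute flags stored in a std::bitset follows. The attribute names and the StaticInstSketch class are hypothetical. gem5's real flag set and accessors may differ.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical attribute names; gem5's real flag set may differ.
enum InstAttr : std::size_t {
    IsLoad,
    IsStore,
    IsBranch,
    IsKernelLaunch,
    NumInstAttrs
};

// The base instruction class carries its attributes as bits, so the
// pipeline queries flags instead of switching on per-ISA enums.
class StaticInstSketch {
  public:
    void setFlag(InstAttr a) { flags_.set(a); }
    bool isLoad() const { return flags_.test(IsLoad); }
    bool isStore() const { return flags_.test(IsStore); }
    bool isKernelLaunch() const { return flags_.test(IsKernelLaunch); }

  private:
    std::bitset<NumInstAttrs> flags_;
};
```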
2016-10-26	gpu-compute: move disassemble() implementation to GPUStaticInst	Tony Gutierrez

2016-10-26	gpu-compute, arch: add some methods to the base inst classes for ISA support	Tony Gutierrez

2016-10-04	gpu-compute: Added method to compute the actual workgroup size	Alexandru Dutu
	This patch adds a method to the Wavefront class to compute the actual workgroup size. This can differ from the maximum workgroup size specified when launching the kernel through the NDRange object. The current solution is still not optimal, as we compute this for each wavefront, and the dispatcher also needs this information but cannot call Wavefront::computeActuallWgSz before the wavefronts have been created. A long-term solution would be to have a Workgroup class that deals with all these details.
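The commit does not give the formula. A common way to compute it, assumed here, is to clamp each dimension to the work items remaining in the grid. Under that assumption, the last workgroup along a dimension can be partial when the grid size is not a multiple of the workgroup size. actualWgSize() is a hypothetical helper and not the Wavefront method's actual signature.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Assumed formula: per dimension, the actual size is the maximum
// workgroup size, clamped to the work items left in the grid.
std::array<int, 3> actualWgSize(const std::array<int, 3> &gridSize,
                                const std::array<int, 3> &maxWgSize,
                                const std::array<int, 3> &wgId) {
    std::array<int, 3> sz{};
    for (int d = 0; d < 3; ++d) {
        int remaining = gridSize[d] - wgId[d] * maxWgSize[d];
        sz[d] = std::min(maxWgSize[d], remaining);
    }
    return sz;
}
```

For example, a 100-item grid with a maximum workgroup size of 64 yields one full workgroup of 64 and a partial one of 36.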
2016-09-16	gpu-compute: fix typo in GPUDispatcher	Tony Gutierrez

2016-09-16	gpu-compute: Adding context serialization methods to Wavefront	Alexandru Dutu
	This patch adds methods to serialize the context of a particular wavefront to the simulated system memory. Context serialization is used when a wavefront is preempted (i.e., on a context switch).
2016-09-16	gpu-compute: Refactoring Wavefront::dynWaveId	Alexandru Dutu

2016-09-16	gpu-compute: Adding vector register file debug messages	Alexandru Dutu
	This patch introduces DPRINTFs for reading and writing to and from the vector register file.
2016-09-16	gpu-compute: Changing reconvergenceStack type	Alexandru Dutu
	std::stack has no iterators, so the reconvergence stack cannot be iterated without popping elements off. We will use std::list instead so that it can be iterated for saving and restoring purposes.
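The sketch below shows a std::list used as a stack: entries are pushed at the front, and a range-for walks them top-first without popping anything. ReconvEntry and snapshotPcs() are hypothetical. A real reconvergence-stack entry would also hold state such as an execution mask.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Hypothetical reconvergence-stack entry (simplified).
struct ReconvEntry {
    unsigned pc;
    unsigned rpc;  // reconvergence PC
};

// Unlike std::stack, a std::list can be walked without modifying it,
// which is what saving a wavefront's context requires.
std::vector<unsigned> snapshotPcs(const std::list<ReconvEntry> &stack) {
    std::vector<unsigned> pcs;
    for (const auto &e : stack)  // front of the list is the top of the stack
        pcs.push_back(e.pc);
    return pcs;
}
```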
2016-09-16	gpu-compute: Adding ioctl for HW context size	Alexandru Dutu
	Adding runtime support for determining the memory required by a SIMD engine when executing a particular wavefront.
2016-09-16	gpu-compute: Wavefront refactoring	Alexandru Dutu
	Renaming members of the Wavefront class in accordance with the style guide.
2016-09-16	gpu-compute: Remove WFContext	Alexandru Dutu
	The WFContext struct is currently unused and is not useful for saving and restoring the context of a Wavefront. The Wavefront class should be sufficient for that purpose, and the runtime can determine the memory size it needs to allocate for a Wavefront through an IOCTL.
2016-09-13	gpu-compute: Fix bug with return in cfg	Michael LeBeane
	Connecting basic blocks would stop too early in kernels where ret was not the last instruction. This patch allows basic blocks after the ret instruction to be properly connected.
2016-06-09	gpu-compute: parametrize Wavefront size	jkalamat
	Eliminate the VSZ constant that defined the Wavefront size (in number of work items) and replace it with a parameter in the GPU.py configuration script. All data structures that depend on the Wavefront size are now dynamically sized. Legal Wavefront sizes are currently 16, 32, and 64, and the value is checked at initialization time.
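A minimal sketch of sizing per-lane state from a runtime parameter instead of a compile-time constant follows. WavefrontSketch is hypothetical. It throws std::invalid_argument for illegal sizes, whereas gem5 itself would more likely report the error with fatal(). Only the legal values 16, 32 and 64 come from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: per-lane state sized at construction time.
class WavefrontSketch {
  public:
    explicit WavefrontSketch(int wfSize)
        : wfSize_(checkSize(wfSize)), laneMask_(wfSize, false) {}

    int size() const { return wfSize_; }
    std::size_t numLanes() const { return laneMask_.size(); }

  private:
    // Legal sizes per the commit: 16, 32, or 64 work items.
    static int checkSize(int s) {
        if (s != 16 && s != 32 && s != 64)
            throw std::invalid_argument("wavefront size must be 16, 32, or 64");
        return s;
    }

    int wfSize_;                  // initialized (and checked) first
    std::vector<bool> laneMask_;  // then sized from the parameter
};
```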
2016-06-06	stats: Fixing regStats function for some SimObjects	David Guillen Fandos
	Fixes an issue with regStats not calling the parent class method for most SimObjects in gem5. This causes issues if one adds new stats in the base class, since they are never initialized properly. Change-Id: Iebc5aa66f58816ef4295dc8e48a357558d76a77c Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
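The sketch below shows the pattern this fix enforces: an overriding regStats() first calls its parent's version so that stats declared in the base class are also registered. BaseObj and DerivedObj are hypothetical stand-ins. In gem5 the base would be SimObject, and the stats would be real stat objects rather than strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; strings record which stats were registered.
class BaseObj {
  public:
    virtual ~BaseObj() = default;
    virtual void regStats() { registered.push_back("base.numAccesses"); }
    std::vector<std::string> registered;
};

class DerivedObj : public BaseObj {
  public:
    void regStats() override {
        BaseObj::regStats();  // without this, the base-class stats are never set up
        registered.push_back("derived.numMisses");
    }
};
```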
2016-06-03	gpu-compute: Fixed a bug in global memory pipeline	Tuan Ta
	Added a condition on when inflightStores is incremented, to prevent a deadlock caused by many memory fence requests generated by a CU.
2016-05-16	gpu-compute: fix bug in GPUDynInst::isScalarRegister()	Tony Gutierrez

2016-05-06	gpu-compute: fix spacing in GPUDynInst ctor	Tony Gutierrez

2016-05-06	gpu-compute: fix uninitialized member bug in GPUDynInst	Tony Gutierrez
	The n_reg field in the GPUDynInst is not currently set in the constructor. If it is not set externally, assertion failures may occur if its uninitialized value happens to be just right. Here we set it to 0 by default.
2016-04-07	mem: Remove threadId from memory request class	Mitch Hayenga
	In general, the ThreadID parameter is unnecessary in the memory system, as the ContextID is what is used for the purposes of locks/wakeups. Since we allocate sequential ContextIDs for each thread on MT-enabled CPUs, ThreadID is unnecessary, as the CPUs can identify the requesting thread through sideband info (SenderState / LSQ entries) or through the ContextID's offset from the base ContextID for a CPU. This is a re-spin of 20264eb after the revert (bd1c6789) and includes some fixes to that commit.
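The commit notes that ContextIDs are allocated sequentially per thread, so a CPU can recover the requesting thread from the ContextID's offset. threadFromContext() is a hypothetical helper that sketches this arithmetic. The type aliases are simplified stand-ins for gem5's own.

```cpp
#include <cassert>

// Simplified stand-ins for gem5's ContextID and ThreadID types.
using ContextID = int;
using ThreadID = int;

// Hypothetical helper: with sequential ContextIDs, the thread index is
// the offset from the CPU's first (base) ContextID.
ThreadID threadFromContext(ContextID ctx, ContextID cpuBaseCtx) {
    assert(ctx >= cpuBaseCtx);
    return ctx - cpuBaseCtx;
}
```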
2016-03-21	gpu-compute: remove unused variable from scoreboard check stage	jkalamat
	Appease clang by removing the unused private member variable, 'numGlbMemPipes', from the scoreboard check stage.
2016-03-17	syscall_emul: move mmapGrowsDown() to LiveProcess	Steve Reinhardt
	The mmapGrowsDown() method was a static method on the OperatingSystem class (and derived classes), which worked OK for the templated syscall emulation methods but made it hard to access elsewhere. This patch moves the method to be a virtual function on the LiveProcess class, where it can be overridden for specific platforms (for now, Alpha). This patch also changes the value of mmapGrowsDown() from being false by default and true only on X86Linux32 to being true by default and false only on Alpha, which seems closer to reality (though in reality most people use ASLR and this doesn't really matter anymore). In the process, we also got rid of the unused mmap_start field on LiveProcess and the OperatingSystem mmapGrowsUp variable.
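Here is a minimal sketch of the new arrangement: a virtual mmapGrowsDown() on the process class that defaults to true and is overridden to false for Alpha. The class names are hypothetical stand-ins for LiveProcess and its Alpha subclass.

```cpp
#include <cassert>

// Simplified stand-in for LiveProcess.
class LiveProcessSketch {
  public:
    virtual ~LiveProcessSketch() = default;
    // New default per the commit: mmap regions grow down.
    virtual bool mmapGrowsDown() const { return true; }
};

// Simplified stand-in for the Alpha process class.
class AlphaProcessSketch : public LiveProcessSketch {
  public:
    bool mmapGrowsDown() const override { return false; }
};
```

Because the method is virtual, code holding only a base-class reference still gets the platform-specific answer.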
2016-03-04	base: Fix gpu-compute output stream creation	Andreas Hansson
	Match changes in output stream.
2016-02-18	gpu: fix bugs with MemFence, Flat Instrs and Resource utilization	John Kalamatianos
	Memory Fence is now flagged as Global Memory only, to avoid oversubscribing resources. Flat instructions now check whether the Shared Memory resource is busy, to avoid oversubscribing resources. All WaitClass resources now use cycles (not ticks) to register the number of pipe stages between Scoreboard and Execute, to be consistent with the instruction scheduling logic, which has always used clock cycles.
2016-02-17	gpu-compute: remove brig_object.hh from hsa_object.cc	Tony Gutierrez
	brig_object.hh is specific to the HSAIL ISA, and hence should not be included in ISA-agnostic code.
2016-01-19	gpu-compute: AMD's baseline GPU model	Tony Gutierrez