summaryrefslogtreecommitdiff
path: root/src/arch
AgeCommit message (Collapse)Author
2015-02-11mem: Clarification of packet crossbar timingsMarco Balboni
This patch clarifies the packet timings annotated when going through a crossbar. The old 'firstWordDelay' is replaced by 'headerDelay' that represents the delay associated to the delivery of the header of the packet. The old 'lastWordDelay' is replaced by 'payloadDelay' that represents the delay needed to processing the payload of the packet. For now the uses and values remain identical. However, going forward the payloadDelay will be additive, and not include the headerDelay. Follow-on patches will make the headerDelay capture the pipeline latency incurred in the crossbar, whereas the payloadDelay will capture the additional serialisation delay.
2015-02-11sim: Move the BaseTLB to src/arch/generic/Andreas Sandberg
The TLB-related code is generally architecture dependent and should live in the arch directory to signify that. --HG-- rename : src/sim/BaseTLB.py => src/arch/generic/BaseTLB.py rename : src/sim/tlb.cc => src/arch/generic/tlb.cc rename : src/sim/tlb.hh => src/arch/generic/tlb.hh
2015-01-25arm: always set the IsFirstMicroop flagAli Saidi
While the IsFirstMicroop flag exists it was only occasionally used in the ARM instructions that gem5 microOps and therefore couldn't be relied on to be correct.
2015-01-25cpu: Remove all notion that we know when the cpu is misspeculating.Ali Saidi
We have no way of knowing if a CPU model is on the wrong path with our execute-in-execute CPU models. Don't pretend that we do.
2015-01-25cpu: Put all CPU instruction tracers in a single fileAli Saidi
2015-01-22mem: Remove unused Packet src and dest fieldsAndreas Hansson
This patch takes the final step in removing the src and dest fields in the packet. These fields were rather confusing in that they only remember a single multiplexing component, and pushed the responsibility to the bridge and caches to store the fields in a senderstate, thus effectively creating a stack. With the recent changes to the crossbar response routing the crossbar is now responsible without relying on the packet fields. Thus, these variables are now unused and can be removed.
2015-01-22x86: Delay X86 table walk on receiving walker responseAndreas Hansson
This patch fixes a minor issue in the X86 page table walker where it ended up sending new request packets to the crossbar before the response processing was finished (recvTimingResp is directly calling sendTimingReq). Under certain conditions this caused the crossbar to see illegal combinations of request/response overlap, in turn causing problems with a slightly modified crossbar implementation.
2015-01-22mem: Clean up Request initialisationAndreas Hansson
This patch tidies up how we create and set the fields of a Request. In essence it tries to use the constructor where possible (as opposed to setPhys and setVirt), thus avoiding spreading the information across a number of locations. In fact, setPhys is made private as part of this patch, and a number of places where we callede setVirt instead uses the appropriate constructor.
2015-01-10x86 : fxsave and fxrestore missing template codeEmilio Castillo
This patch corrects the FXSAVE and FXRSTOR Macroops. The actual code used for saving/restore the FP registers is in the file but it was not used. The FXSAVE and FXRSTOR instructions are used in the kernel for saving and loading the state of the mmx,xmm and fpu registers. This operation is triggered in FS by issuing a Device Not Available Fault. The cr0 register has a TS flag that is set upon each context change. Every time a task access any FP related register (SIMD as well) if the TS flag is set to one, the device not available fault is issued. The kernel saves the current state of the registers, and restore the previous state of the currently running task. Right now Gem5 lacks of this capability. the Device Not Available Fault is never issued, leading to several problems when different threads share the same CPU and SMT is not used. The PARSEC Ferret benchmark is an example of this behavior. In order to test this a hack in the atomic cpu code was done to detect if a static instruction has any FP operands and the cr0 reg TS bit is set. This check must be done in the ISA dependent code. But it seems to be tricky to access the cr0 register while executing an instruction. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-01-06x86: Enable three bits in the FamilyModelStepping ECX CPUID bitfield.Gabe Black
These are for the monitor/mwait instructions, SSSE3, and XSAVE.
2015-01-06cpuid, x86: Revert "Enabling more features in CPUid"Gabe Black
That change enables CPUID bits for features that aren't implemented in gem5. If a simulated system tries to use those features because it was told it could, bad things can happen.
2015-01-03arm: Add unlinkat syscall implementationmike upton
added ARM aarch64 unlinkat syscall support, modeled on other <xxx>at syscalls. This gets all of the cpu2006 int workloads passing in SE mode on aarch64. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-01-03x86: implements the simd128 ADDSUBPD instructionMaxime Martinasso
This patch implements the simd128 ADDSUBPD instruction for the x86 architecture. Tested with a simple program in assembly language which executes the instruction. Checked that different versions of the instruction are executed by using the execution tracing option. Committed by: Nilay Vaish <nilay@cs.wisc.edu
2014-12-23arm: Add stats to table walkerCurtis Dunham
This patch adds table walker stats for: - Walk events - Instruction vs Data - Page size histogram - Wait time and service time histograms - Pending requests histogram (per cycle) - measures dist. of L (p(1..) = how often busy, p(0) = how often idle) - Squashes, before starting and after completion
2014-12-23arm: Raise an alignment fault if a PC has illegal alignmentAndreas Sandberg
We currently don't handle unaligned PCs correctly. There is one check for unaligned PCs in the TLB when running in aarch64 mode, but this check does not cover cases where the CPU does not do a TLB lookup when decoding an instruction (e.g., a branch stays within the same cache line). Additionally, the Decoder class sometimes throws an assertion for unaligned PCs which breaks speculation. This changeset introduces a decoder fault bit field in the ExtMachInst structure. This field can be used to signal a decoder failure. If set, the decoder generates an internal gem5fault instruction instead of a normal instruction. This instruction in turns either panics (fault type PANIC), returns an PCAlignmentFault (fault type UNALIGNED, aarch64) or PrefetchAbort (fault type UNALIGNED, aarch32). The patch causes minor changes to the realview64 regressions, and a stats bump will follow.
2014-12-23arm: Clean up and document decoder APIAndreas Sandberg
This changeset adds more documentation to the ArmISA::Decoder class and restructures it slightly to make API groups more obvious.
2014-12-23arm: Add support for filtering in the PMUAndreas Sandberg
This patch adds support for filtering events in the PMU. In order to do so, it updates the ISADevice base class to forward an ISA pointer to ISA devices. This enables such devices to access the MiscReg file to determine the current execution level.
2014-12-08arm: Fix decoding of PMXEVTYPER_EL0 and PMCCFILTR_EL0Andreas Sandberg
The aarch64 system register decoder is currently not decoding PMXEVTYPER_EL0 and PMCCFILTR_EL0 correctly. This changeset updates the decoder so that they are decoded using the values in table C5-6 in ARM DDI 0478A.c.
2014-12-05misc: Generalize GDB single stepping.Gabe Black
The new single stepping implementation for x86 doesn't rely on any ISA specific properties or functionality. This change pulls out the per ISA implementation of those functions and promotes the X86 implementation to the base class. One drawback of that implementation is that the CPU might stop on an instruction twice if it's affected by both breakpoints and single stepping. While that might be a little surprising, it's harmless and would only happen under somewhat unlikely circumstances.
2014-12-05x86: Implement a remote GDB stub.Gabe Black
This stub should allow remote debugging of 32 bit and 64 bit targets. Single stepping seems to work, as do breakpoints. If both breakpoints and single stepping affect an instruction, gdb will stop at the instruction twice before continuing. That's a little surprising, but is generally harmless.
2014-12-05misc: Make the GDB register cache accessible in various sized chunks.Gabe Black
Not all ISAs have 64 bit sized registers, so it's not always very convenient to access the GDB register cache in 64 bit sized chunks. This change makes it accessible in 8, 16, 32, or 64 bit chunks. The MIPS and ARM implementations were working around that limitation by bundling and unbundling 32 bit values into 64 bit values. That code has been removed.
2014-12-04x86: Rework opcode parsing to support 3 byte opcodes properly.Gabe Black
Instead of counting the number of opcode bytes in an instruction and recording each byte before the actual opcode, we can represent the path we took to get to the actual opcode byte by using a type code. That has a couple of advantages. First, we can disambiguate the properties of opcodes of the same length which have different properties. Second, it reduces the amount of data stored in an ExtMachInst, making them slightly easier/faster to create and process. This also adds some flexibility as far as how different types of opcodes are handled, which might come in handy if we decide to support VEX or XOP instructions. This change also adds tables to support properly decoding 3 byte opcodes. Before we would fall off the end of some arrays, on top of the ambiguity described above. This change doesn't measureably affect performance on the twolf benchmark. --HG-- rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f38_opcodes.isa rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f3a_opcodes.isa
2014-12-04arch: Allow named constants as decode case values.Gabe Black
The values in a "bitfield" or in an ExtMachInst structure member may not be a literal value, it might select from an arbitrary collection of options. Instead of using the raw value of those constants in the decoder, it's easier to tell what's going on if they can be referred to as a symbolic constant/enum. To support that, the ISA description language is extended slightly so that in addition to integer literals, the case value for decode blobs can also be a string literal. It's up to the ISA author to ensure that the string evaluates to a legal constant value when interpretted as C++.
2014-12-02x86: Clean up style in process.cc.Gabe Black
2014-12-02arm: Fix TLB ignoring faults when table walkingAndrew Bardsley
This patch fixes a case where the Minor CPU can deadlock due to the lack of a response to TLB request because of a bug in fault handling in the ARM table walker. TableWalker::processWalkWrapper is the scheduler-called wrapper which handles deferred walks which calls to TableWalker::wait cannot immediately process. The handling of faults generated by processWalk{AArch64,LPAE,} calls in those two functions is is different. processWalkWrapper ignores fault returns from processWalk... which can lead to ::finish not being called on a translation. This fix provides fault handling in processWalkWrapper similar to that found in the leaf functions which BaseTLB::Translation::finish.
2014-12-02cpu: Always mask the snoop address when performing lock checkAndreas Hansson
Ensure the snoop address check is always using a cache-block aligned address. This patch updates Alpha and Mips to match the other ISAs.
2014-11-23mem: Page Table map api modificationAlexandru Dutu
This patch adds uncacheable/cacheable and read-only/read-write attributes to the map method of PageTableBase. It also modifies the constructor of TlbEntry structs for all architectures to consider the new attributes.
2014-11-23x86: Segment initialization to support KvmCPU in SEAlexandru Dutu
This patch sets up low and high privilege code and data segments and places them in the following order: cs low, ds low, ds, cs, in the GDT. Additionally, a syscall and page fault handler for KvmCPU in SE mode are defined. The order of the segment selectors in GDT is required in this manner for interrupt handling to work properly. Segment initialization is done for all the thread contexts.
2014-11-23kvm, x86: Adding support for SE mode executionAlexandru Dutu
This patch adds methods in KvmCPU model to handle KVM exits caused by syscall instructions and page faults. These types of exits will be encountered if KvmCPU is run in SE mode.
2014-11-23cpuid, x86: Enabling more features in CPUidAlexandru Dutu
Adding more features in the CPUid with the purpose of supporting running the KvmCPU in SE mode.
2014-11-17x86: Fix setting segment bases in real mode.Gabe Black
The data size used for actually writing the base value for the segment was the default size, but really it should set the entire value without any possible truncation.
2014-11-17x86: Fix some bugs in the real mode far jmp instruction.Gabe Black
The far pointer should be shifted right to get the selector value, not left. Also, when calculating the width of the offset, the wrong register was used in one spot.
2014-11-17x86: APIC: Only set deliveryStatus if our IPI is going somewhere.Gabe Black
Otherwise the IPI which isn't sent will never arrive, and the deliveryStatus bit will never be cleared.
2014-11-17x86: APIC: Fix the getRegArrayBit function.Gabe Black
The getRegArrayBit function extracts a bit from a series of registers which are treated as a single large bit array. A previous change had modified the logic which figured out which bit to extract from ">> 5" to "% 5" which seems wrong, especially when other, similar functions were changed to use "% 32".
2014-11-16x86: Fix the CPUID Long Mode Address Size function.Gabe Black
The value in EAX has an 8 bit field for the linear address size and one for the physical address size when calling that function. A recent change implemented it but returned 0xff for both of those fields. That implies that linear and physical addresses are 255 bits wide which is wrong. When using the KVM CPU model this causes an error, presumably because some of those bits are actually reserved, or the CPU or kernel realizes 255 bits is a bad value. This change makes those values 48.
2014-11-14arm: Fixes based on UBSan and static analysisAndreas Hansson
Another churn to clean up undefined behaviour, mostly ARM, but some parts also touching the generic part of the code base. Most of the fixes are simply ensuring that proper intialisation. One of the more subtle changes is the return type of the sign-extension, which is changed to uint64_t. This is to avoid shifting negative values (undefined behaviour) in the ISA code.
2014-11-06x86 isa: This patch attempts an implementation at mwait.Marc Orr
Mwait works as follows: 1. A cpu monitors an address of interest (monitor instruction) 2. A cpu calls mwait - this loads the cache line into that cpu's cache. 3. The cpu goes to sleep. 4. When another processor requests write permission for the line, it is evicted from the sleeping cpu's cache. This eviction is forwarded to the sleeping cpu, which then wakes up. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-10-29automated mergeAli Saidi
2014-10-29arm, tests: Update config files to more recent kernels and create 64-bit ↵Ali Saidi
regressions. This changes the default ARM system to a Versatile Express-like system that supports 2GB of memory and PCI devices and updates the default kernels/file-systems for AArch64 ARM systems (64-bit) to support up to 32GB of memory and PCI devices. Some platforms that are no longer supported have been pruned from the configuration files. In addition a set of 64-bit ARM regressions have been added to the regression system.
2014-10-29arm, mem: Fix drain bug and provide drain prints for more components.Ali Saidi
2014-10-29arm: Fix multi-system AArch64 boot w/caches.Ali Saidi
Automatically extract cpu release address from DTB file. Check SCTLR_EL1 to verify all caches are enabled.
2014-10-29arm: Mark some miscregs (timer counter) registers at unverifiable.Ali Saidi
The checker can't verify timer registers, so it should just grab the version from the executing CPU, otherwise it could get a larger value and diverge execution.
2014-10-22sim: revert 6709bbcf564dNilay Vaish
The identifier SYS_getdents is not available on Mac OS X. Therefore, its use results in compilation failure. It seems there is no straight forward way to implement the system call getdents using readdir() or similar C functions. Hence the commit 6709bbcf564d is being rolled back.
2014-10-20x86: Fixes to avoid LTO warningsAndreas Hansson
This patch fixes a few minor issues that caused link-time warnings when using LTO, mainly for x86. The most important change is how the syscall array is created. Previously gcc and clang would complain that the declaration and definition types did not match. The organisation is now changed to match how it is done for ARM, moving the code that was previously in syscalls.cc into process.cc, and having a class variable pointing to the static array. With these changes, there are no longer any warnings using gcc 4.6.3 with LTO.
2014-10-20sim: implement getdents/getdents64 in user modeMichael Adler
Has been tested only for alpha. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-10-16arch: Use shared_ptr for all FaultsAndreas Hansson
This patch takes quite a large step in transitioning from the ad-hoc RefCountingPtr to the c++11 shared_ptr by adopting its use for all Faults. There are no changes in behaviour, and the code modifications are mostly just replacing "new" with "make_shared".
2014-10-16arch,x86,mem: Dynamically determine the ISA for Ruby store checkAndreas Hansson
This patch makes the memory system ISA-agnostic by enabling the Ruby Sequencer to dynamically determine if it has to do a store check. To enable this check, the ISA is encoded as an enum, and the system is able to provide the ISA to the Sequencer at run time. --HG-- rename : src/arch/x86/insts/microldstop.hh => src/arch/x86/ldstflags.hh
2014-10-16arm: Add helper methods to setup architected PMU eventsAndreas Sandberg
2014-10-16arm: Add TLB PMU probesAndreas Sandberg
This changeset adds probe points that can be used to implement PMU counters for TLB stats. The following probes are supported: * ArmISA::TLB::ppRefills / TLB Refills (TLB insertions)
2014-10-16arm: Add a model of an ARM PMUv3Andreas Sandberg
This class implements a subset of the ARM PMU v3 specification as described in the ARMv8 reference manual. It supports most of the features of the PMU, however the following features are known to be missing: * Event filtering (e.g., from different privilege levels). * Access controls (the PMU currently ignores the execution level). * The chain counter (event no. 0x1E) is unimplemented. The PMU itself does not implement any events, it merely provides an interface for the configuration scripts to hook up probes that drive events. Configuration scripts should call addEventProbe() to configure custom events or high-level methods to configure architected events. The Python implementation of addEventProbe() automatically delays event type registration until after instantiation. In order to support CPU switching and some combined counters (e.g., memory references synthesized from loads and stores), the PMU allows multiple probes per event type. When creating a system that switches between CPU models that share the same PMU, PMU events for all of the CPU models can be registered with the PMU. Kudos to Matt Horsnell for the initial gem5 implementation of the PMU.