summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-12-23stats: Bump stats for decoder, TLB, prefetcher and DRAM changesAndreas Hansson
Changes due to speculative execution of an unaligned PC, introduction of TLB stats, changes and re-work of the prefetcher, and the introduction of rank-wise refresh in the DRAM controller.
2014-12-23mem: Change prefetcher to use random_mtMitch Hayenga
Prefechers has used rand() to generate random numers previously.
2014-12-23mem: Hide WriteInvalidate requests from prefetchersCurtis Dunham
Without this tweak, a prefetcher will happily prefetch data that will promptly be invalidated and overwritten by a WriteInvalidate.
2014-12-23mem: Fix event scheduling issue for prefetchesMitch Hayenga
The cache's MemSidePacketQueue schedules a sendEvent based upon nextMSHRReadyTime() which is the time when the next MSHR is ready or whenever a future prefetch is ready. However, a prefetch being ready does not guarentee that it can obtain an MSHR. So, when all MSHRs are full, the simulation ends up unnecessiciarly scheduling a sendEvent every picosecond until an MSHR is finally freed and the prefetch can happen. This patch fixes this by not signaling the prefetch ready time if the prefetch could not be generated. The event is rescheduled as soon as a MSHR becomes available.
2014-12-23mem: Fix bug relating to writebacks and prefetchesMitch Hayenga
Previously the code commented about an unhandled case where it might be possible for a writeback to arrive after a prefetch was generated but before it was sent to the memory system. I hit that case. Luckily the prefetchSquash() logic already in the code handles dropping prefetch request in certian circumstances.
2014-12-23mem: Rework the structuring of the prefetchersMitch Hayenga
Re-organizes the prefetcher class structure. Previously the BasePrefetcher forced multiple assumptions on the prefetchers that inherited from it. This patch makes the BasePrefetcher class truly representative of base functionality. For example, the base class no longer enforces FIFO order. Instead, prefetchers with FIFO requests (like the existing stride and tagged prefetchers) now inherit from a new QueuedPrefetcher base class. Finally, the stride-based prefetcher now assumes a custimizable lookup table (sets/ways) rather than the previous fully associative structure.
2014-12-23mem: Add parameter to reserve MSHR entries for demand accessMitch Hayenga
Adds a new parameter that reserves some number of MSHR entries for demand accesses. This helps prevent prefetchers from taking all MSHRs, forcing demand requests from the CPU to stall.
2014-12-23arm: Add stats to table walkerCurtis Dunham
This patch adds table walker stats for: - Walk events - Instruction vs Data - Page size histogram - Wait time and service time histograms - Pending requests histogram (per cycle) - measures dist. of L (p(1..) = how often busy, p(0) = how often idle) - Squashes, before starting and after completion
2014-12-23config: Expose the DRAM ranks as a command-line optionAndreas Hansson
This patch gives the user direct influence over the number of DRAM ranks to make it easier to tune the memory density without affecting the bandwidth (previously the only means of scaling the device count was through the number of channels). The patch also adds some basic sanity checks to ensure that the number of ranks is a power of two (since we rely on bit slices in the address decoding).
2014-12-23mem: Ensure DRAM controller is idle when in atomic modeAndreas Hansson
This patch addresses an issue seen with the KVM CPU where the refresh events scheduled by the DRAM controller forces the simulator to switch out of the KVM mode, thus killing performance. The current patch works around the fact that we currently have no proper API to inform a SimObject of the mode switches. Instead we rely on drainResume being called after any switch, and cache the previous mode locally to be able to decide on appropriate actions. The switcheroo regression require a minor stats bump as a result.
2014-12-23mem: Add rank-wise refresh to the DRAM controllerOmar Naji
This patch adds rank-wise refresh to the controller, as opposed to the channel-wide refresh currently in place. In essence each rank can be refreshed independently, and for this to be possible the controller is extended with a state machine per rank. Without this patch the data bus is always idle during a refresh, as all the ranks are refreshing at the same time. With the rank-wise refresh it is possible to use one rank while another one is refreshing, and thus the data bus can be kept busy. The patch introduces a Rank class to encapsulate the state per rank, and also shifts all the relevant banks, activation tracking etc to the rank. The arbitration is also updated to consider the state of the rank.
2014-12-23mem: Fix a bug in the DRAM controller arbitrationOmar Naji
Fix a minor issue that affects multi-rank systems.
2014-12-23tests: Add a regression for the stack distance calculatorAndreas Hansson
Re-use the existing traffic generator regression, and enable the stack distance calculation in the comm monitor, along with the verification stack. The traffic generator config is also tuned to not increase the run-time too much (and actually have some address re-use).
2014-12-23mem: Add stack distance statistics to the CommMonitorKanishk Sugand
This patch adds the stack distance calculator to the CommMonitor. The stats are disabled by default.
2014-12-23mem: Add a stack distance calculatorKanishk Sugand
This patch adds a stand-alone stack distance calculator. The stack distance calculator is a passive SimObject that observes the addresses passed to it. It calculates stack distances (LRU Distances) of incoming addresses based on the partial sum hierarchy tree algorithm described by Alamasi et al. http://doi.acm.org/10.1145/773039.773043. For each transaction a hashtable look-up is performed. At every non-unique transaction the tree is traversed from the leaf at the returned index to the root, the old node is deleted from the tree, and the sums (to the right) are collected and decremented. The collected sum represets the stack distance of the found node. At every unique transaction the stack distance is returned as numeric_limits<uint64>::max(). In addition to the basic stack distance calculation, a feature to mark an old node in the tree is added. This is useful if it is required to see the reuse pattern. For example, Writebacks to the lower level (e.g. membus from L2), can be marked instead of being removed from the stack (isMarked flag of Node set to True). And then later if this same address is accessed (by L1), the value of the isMarked flag would be True. This gives some insight on how the Writeback policy of the lower level affect the read/write accesses in an application. Debugging is enabled by setting the verify flag to true. Debugging is implemented using a dummy stack that behaves in a naive way, using STL vectors. Note that this has a large impact on run time.
2014-12-23config: Add --memchecker optionMarco Elver
This patch adds the --memchecker option, to denote that a MemChecker should be instantiated for the system. The exact usage of the MemChecker depends on the system configuration. For now CacheConfig.py makes use of the option, adding MemCheckerMonitor instances between CPUs and D-Caches. Note, however, that currently this only provides limited checking on a running system; other parts of the system, such as I/O devices are not monitored, and may cause warnings to be issued by the monitor.
2014-12-23mem: Add MemChecker and MemCheckerMonitorMarco Elver
This patch adds the MemChecker and MemCheckerMonitor classes. While MemChecker can be integrated anywhere in the system and is independent, the most convenient usage is through the MemCheckerMonitor -- this however, puts limitations on where the MemChecker is able to observe read/write transactions.
2014-12-23arm: Raise an alignment fault if a PC has illegal alignmentAndreas Sandberg
We currently don't handle unaligned PCs correctly. There is one check for unaligned PCs in the TLB when running in aarch64 mode, but this check does not cover cases where the CPU does not do a TLB lookup when decoding an instruction (e.g., a branch stays within the same cache line). Additionally, the Decoder class sometimes throws an assertion for unaligned PCs which breaks speculation. This changeset introduces a decoder fault bit field in the ExtMachInst structure. This field can be used to signal a decoder failure. If set, the decoder generates an internal gem5fault instruction instead of a normal instruction. This instruction in turns either panics (fault type PANIC), returns an PCAlignmentFault (fault type UNALIGNED, aarch64) or PrefetchAbort (fault type UNALIGNED, aarch32). The patch causes minor changes to the realview64 regressions, and a stats bump will follow.
2014-12-23arm: Clean up and document decoder APIAndreas Sandberg
This changeset adds more documentation to the ArmISA::Decoder class and restructures it slightly to make API groups more obvious.
2014-12-23arm: Add support for filtering in the PMUAndreas Sandberg
This patch adds support for filtering events in the PMU. In order to do so, it updates the ISADevice base class to forward an ISA pointer to ISA devices. This enables such devices to access the MiscReg file to determine the current execution level.
2014-12-23config: Add options to take/resume from SimPoint checkpointsDam Sunwoo
More documentation at http://gem5.org/Simpoints Steps to profile, generate, and use SimPoints with gem5: 1. To profile workload and generate SimPoint BBV file, use the following option: --simpoint-profile --simpoint-interval <interval length> Requires single Atomic CPU and fastmem. <interval length> is in number of instructions. 2. Generate SimPoint analysis using SimPoint 3.2 from UCSD. (SimPoint 3.2 not included with this flow.) 3. To take gem5 checkpoints based on SimPoint analysis, use the following option: --take-simpoint-checkpoint=<simpoint file path>,<weight file path>,<interval length>,<warmup length> <simpoint file> and <weight file> is generated by SimPoint analysis tool from UCSD. SimPoint 3.2 format expected. <interval length> and <warmup length> are in number of instructions. 4. To resume from gem5 SimPoint checkpoints, use the following option: --restore-simpoint-checkpoint -r <N> --checkpoint-dir <simpoint checkpoint path> <N> is (SimPoint index + 1). E.g., "-r 1" will resume from SimPoint #0.
2014-12-22scons: Make the USE_KVM variable available in C++.Gabe Black
We need it to determine whether we should expect KVM related parameters exist in the cirrus graphics device.
2014-12-14Added tag stable_2014_12_14 for changeset bdb307e8be54Nilay Vaish
2014-12-09Let other objects set up memory like regions in a KVM VM.Gabe Black
2014-12-08arm: Fix decoding of PMXEVTYPER_EL0 and PMCCFILTR_EL0Andreas Sandberg
The aarch64 system register decoder is currently not decoding PMXEVTYPER_EL0 and PMCCFILTR_EL0 correctly. This changeset updates the decoder so that they are decoded using the values in table C5-6 in ARM DDI 0478A.c.
2014-12-08dev: Add response sanity checks in PioPortAndreas Sandberg
Add an assert in the PioPort that checks if a response packet from a device has the right flags set before passing it to them rest of the memory system.
2014-12-08dev: Correctly transform packets into responsesAndreas Sandberg
The VirtIO devices didn't correctly set the response flags in memory packets. This changeset adds the required Packet::makeResponse() calls.
2014-12-05misc: Generalize GDB single stepping.Gabe Black
The new single stepping implementation for x86 doesn't rely on any ISA specific properties or functionality. This change pulls out the per ISA implementation of those functions and promotes the X86 implementation to the base class. One drawback of that implementation is that the CPU might stop on an instruction twice if it's affected by both breakpoints and single stepping. While that might be a little surprising, it's harmless and would only happen under somewhat unlikely circumstances.
2014-12-05x86: Implement a remote GDB stub.Gabe Black
This stub should allow remote debugging of 32 bit and 64 bit targets. Single stepping seems to work, as do breakpoints. If both breakpoints and single stepping affect an instruction, gdb will stop at the instruction twice before continuing. That's a little surprising, but is generally harmless.
2014-12-05misc: Add some utility functions for schedule inst commit events.Gabe Black
These can be used to simplify the implementation of single step in derived classes.
2014-12-05misc: Rename the GDB "Event" event class to InputEvent.Gabe Black
The "Event" name is the same as the base event class. That's a bit confusing, and makes it a little awkward to add other event types.
2014-12-05sim: Ensure GDB interrupts the simulation at an instruction boundary.Gabe Black
Use the comInstEventQueue to ensure GDB interrupts the simulation at an instruction boundary and not in the middle of a macroop, memory access, etc.
2014-12-05cpu: Only check for PC events on instruction boundaries.Gabe Black
Only the instruction address is actually checked, so there's no need to check repeatedly while we're working through the microops of a macroop and that's not changing.
2014-12-05misc: Make the GDB register cache accessible in various sized chunks.Gabe Black
Not all ISAs have 64 bit sized registers, so it's not always very convenient to access the GDB register cache in 64 bit sized chunks. This change makes it accessible in 8, 16, 32, or 64 bit chunks. The MIPS and ARM implementations were working around that limitation by bundling and unbundling 32 bit values into 64 bit values. That code has been removed.
2014-12-04config: Add two options for setting the kernel command line.Gabe Black
Both options accept template which will, through python string formatting, have "mem", "disk", and "script" values substituted in from the mdesc. Additional values can be used on a case by case basis by passing them as keyword arguments to the fillInCmdLine function. That makes it possible to have specialized parameters for a particular ISA, for instance. The first option lets you specify the template directly, and the other lets you specify a file which has the template in it.
2014-12-04x86: Rework opcode parsing to support 3 byte opcodes properly.Gabe Black
Instead of counting the number of opcode bytes in an instruction and recording each byte before the actual opcode, we can represent the path we took to get to the actual opcode byte by using a type code. That has a couple of advantages. First, we can disambiguate the properties of opcodes of the same length which have different properties. Second, it reduces the amount of data stored in an ExtMachInst, making them slightly easier/faster to create and process. This also adds some flexibility as far as how different types of opcodes are handled, which might come in handy if we decide to support VEX or XOP instructions. This change also adds tables to support properly decoding 3 byte opcodes. Before we would fall off the end of some arrays, on top of the ambiguity described above. This change doesn't measureably affect performance on the twolf benchmark. --HG-- rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f38_opcodes.isa rename : src/arch/x86/isa/decoder/three_byte_opcodes.isa => src/arch/x86/isa/decoder/three_byte_0f3a_opcodes.isa
2014-12-04arch: Allow named constants as decode case values.Gabe Black
The values in a "bitfield" or in an ExtMachInst structure member may not be a literal value, it might select from an arbitrary collection of options. Instead of using the raw value of those constants in the decoder, it's easier to tell what's going on if they can be referred to as a symbolic constant/enum. To support that, the ISA description language is extended slightly so that in addition to integer literals, the case value for decode blobs can also be a string literal. It's up to the ISA author to ensure that the string evaluates to a legal constant value when interpretted as C++.
2014-12-04config: ruby: mi protocol: correct master slave setting for dmaNilay Vaish
In the MI protocol, the master slave connection between the dma controller and network was being set incorrectly. This patch corrects it.
2014-12-02x86: Clean up style in process.cc.Gabe Black
2014-12-03sim: Make it possible to override the breakpoint length check.Gabe Black
The check which makes sure the length of the breakpoint being written is the same as a MachInst is only correct on fixed instruction width ISAs. Instead of incorrectly applying that check to all ISAs, this change makes that the default check and lets ISA specific GDB classes override it.
2014-12-03config: Get rid of some extra spaces around default arguments.Gabe Black
2014-12-03ide: Accept the IDLE (0xe3) ATA command.Gabe Black
This command is supposed to set up a timer which will put the drive into a standby mode if it isn't sent a command within a given time out. Since most of the timeouts are generally significantly longer than a simulation would run anyway, and we don't have an implementation for standby mode to begin with, we can accept the command, do nothing, and report success.
2014-12-03dev: Support translating left and right ALT keys.Gabe Black
This is used primarily for VNC.
2014-12-02stats: Bump stats for fixes, mostly TLB and WriteInvalidateAndreas Hansson
2014-12-02scons: Ensure dictionary iteration is sorted by keyAndreas Hansson
This patch adds sorting based on the SimObject name or parameter name for all situations where we iterate over dictionaries. This should ensure a deterministic and consistent order across the host systems and hopefully avoid regression results differing across python versions.
2014-12-02mem: Support WriteInvalidate (again)Curtis Dunham
This patch takes a clean-slate approach to providing WriteInvalidate (write streaming, full cache line writes without first reading) support. Unlike the prior attempt, which took an aggressive approach of directly writing into the cache before handling the coherence actions, this approach follows the existing cache flows as closely as possible.
2014-12-02mem: Remove WriteInvalidate supportCurtis Dunham
Prepare for a different implementation following in the next patch
2014-12-02cpu: Fix retries on barrier/store in Minor's store bufferAndrew Bardsley
This patch fixes a case where a store in Minor's store buffer never leaves the store buffer as it is pre-maturely counted as having been issued, leading to the store buffer idling. LSQ::StoreBuffer::numUnissuedAccesses should count the number of accesses either in memory, or still in the store buffer after being completed. For stores which are also barriers, the store will stay in the store buffer for a cycle after it is completed and will be cleaned up by the barrier clearing code (to ensure that barriers are completed in-order). To acheive this, numUnissuedAccesses is not decremented when a store-barrier is issued to memory, but when its barrier effect is cleared. Without this patch, the correct behaviour happens when a memory transaction is immediately accepted, but not if it needs a retry.
2014-12-02cpu: Fix memoryIssueLimit checking in MinorAndrew Bardsley
This patch fixes the checking of the number of memory instructions issued per cycles in the Minor CPU.
2014-12-02arm: Fix TLB ignoring faults when table walkingAndrew Bardsley
This patch fixes a case where the Minor CPU can deadlock due to the lack of a response to TLB request because of a bug in fault handling in the ARM table walker. TableWalker::processWalkWrapper is the scheduler-called wrapper which handles deferred walks which calls to TableWalker::wait cannot immediately process. The handling of faults generated by processWalk{AArch64,LPAE,} calls in those two functions is is different. processWalkWrapper ignores fault returns from processWalk... which can lead to ::finish not being called on a translation. This fix provides fault handling in processWalkWrapper similar to that found in the leaf functions which BaseTLB::Translation::finish.