Age | Commit message | Author
2015-11-06mem: Add cache clusivityAndreas Hansson
This patch adds a parameter to control the cache clusivity, that is if the cache is mostly inclusive or exclusive. At the moment there is no intention to support strict policies, and thus the options are: 1) mostly inclusive, or 2) mostly exclusive. The choice of policy guides the behaviour on a cache fill, and a new helper function, allocOnFill, is created to encapsulate the decision making process. For the timing mode, the decision is annotated on the MSHR on sending out the downstream packet, and in atomic we directly pass the decision to handleFill. We (ab)use the tempBlock in cases where we are not allocating on fill, leaving the rest of the cache unaffected. Simple and effective. This patch also makes it more explicit that multiple caches are allowed to consider a block writable (this is the case also before this patch). That is, for a mostly inclusive cache, multiple caches upstream may also consider the block exclusive. The caches considering the block writable/exclusive all appear along the same path to memory, and from a coherency protocol point of view it works due to the fact that we always snoop upwards in zero time before querying any downstream cache. Note that this patch does not introduce clean writebacks. Thus, for clean lines we are essentially removing a cache level if it is made mostly exclusive. For example, lines from the read-only L1 instruction cache or table-walker cache are always clean, and simply get dropped rather than being passed to the L2. If the L2 is mostly exclusive and does not allocate on fill it will thus never hold the line. A follow-on patch adds the clean writebacks. The patch changes the L2 of the O3_ARM_v7a CPU configuration to be mostly exclusive (and stats are affected accordingly).
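The fill decision described in this message can be sketched roughly as follows. This is an illustrative Python model, not the gem5 C++ code: the names `Clusivity`, `alloc_on_fill` and `handle_fill`, and the dictionary used as a cache, are assumptions made for the example, and the real helper may take more inputs into account than the policy alone.

```python
from enum import Enum


class Clusivity(Enum):
    """Hypothetical stand-in for the two policies named in the message."""
    MOSTLY_INCL = "mostly_incl"
    MOSTLY_EXCL = "mostly_excl"


def alloc_on_fill(clusivity):
    """Return True if a fill should allocate a block in this cache.

    Assumption: the decision depends only on the policy in this sketch.
    """
    return clusivity is Clusivity.MOSTLY_INCL


def handle_fill(cache, addr, data, allocate):
    """Store filled data in the cache proper or in a temporary block."""
    if allocate:
        cache["blocks"][addr] = data
    else:
        # Not allocating: park the data in the single temporary block,
        # leaving the rest of the cache untouched.
        cache["temp_block"] = (addr, data)
    return data


l2 = {"blocks": {}, "temp_block": None}
handle_fill(l2, 0x40, b"x", alloc_on_fill(Clusivity.MOSTLY_EXCL))
handle_fill(l2, 0x80, b"y", alloc_on_fill(Clusivity.MOSTLY_INCL))
```

After the two calls, only address 0x80 is held in `l2["blocks"]`; the fill for 0x40 passed through the temporary block without occupying a cache line.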
2015-11-06mem: Avoid unnecessary snoops on writebacks and clean evictionsAli Jafri
This patch optimises the handling of writebacks and clean evictions when using a snoop filter. Instead of snooping into the caches to determine if the block is cached or not, simply set the status based on the snoop-filter result.
2015-11-06mem: Order packet queue only on matching addressesAndreas Hansson
Instead of conservatively enforcing order for all packets, which may negatively impact the simulated-system performance, this patch updates the packet queue such that it only applies the restriction if there are already packets with the same address in the queue. The basic need for the order enforcement is due to coherency interactions where requests/responses to the same cache line must not over-take each other. We rely on the fact that any packet that needs order enforcement will have a block-aligned address. Thus, there is no need for the queue to know about the cacheline size.
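A minimal sketch of the relaxed ordering rule, assuming a queue kept as a Python list of `(when, addr, pkt)` tuples. The function name `enqueue` and the tuple layout are hypothetical; the sketch only illustrates that a packet may overtake later-scheduled packets unless one of them has the same address.

```python
def enqueue(queue, when, addr, pkt):
    """Insert a packet into a queue ordered by send time.

    The new packet may move ahead of packets scheduled later, but it is
    never placed ahead of a queued packet with the same address.
    Addresses are assumed to be block aligned already, so a plain
    equality check is enough and no cache line size is needed.
    """
    pos = len(queue)
    while pos > 0:
        q_when, q_addr, _ = queue[pos - 1]
        if q_when <= when or q_addr == addr:
            break  # earlier packet, or same address: do not overtake
        pos -= 1
    queue.insert(pos, (when, addr, pkt))


q = []
enqueue(q, 100, 0x40, "a")
enqueue(q, 50, 0x80, "b")   # different address: overtakes "a"
enqueue(q, 10, 0x40, "c")   # same address as "a": stays behind it
order = [pkt for _, _, pkt in q]
```

In this sketch the held-back packet keeps its original time stamp; how the real queue treats the send time of such a packet is not stated in the message.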
2015-11-06mem: Enforce insertion order on the cache response pathAli Jafri
This patch enforces insertion order transmission of packets on the response path in the cache. Note that the logic to enforce order is already present in the packet queue, this patch simply turns it on for queues in the response path. Without this patch, there are corner cases where a request-response is faster than a response-response forwarded through the cache. This violation of queuing order causes problems in the snoop filter leaving it with inaccurate information. This causes assert failures in the snoop filter later on. A follow on patch relaxes the order enforcement in the packet queue to limit the performance impact.
2015-11-06mem: Use the packet delays and do not just zero them outAndreas Hansson
This patch updates the I/O devices, bridge and simple memory to take the packet header and payload delay into account in their latency calculations. In all cases we add the header delay, i.e. the accumulated pipeline delay of any crossbars, and the payload delay needed for deserialisation of any payload. Due to the additional unknown latency contribution, the packet queue of the simple memory is changed to use insertion sorting based on the time stamp. Moreover, since the memory hands out exclusive (non shared) responses, we also need to ensure ordering for reads to the same address.
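As a rough illustration of the latency calculation and the time-stamp-sorted queue, assuming hypothetical names (`response_tick`, `schedule`) and plain integers for ticks:

```python
import bisect
import itertools

# Tie-breaker so packets with equal ticks keep their insertion order.
_seq = itertools.count()


def response_tick(cur_tick, device_latency, header_delay, payload_delay):
    """Tick at which the response is ready.

    The header delay (accumulated crossbar pipeline delay) and the
    payload delay (deserialisation time) are added to the device latency
    instead of being zeroed out.
    """
    return cur_tick + device_latency + header_delay + payload_delay


def schedule(queue, tick, pkt):
    """Insert a packet into a queue kept sorted by time stamp."""
    bisect.insort(queue, (tick, next(_seq), pkt))


q = []
schedule(q, response_tick(0, 30, 5, 20), "a")
schedule(q, response_tick(0, 30, 0, 0), "b")
order = [pkt for _, _, pkt in q]
```

Because the extra delays differ per packet, "b" becomes ready before "a" even though it was scheduled later, which is why a simple FIFO no longer suffices. The same-address read ordering mentioned in the message is not modelled here.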
2015-11-06mem: Align rules for sinking inhibited packets at the slaveAndreas Hansson
This patch aligns how the memory-system slaves, i.e. the various memory controllers and the bridge, identify and deal with sinking of inhibited packets that are only useful within the coherent part of the memory system. In the future we could shift the onus to the crossbar, and add a parameter "is_point_of_coherence" that would allow it to sink the aforementioned packets.
2015-11-06mem: Do not treat CleanEvict as a write operationAndreas Hansson
This patch changes the CleanEvict command type to not be considered a write. Initially it was made a zero-sized write to match the writeback command, but as things developed it became clear that it causes more problems than it solves. For example, the memory modules (and bridge) should not consider the CleanEvict as a write, but instead discard it. With this patch it will be neither a read, nor write, and as it does not need a response the slave will simply sink it.
2015-11-06mem: Unify delayed packet deletionAndreas Hansson
This patch unifies how we deal with delayed packet deletion, where the receiving slave is responsible for deleting the packet, but the sending agent (e.g. a cache) is still relying on the pointer until the call to sendTimingReq completes. Previously we used a mix of a deletion vector and a construct using unique_ptr. With this patch we ensure all slaves use the latter approach.
2015-11-06misc: Appease clang static analyzerAndreas Hansson
A few minor fixes to issues identified by the clang static analyzer.
2015-11-06mem: Check the XBar's port queues on functional snoopsAndreas Sandberg
The CoherentXBar currently doesn't check its queued slave ports when receiving a functional snoop. This caused data corruption in cases when a modified cache line is forwarded between two caches. Add the required functional calls into the queued slave ports.
2015-11-03mem: hmc: minor fixesErfan Azarkhish
This patch performs two minor fixes to DRAMCtrl.py and xbar.hh in favor of the HMC patch series. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-11-03mem: hmc: serial link modelErfan Azarkhish
This changeset adds a serial link model for the Hybrid Memory Cube (HMC). SerialLink is a simple variation of the Bridge class, with the ability to account for the latency of packet serialization. Also trySendTiming has been modified to correctly model bandwidth. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-11-03mem: hmc: adds controllerErfan Azarkhish
This patch models a simple HMC Controller. It simply schedules the incoming packets to HMC Serial Links using a round robin mechanism. This patch should be applied in series with other patches modeling a complete HMC device. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-10-29arm: Add secure flag to TableWalker request when neededNathanael Premillieu
2015-10-29dev: Fix segfault in flash deviceSascha Bischoff
Fix a bug in which the flash device would write out of bounds and could either trigger a segfault or corrupt the memory of other objects. This was caused by using pageSize in the place of pagesPerBlock when running the garbage collector. Also, added an assert to flag this condition in the future.
2015-10-29dev: Fix draining for UFSHostDevice and FlashDeviceSascha Bischoff
This patch fixes the drain logic for the UFSHostDevice and the FlashDevice. In the case of the FlashDevice, the logic for CheckDrain needed to be reversed, whilst in the case of the UFSHostDevice check drain was never being called. In both cases the system would never complete draining if the initial attempt to drain failed.
2015-10-29kvm, arm: Fix compilation errors due to API changesVictor Garcia
The checkpoint changes, along with the SMT patches have changed a number of APIs. Adapt the ArmKvmCPU accordingly.
2015-10-29mem: Clarify cache MSHR handling on fillAndreas Hansson
This patch addresses the upgrading of deferred targets in the MSHR, and makes it clearer by explicitly calling out what is happening (deferred targets are promoted if we get exclusivity without asking for it).
2015-10-25power: Implement Remote GDBBoris Shingarov
2015-10-23x86: Add missing explicit overrides for X86 devicesAndreas Hansson
Make clang >= 3.5 happy when compiling build/X86/gem5.opt on OSX.
2015-10-23arm: Add missing explicit overrides for ARM devicesAndreas Hansson
Make clang >= 3.5 happy when compiling build/ARM/gem5.opt on OSX.
2015-10-14mem: Pass snoop retries through the CommMonitorAndreas Hansson
Allow the monitor to be placed after a snooping port, and do not fail on snoop retries, but instead pass them on to the slave port.
2015-10-14ruby: profiler: provide the number of vnets through ruby systemNilay Vaish
The aim is to ultimately do away with the static function Network::getNumberOfVirtualNetworks().
2015-10-14ruby: remove unused functionalRead() function.Nilay Vaish
Not required since functional reads cannot rely on messages that are inflight.
2015-10-14ruby: garnet: flexible: refactor flitNilay Vaish
2015-10-12misc: Add explicit overrides and fix other clang >= 3.5 issuesAndreas Hansson
This patch adds explicit overrides as this is now required when using "-Wall" with clang >= 3.5, the latter now part of the most recent XCode. The patch consequently removes "virtual" for those methods where "override" is added. The latter should be enough of an indication. As part of this patch, a few minor issues that clang >= 3.5 complains about are also resolved (unused methods and variables).
2015-10-12misc: Remove redundant compiler-specific definesAndreas Hansson
This patch moves away from using M5_ATTR_OVERRIDE and the m5::hashmap (and similar) abstractions, as these are no longer needed with gcc 4.7 and clang 3.1 as minimum compiler versions.
2015-10-10sim: Don't quiesce UDelayEvents with 0 latencyJoel Hestness
ARM uses UDelayEvents to emulate kernel __*udelay functions and speed up simulation. UDelayEvents call PseudoInst::quiesceNs to quiesce the system for a specified delay. Changeset 10341:0b4d10f53c2d introduced the requirement that any quiesce process that is started must also be completed by scheduling an EndQuiesceEvent. This change causes the CPU to hang if an IsQuiesce instruction is executed, but the corresponding EndQuiesceEvent is not scheduled. Changeset 11058:d0934b57735a introduces a fix for uses of PseudoInst::quiesce* that would conditionally execute the EndQuiesceEvent. ARM UDelayEvents specify quiesce period of 0 ns (src/arch/arm/linux/system.cc), so changeset 11058 causes these events to now execute full quiesce processes, greatly increasing the total instructions executed in kernel delay loops and slowing simulation. This patch updates the UDelayEvent to conditionally execute PseudoInst::quiesceNs (a quiesce operation) only if the specified delay is >0 ns. The result is ARM delay loops no longer execute instructions for quiesce handling, and regression time returns to normal.
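The guard this patch describes amounts to a single condition. A sketch in Python, where `udelay_process` and the `quiesce_ns` callback are hypothetical stand-ins for the UDelayEvent handler and PseudoInst::quiesceNs:

```python
def udelay_process(delay_ns, quiesce_ns):
    """Start a quiesce only when there is a non-zero delay to skip.

    Returns True if a quiesce was started. With a delay of 0 ns nothing
    is started, so no matching end-of-quiesce event is needed either.
    """
    if delay_ns > 0:
        quiesce_ns(delay_ns)
        return True
    return False


calls = []
started_zero = udelay_process(0, calls.append)
started_five = udelay_process(5, calls.append)
```

Here `calls` records each quiesce request, so only the 5 ns delay shows up in it.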
2015-10-09isa: Add parameter to pick different decoder inside ISARekai Gonzalez Alberquilla
The decoder is responsible for splitting instructions into micro-operations (uops). Given that different microarchitectures may split operations differently, this patch allows specifying which microarchitecture each ISA implements, so different cores in the system can split instructions differently, also decoupling uop splitting (microArch) from ISA (Arch). This is done by making the decode calls templates that receive a type 'DecoderFlavour' that maps the name of the operation to the class that implements it. This way there is only one selection point (converting the command line enum to the appropriate DecodeFeatures object). In addition, there is no explicit code replication: template instantiation hides that, and the compiler should be able to resolve a number of things at compile-time.
2015-10-09sim: Add relative break schedulingDylan Johnson
Add schedRelBreak() function, executable within a debugger, that sets a breakpoint by relative rather than absolute tick.
2015-10-06arch: clean up isa_parser error handlingSteve Reinhardt
Although some decent error messages were getting generated inside isa_parser.py, they weren't always getting printed because of the screwy way we were handling exceptions. (Basically an inner exception would get hidden by an outer exception, and the more informative inner error message would not get printed.) Also line numbers were messed up, since they were taken from the lexer, which is typically a token (or more) ahead of the grammar rule that's being matched. Using the 'lineno' attribute that PLY associates with the grammar production is more accurate. The new LineTracker class extends lineno to track filenames as well as line numbers.
2015-10-06sim: add ExecMacro to Exec* compound debug flagsSteve Reinhardt
Really should have been there in the first place, IMO. Makes debugging x86 execution a lot easier.
2015-10-06sim: print pid in output headerSteve Reinhardt
This information is useful if you have a bunch of simulations running and want to know which one to kill, but you've neglected to take advantage of the previous patch and embed the pid in your output path.
2015-10-06x86: implement rcpps and rcpss SSE instsSteve Reinhardt
These are packed single-precision approximate reciprocal operations, vector and scalar versions, respectively. This code was basically developed by copying the code for sqrtps and sqrtss. The mrcp micro-op was simplified relative to msqrt since there are no double-precision versions of this operation.
2015-10-06x86: implement fild, fucomi, and fucomip x87 instsSteve Reinhardt
fild loads an integer value into the x87 top of stack register. fucomi/fucomip compare two x87 register values (the latter also doing a stack pop). These instructions are used by some versions of GNU libstdc++.
2015-09-02sim: Add ability to break at specific kernel functionDylan Johnson
Adds a GDB callable function that sets a breakpoint at the beginning of a kernel function.
2015-09-30base: remove Trace::enabled flagCurtis Dunham
The DTRACE() macro tests both Trace::enabled and the specific flag. This change uses the same administrative interface for enabling/disabling tracing, but masks the SimpleFlags settings directly. This eliminates a load for every DTRACE() test, e.g. DPRINTF.
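One way to picture the change, as a Python sketch under stated assumptions: each flag remembers what the user requested and separately exposes the value the hot path tests, and the global enable/disable switch rewrites the latter. The class and function names (`SimpleFlag`, `set_tracing`, `dtrace`) are made up for this example and do not mirror the gem5 implementation.

```python
class SimpleFlag:
    """Debug flag whose visible state already includes the global switch."""

    _all = []
    _tracing_on = False

    def __init__(self, name):
        self.name = name
        self._requested = False  # what the user asked for
        self.active = False      # what the hot path tests
        SimpleFlag._all.append(self)

    def enable(self):
        self._requested = True
        self._sync()

    def disable(self):
        self._requested = False
        self._sync()

    def _sync(self):
        self.active = self._requested and SimpleFlag._tracing_on

    @classmethod
    def set_tracing(cls, on):
        """Global enable/disable: mask every flag directly."""
        cls._tracing_on = on
        for flag in cls._all:
            flag._sync()


def dtrace(flag):
    """Hot-path test: a single read instead of two."""
    return flag.active


cache_flag = SimpleFlag("Cache")
cache_flag.enable()
before = dtrace(cache_flag)
SimpleFlag.set_tracing(True)
during = dtrace(cache_flag)
SimpleFlag.set_tracing(False)
after = dtrace(cache_flag)
```

The cost moves from every trace test to the rare moments when tracing is switched on or off.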
2015-09-30arm: Change TLB Software CachingMitch Hayenga
In ARM, certain variables are only updated when a necessary change is detected. Having 2 SMT threads share a TLB resulted in these not being updated as required. This patch adds a thread context identifier to assist in the invalidation of these variables.
2015-09-30cpu,isa,mem: Add per-thread wakeup logicMitch Hayenga
Changes wakeup functionality so that only specific threads on SMT capable cpus are woken.
2015-09-30isa,cpu: Add support for FS SMT InterruptsMitch Hayenga
Adds per-thread interrupt controllers and thread/context logic so that interrupts properly get routed in SMT systems.
2015-09-30arm: SMT MPIDR SettingMitch Hayenga
Changes assignment of the MPIDR for multi-threaded systems only.
2015-09-30cpu: Add per-thread monitorsMitch Hayenga
Adds per-thread address monitors to support FullSystem SMT.
2015-09-30config,cpu: Add SMT support to Atomic and Timing CPUsMitch Hayenga
Adds SMT support to the "simple" CPU models so that they can be used with other SMT-supported CPUs. Example usage: this enables the TimingSimpleCPU to be used to warmup caches before swapping to detailed mode with the in-order or out-of-order based CPU models.
2015-09-30cpu: Change thread assignments for heterogenous SMTMitch Hayenga
Trying to run an SE system with varying threads per core (SMT cores + Non-SMT cores) caused failures due to the CPU id assignment logic. The comment about thread assignment (worrying about core 0 not having tid 0) seems not to be valid given that our configuration scripts initialize them in order. This removes that constraint so a heterogeneously threaded system can work.
2015-09-29ruby: Fix CacheMemory allocate leakJoel Hestness
If a cache entry permission was previously set to NotPresent, but the entry was not deleted, a following cache allocation can cause the entry to be leaked by setting the entry pointer to a newly allocated entry. To eliminate this possibility, check if the new entry is different from the old one, and if so, delete the old one.
2015-09-29arch, x86: Delete packet in IntDevice::recvResponseJoel Hestness
IntDevice::recvResponse is called from two places in current mainline: (1) the short circuit path of X86ISA::IntDevice::IntMasterPort::sendMessage for atomic mode, and (2) the full request->response path to and from the x86 interrupts device (finally called from MessageMasterPort::recvTimingResp). In the former case, the packet was deleted correctly, but in the latter case, the packet and request leak. To fix the leak, move request and packet deletion into IntDevice inherited class implementations of recvResponse.
2015-09-29ruby: RubyPort delete snoop requestsJoel Hestness
In RubyPort::ruby_eviction_callback, prior changes fixed a memory leak caused by instantiating separate packets for each port that the eviction was forwarded to. That change, however, left the instantiated request to also leak. Allocate it on the stack to avoid the leak.
2015-09-29ruby: Fix memory leak in AbstractControllerJoel Hestness
Recent changes to memory access queuing allocate requests for packets sent to memory controllers, but did not free the requests. Delete them to avoid leaks.
2015-09-29ruby: RubyMemoryControl delete requestsJoel Hestness
Changes to the RubyMemoryControl removed the dequeue function, which deleted MemoryNode instances. This results in leaked MemoryNode instances. Correctly delete these instances.
2015-09-29syscall_emul: Bandage readlink /proc/self/exeJoel Hestness
The recent changeset to readlink() to handle reading the /proc/self/exe link introduces a number of problems. This patch fixes two: 1) Because readlink() called on /proc/self/exe now uses LiveProcess::progName() to find the binary path, it will only get the zeroth parameter of the simulated system command line. However, if a config script also specifies the process' executable, the executable parameter is used to create the LiveProcess rather than the zeroth command line parameter. Thus, the zeroth command line parameter is not necessarily the correct path to the binary executing in the simulated system. To fix this, add a LiveProcess data member, 'executable', which is correctly set during instantiation and returned from progName(). 2) If a config script allows a user to pass a relative path as the zeroth simulated system command line parameter or process executable, readlink() will incorrectly return a relative path when called on '/proc/self/exe'. /proc/self/exe is always set to a full path, so running benchmarks can fail if a relative path is returned. To fix this, clean up the handling of LiveProcess::progName() within readlink() to get the full binary path. NOTE: This patch still leaves the potential problem that the host's full path to the binary bleeds into the simulated system, potentially causing the appearance of non-deterministic simulated system execution.
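The two fixes can be sketched together in a few lines of Python. The helper name `resolve_executable` and its parameters are hypothetical; the sketch only shows the precedence (explicit executable over the zeroth argument) and the conversion of a relative path to a full one.

```python
import posixpath


def resolve_executable(executable, argv, cwd):
    """Return the full path that /proc/self/exe should point to.

    Prefer the explicitly configured executable; fall back to argv[0]
    only when none was given. Relative paths are resolved against cwd.
    """
    path = executable if executable else argv[0]
    if not posixpath.isabs(path):
        path = posixpath.join(cwd, path)
    return posixpath.normpath(path)


from_argv = resolve_executable("", ["./bench"], "/home/u")
from_param = resolve_executable("/opt/bin/app", ["app"], "/tmp")
```

As the NOTE in the message says, a full path derived this way still reflects the host file system layout.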