summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2015-03-19mem: Enable CommMonitor to output traces in atomic modeGeoffrey Blake
The CommMonitor by default only allows memory traces to be gathered in timing mode. This patch allows memory traces to be gathered in atomic mode if all one needs is a functional trace of memory addresses used and timing information is of a secondary concern.
2015-02-11mem: remove redundant test in in Cache::recvTimingResp()Steve Reinhardt
For some reason we were checking mshr->hasTargets() even though we had already called mshr->getTarget() unconditionally earlier in the same function (which asserts if there are no targets). Get rid of this useless check, and while we're at it get rid of the redundant call to mshr->getTarget(), since we still have the value saved in a local var.
2015-02-11mem: add local var in Cache::recvTimingResp()Steve Reinhardt
The main loop in recvTimingResp() uses target->pkt all over the place. Create a local tgt_pkt to help keep lines under the line length limit.
2015-02-11mem: restructure Packet cmd initialization a bit moreSteve Reinhardt
Refactor the way that specific MemCmd values are generated for packets. The new approach is a little more elegant in that we assign the right value up front, and it's also more amenable to non-heap-allocated Packet objects. Also replaced the code in the Minor model that was still doing it the ad-hoc way. This is basically a refinement of http://repo.gem5.org/gem5/rev/711eb0e64249.
2015-03-14mem: clean up write buffer check in Cache::handleSnoop()Steve Reinhardt
The 'if (writebacks.size)' check was redundant, because writeBuffer.findMatches() would return false if the writebacks list was empty. Also renamed 'mshr' to 'wb_entry' in this context since we are pointing at a writebuffer entry and not an MSHR (even though it's the same C++ class).
2015-03-09cpu: o3: another assert instead of checkNilay Vaish
2015-03-09cpu: o3: Remove unused code in iew, add assert instead.Nilay Vaish
2015-03-09cpu: o3: commit: mark pipeline delay variable as constsNilay Vaish
2015-03-09cpu: o3: remove unused stat variables.Nilay Vaish
2015-03-09cpu: o3: combine if with same conditionNilay Vaish
2015-03-09cpu: o3: remove member variable squashCounterNilay Vaish
The variable is used in only one place and a whole new function setNextStatus() has been defined just to compute the value of the variable. Instead of calling the function, the value is now computed in the loop that preceded the function call.
2015-03-09cpu: o3: remove unused function annotateMemoryUnits()Nilay Vaish
2015-03-02mem: Unify all cache DPRINTF address formattingAndreas Hansson
This patch changes all the DPRINTF messages in the cache to use '%#llx' every time a packet address is printed. The inclusion of '#' ensures '0x' is prepended, and since the address type is a uint64_t %x really should be %llx.
2015-03-02mem: Fix cache MSHR conflict determinationAndreas Hansson
This patch fixes a rather subtle issue in the sending of MSHR requests in the cache, where the logic previously did not check for conflicts between the MSRH queue and the write queue when requests were not ready. The correct thing to do is to always check, since not having a ready MSHR does not guarantee that there is no conflict. The underlying problem seems to have slipped past due to the symmetric timings used for the write queue and MSHR queue. However, with the recent timing changes the bug caused regressions to fail.
2015-03-02mem: Add byte mask to Packet::checkFunctionalAndreas Hansson
This patch changes the valid-bytes start/end to a proper byte mask. With the changes in timing introduced in previous patches there are more packets waiting in queues, and there are regressions using the checker CPU failing due to non-contigous read data being found in the various cache queues. This patch also adds some more comments explaining what is going on, and adds the fourth and missing case to Packet::checkFunctional.
2015-03-02mem: Add option to force in-order insertion in PacketQueueStephan Diestelhorst
By default, the packet queue is ordered by the ticks of the to-be-sent packages. With the recent modifications of packages sinking their header time when their resposne leaves the caches, there could be cases of MSHR targets being allocated and ordered A, B, but their responses being sent out in the order B,A. This led to inconsistencies in bus traffic, in particular the snoop filter observing first a ReadExResp and later a ReadRespWithInv. Logically, these were ordered the other way around behind the MSHR, but due to the timing adjustments when inserting into the PacketQueue, they were sent out in the wrong order on the bus, confusing the snoop filter. This patch adds a flag (off by default) such that these special cases can request in-order insertion into the packet queue, which might offset timing slighty. This is expected to occur rarely and not affect timing results.
2015-03-02mem: Downstream components consumes new crossbar delaysMarco Balboni
This patch makes the caches and memory controllers consume the delay that is annotated to a packet by the crossbar. Previously many components simply threw these delays away. Note that the devices still do not pay for these delays.
2015-03-02mem: Move crossbar default latencies to subclassesAndreas Hansson
This patch introduces a few subclasses to the CoherentXBar and NoncoherentXBar to distinguish the different uses in the system. We use the crossbar in a wide range of places: interfacing cores to the L2, as a system interconnect, connecting I/O and peripherals, etc. Needless to say, these crossbars have very different performance, and the clock frequency alone is not enough to distinguish these scenarios. Instead of trying to capture every possible case, this patch introduces dedicated subclasses for the three primary use-cases: L2XBar, SystemXBar and IOXbar. More can be added if needed, and the defaults can be overridden.
2015-03-02mem: Add crossbar latenciesMarco Balboni
This patch introduces latencies in crossbar that were neglected before. In particular, it adds three parameters in crossbar model: front_end_latency, forward_latency, and response_latency. Along with these parameters, three corresponding members are added: frontEndLatency, forwardLatency, and responseLatency. The coherent crossbar has an additional snoop_response_latency. The latency of the request path through the xbar is set as --> frontEndLatency + forwardLatency In case the snoop filter is enabled, the request path latency is charged also by look-up latency of the snoop filter. --> frontEndLatency + SF(lookupLatency) + forwardLatency. The latency of the response path through the xbar is set instead as --> responseLatency. In case of snoop response, if the response is treated as a normal response the latency associated is again --> responseLatency; If instead it is forwarded as snoop response we add an additional variable + snoopResponseLatency and the latency associated is --> snoopResponseLatency; Furthermore, this patch lets the crossbar progress on the next clock edge after an unused retry, changing the time the crossbar considers itself busy after sending a retry that was not acted upon.
2015-03-02dev, arm: Clean up PL011 and rewrite interrupt handlingAndreas Sandberg
The ARM PL011 UART model didn't clear and raise interrupts correctly. This changeset rewrites the whole interrupt handling and makes it both simpler and fixes several cases where the correct interrupts weren't raised or cleared. Additionally, it cleans up many other aspects of the code.
2015-03-02arm: Share a port for the two table walker objectsAndreas Hansson
This patch changes how the MMU and table walkers are created such that a single port is used to connect the MMU and the TLBs to the memory system. Previously two ports were needed as there are two table walker objects (stage one and stage two), and they both had a port. Now the port itself is moved to the Stage2MMU, and each TableWalker is simply using the port from the parent. By using the same port we also remove the need for having an additional crossbar joining the two ports before the walker cache or the L2. This simplifies the creation of the CPU cache topology in BaseCPU.py considerably. Moreover, for naming and symmetry reasons, the TLB walker port is connected through the stage-one table walker thus making the naming identical to x86. Along the same line, we use the stage-one table walker to generate the master id that is used by all TLB-related requests.
2015-03-02arm: Remove unnecessary dependencies between AArch64 FP instructionsGiacomo Gabrielli
2015-03-02cpu: o3 register renaming request handling improvedRekai
Now, prior to the renaming, the instruction requests the exact amount of registers it will need, and the rename_map decides whether the instruction is allowed to proceed or not.
2015-03-02mem: Tidy up the cache debug messagesAndreas Hansson
Avoid redundant inclusion of the name in the DPRINTF string.
2015-03-02mem: Split port retry for all different packet classesAndreas Hansson
This patch fixes a long-standing isue with the port flow control. Before this patch the retry mechanism was shared between all different packet classes. As a result, a snoop response could get stuck behind a request waiting for a retry, even if the send/recv functions were split. This caused message-dependent deadlocks in stress-test scenarios. The patch splits the retry into one per packet (message) class. Thus, sendTimingReq has a corresponding recvReqRetry, sendTimingResp has recvRespRetry etc. Most of the changes to the code involve simply clarifying what type of request a specific object was accepting. The biggest change in functionality is in the cache downstream packet queue, facing the memory. This queue was shared by requests and snoop responses, and it is now split into two queues, each with their own flow control, but the same physical MasterPort. These changes fixes the previously seen deadlocks.
2015-03-02mem: Fix prefetchSquash + memInhibitAsserted bugAli Jafri
This patch resolves a bug with hardware prefetches. Before a hardware prefetch is sent towards the memory, the system generates a snoop request to check all caches above the prefetch generating cache for the presence of the prefetth target. If the prefetch target is found in the tags or the MSHRs of the upper caches, the cache sets the prefetchSquashed flag in the snoop packet. When the snoop packet returns with the prefetchSquashed flag set, the prefetch generating cache deallocates the MSHR reserved for the prefetch. If the prefetch target is found in the writeback buffer of the upper cache, the cache sets the memInhibit flag, which signals the prefetch generating cache to expect the data from the writeback. When the snoop packet returns with the memInhibitAsserted flag set, it marks the allocated MSHR as inService and waits for the data from the writeback. If the prefetch target is found in multiple upper level caches, specifically in the tags or MSHRs of one upper level cache and the writeback buffer of another, the snoop packet will return with both prefetchSquashed and memInhibitAsserted set, while the current code is not written to handle such an outcome. Current code checks for the prefetchSquashed flag first, if it finds the flag, it deallocates the reserved MSHR. This leads to assert failure when the data from the writeback appears at cache. In this fix, we simply switch the order of checks. We first check for memInhibitAsserted and then for prefetch squashed.
2015-03-02cpu: Add a PC-value to the traffic generator requestsStephan Diestelhorst
Have the traffic generator add its masterID as the PC address to the requests. That way, prefetchers (and other components) that use a PC for request classification will see per-tester streams of requests. This enables us to test strided prefetchers with the memchecker, too.
2015-03-02arm: Don't truncate 16-bit ASIDs to 8 bitsAndreas Sandberg
The ISA code sometimes stores 16-bit ASIDs as 8-bit unsigned integers and has a couple of inverted checks that mask out the high 8 bits of an ASID if 16-bit ASIDs have been /enabled/. This changeset fixes both of those issues.
2015-03-02arm: Correctly access the stack pointer in GDBAndreas Sandberg
We curently use INTREG_X31 instead of INTREG_SPX when accessing the stack pointer in GDB. gem5 normally uses INTREG_SPX to access the stack pointer, which gets mapped to the stack pointer corresponding (INTREG_SPn) to the current exception level. This changeset updates the GDB interface to use SPX instead of X31 (which is always zero) when transfering CPU state to gdb.
2015-03-02arm: Fix broken page table permissions checks in remote GDBAndreas Sandberg
The remote GDB interface currently doesn't check if translations are valid before reading memory. This causes a panic when GDB tries to access unmapped memory (e.g., when getting a stack trace). There are two reasons for this: 1) The function used to check for valid translations (virtvalid()) doesn't work and panics on invalid translations. 2) The method in the GDB interface used to test if a translation is valid (RemoteGDB::acc) always returns true regardless of the return from virtvalid(). This changeset fixes both of these issues.
2015-02-26Ruby: Update backing store option to propagate through to all RubyPortsJason Power
Previously, the user would have to manually set access_backing_store=True on all RubyPorts (Sequencers) in the config files. Now, instead there is one global option that each RubyPort checks on initialization. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-02-16cpu: TrafficGen sinks snoops without complainingAndreas Hansson
To be able to use the TrafficGen in a system with caches we need to allow it to sink incoming snoop requests. By default the master port panics, so silently ignore any snoops.
2015-02-16mem: Fix initial value problem with MemCheckerStephan Diestelhorst
In highly loaded cases, reads might actually overlap with writes to the initial memory state. The mem checker needs to detect such cases and permit the read reading either from the writes (what it is doing now) or read from the initial, unknown value. This patch adds this logic.
2015-02-16dev: Fix undefined behaviuor in i8254xGBeAndreas Hansson
This patch fixes a rather unfortunate oversight where the annotation pointer was used even though it is null. Somehow the code still works, but UBSan is rather unhappy. The use is now guarded, and the variable is initialised in the constructor (as well as init()).
2015-02-16arm: Wire up the GIC with the platform in the base classAndreas Sandberg
Move the (common) GIC initialization code that notifies the platform code of the new GIC to the base class (BaseGic) instead of the Pl390 implementation.
2015-02-16mem: mmap the backing store with MAP_NORESERVEAndreas Hansson
This patch ensures we can run simulations with very large simulated memories (at least 64 TB based on some quick runs on a Linux workstation). In essence this allows us to efficiently deal with sparse address maps without having to implement a redirection layer in the backing store. This opens up for run-time errors if we eventually exhausts the hosts memory and swap space, but this should hopefully never happen.
2015-02-16mem: Use the range cache for lookup as well as accessAndreas Hansson
This patch changes the range cache used in the global physical memory to be an iterator so that we can use it not only as part of isMemAddr, but also access and functionalAccess. This matches use-cases where a core is using the atomic non-caching memory mode, and repeatedly calls isMemAddr and access. Linux boot on aarch32, with a single atomic CPU, is now more than 30% faster when using "--fastmem" compared to not using the direct memory access.
2015-02-16arch: Make readMiscRegNoEffect const throughoutAndreas Hansson
Finally took the plunge and made this apply to all ISAs, not just ARM.
2015-02-16arm: Merge ISA files with pseudo instructionsAndreas Sandberg
This changeset moves the pseudo instructions used to signal unknown instructions and unimplemented instructions to the same source files as the decoder fault.
2015-02-16cpu: add support for outputing a protobuf formatted CPU traceAli Saidi
Doesn't support x86 due to static instruction representation. --HG-- rename : src/cpu/CPUTracers.py => src/cpu/InstPBTrace.py
2015-02-11mem: Clarification of packet crossbar timingsMarco Balboni
This patch clarifies the packet timings annotated when going through a crossbar. The old 'firstWordDelay' is replaced by 'headerDelay' that represents the delay associated to the delivery of the header of the packet. The old 'lastWordDelay' is replaced by 'payloadDelay' that represents the delay needed to processing the payload of the packet. For now the uses and values remain identical. However, going forward the payloadDelay will be additive, and not include the headerDelay. Follow-on patches will make the headerDelay capture the pipeline latency incurred in the crossbar, whereas the payloadDelay will capture the additional serialisation delay.
2015-02-11mem: Clarify usage of latency in the cacheMarco Balboni
This patch adds some much-needed clarity in the specification of the cache timing. For now, hit_latency and response_latency are kept as top-level parameters, but the cache itself has a number of local variables to better map the individual timing variables to different behaviours (and sub-components). The introduced variables are: - lookupLatency: latency of tag lookup, occuring on any access - forwardLatency: latency that occurs in case of outbound miss - fillLatency: latency to fill a cache block We keep the existing responseLatency The forwardLatency is used by allocateInternalBuffer() for: - MSHR allocateWriteBuffer (unchached write forwarded to WriteBuffer); - MSHR allocateMissBuffer (cacheable miss in MSHR queue); - MSHR allocateUncachedReadBuffer (unchached read allocated in MSHR queue) It is our assumption that the time for the above three buffers is the same. Similarly, for snoop responses passing through the cache we use forwardLatency.
2015-02-11cpu: Tidy up the MemTest and make false sharing more obviousAndreas Hansson
The MemTest class really only tests false sharing, and as such there was a lot of old cruft that could be removed. This patch cleans up the tester, and also makes it more clear what the assumptions are. As part of this simplification the reference functional memory is also removed. The regression configs using MemTest are updated to reflect the changes, and the stats will be bumped in a separate patch. The example config will be updated in a separate patch due to more extensive re-work. In a follow-on patch a new tester will be introduced that uses the MemChecker to implement true sharing.
2015-02-11sim: Move the BaseTLB to src/arch/generic/Andreas Sandberg
The TLB-related code is generally architecture dependent and should live in the arch directory to signify that. --HG-- rename : src/sim/BaseTLB.py => src/arch/generic/BaseTLB.py rename : src/sim/tlb.cc => src/arch/generic/tlb.cc rename : src/sim/tlb.hh => src/arch/generic/tlb.hh
2015-02-11base: Add compiler macros to add deprecation warningsAndreas Sandberg
Gcc and clang both provide an attribute that can be used to flag a function as deprecated at compile time. This changeset adds a gem5 compiler macro for that compiler feature. The macro can be used to indicate that a legacy API within gem5 has been deprecated and provide a graceful migration to the new API.
2015-02-11base: Do not dereference NULL in CompoundFlag creationAndreas Hansson
This patch fixes the CompoundFlag constructor, ensuring that it does not dereference NULL. Doing so has undefined behaviuor, and both clang and gcc's undefined-behaviour sanitiser was rather unhappy.
2015-02-11dev: Remove unused system pointer in the Platform base classAndreas Sandberg
The Platform base class contains a pointer to an instance of the System which is never initialized. This can lead to subtle bugs since some architecture-specific platform implementations contain their own system pointer which is normally used. However, if the platform is accessed through a pointer to its base class, the dangling pointer will be used instead.
2015-02-06cpu: Idle CPU status logic revisedAlexandru Dutu
This patch sets the CPU status to idle when the last active thread gets suspended.
2015-02-03mem: Clarify express snoop behaviourAndreas Hansson
This patch adds a bit of documentation with insights around how express snoops really work.
2015-02-03mem: Clarify cache behaviour for pending dirty responsesAndreas Hansson
This patch adds a bit of clarification around the assumptions made in the cache when packets are sent out, and dirty responses are pending. As part of the change, the marking of an MSHR as in service is simplified slightly, and comments are added to explain what assumptions are made.