path: root/src/cpu/o3/lsq_unit_impl.hh
Age  Commit message  Author
2019-05-12  still cannot run fence+ift...hitsb  Iru Cai
2019-05-12  finally runs dhrystone  Iru Cai
Change-Id: I7466a825f8726682622d237460311a1c4b23b8ad
2019-05-12  only spec load when hit  Iru Cai
2019-05-11  try not expose if L1 hit  [is-ift-cachehit]  Iru Cai
2019-04-22  fix the violation checking for IFT+fence  Iru Cai
When using IFT, fenceDelay means the load is deferred, but some loads can fall between fences and still execute, so fenceDelayed instructions also need to be checked.
2019-04-17  add a trackBranch option  Iru Cai
2019-04-17  IFT for fence scheme  Iru Cai
2019-04-16  track instructions after tainted branches  Iru Cai
2019-04-15  Add IFT debug flags  Iru Cai
2019-04-12  keep time to expose as original scheme when inst->needPostFetch()  Iru Cai
Because when inst->needPostFetch() holds, the speculative load has already been issued. This also fixes the mysterious bug caused by the IFT code.
2019-04-12  add IFT options  Iru Cai
2019-04-08  we need to ++loadsToVLD when (!inst->readyToExpose() && inst->needPostFetch())  Iru Cai
2019-04-08  implement taint propagation  Iru Cai
2019-04-03  check loads using tainted registers, set USL dst as tainted  Iru Cai
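
The two IFT commits above describe a register-taint scheme: a load whose source registers are tainted is treated as an unsafe speculative load (USL), and a USL taints its destination registers. A minimal, hypothetical C++ sketch of that bookkeeping (TaintMap and all member names here are invented for illustration, not the fork's actual code):

    #include <bitset>
    #include <vector>

    struct TaintMap {
        std::bitset<256> tainted;  // one bit per physical register (size assumed)

        bool anySrcTainted(const std::vector<int> &srcRegs) const {
            for (int r : srcRegs)
                if (tainted[r])
                    return true;
            return false;
        }

        // An unsafe speculative load (USL) taints its destinations.
        void taintDsts(const std::vector<int> &dstRegs) {
            for (int r : dstRegs)
                tainted[r] = true;
        }

        // Ordinary instructions propagate taint from sources to destinations.
        void propagate(const std::vector<int> &srcRegs,
                       const std::vector<int> &dstRegs) {
            bool t = anySrcTainted(srcRegs);
            for (int r : dstRegs)
                tainted[r] = t;
        }
    };
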
2019-04-01  fix getvaddr nullptr stuff, add a non-spec load printing  [is-rebase11-LSQUnit]  Iru Cai
2019-03-21  Request::getVaddr()  Iru Cai
Change-Id: Ife5c04941a9181da30e5cc692dec7cfd53feb71f
2019-03-20  invisispec-1.0 source  Iru Cai
2018-12-11  cpu-o3: Fix bug in LSQUnit(uint32_t, uint32_t) ctor  Tony Gutierrez
Change 9af1214 added a new ctor to the LSQUnit; however, there is a typo/bug: it sizes the SQEntries member variable to lqEntries + 1, as opposed to sqEntries + 1. This change corrects the issue by using sqEntries.

Change-Id: I19dfaa5c0e335bd7b84343a92034147d7c5d914e
Reviewed-on: https://gem5-review.googlesource.com/c/15015
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
2018-12-03  cpu: Change raw pointers to STL Containers  Rekai Gonzalez-Alberquilla
This patch changes two members from being raw pointers to being STL containers. The reason, beyond cleanliness and arguably OO best practice, is that containers have more introspection capability than naked pointers, as the size is known. Using STL containers adds little overhead and eases the automation of processes during debugging (gdb).

Change-Id: I4d9d3eedafa8b5e50ac512ea93b458a4200229f2
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/13126
Reviewed-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
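
A minimal before/after illustration of the change described above, with simplified stand-in types (not the actual LSQUnit members):

    #include <vector>

    struct LSQBufBefore {
        int *entries;                      // gdb cannot see how many elements
        explicit LSQBufBefore(int n) : entries(new int[n]) {}
        ~LSQBufBefore() { delete[] entries; }
    };

    struct LSQBufAfter {
        std::vector<int> entries;          // size travels with the object
        explicit LSQBufAfter(int n) : entries(n) {}
    };
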
2018-11-16  cpu: Fix the usage of const DynInstPtr  Rekai Gonzalez-Alberquilla
Summary: Use const DynInstPtr& where possible and introduce move operators to RefCountingPtr.

In many places, scoped references to dynamic instructions copy the DynInstPtr when a reference would do. This is detrimental to performance, and if reference tracking is ever needed for debugging, the redundant copies make the process much more painful than it already is. Also, from a design point of view, a function/method that merely defines a convenience name to access an instruction should not be considered an owner of the data, i.e., making a copy rather than taking a reference is not justified.

On a related topic, C++11 introduces move semantics. These are useful when, for example, a class models a HW structure that contains a list and has a getHeadOfList function: without moves, returning the head costs four refcount updates (copy to an internal variable, remove from the list, copy to the assigned variable, destroy the returned temporary); with moves, the pointer is simply handed over.

Change-Id: I3bb46c20ef23b6873b469fd22befb251ac44d2f6
Signed-off-by: Giacomo Gabrielli <giacomo.gabrielli@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/c/13105
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-by: Jason Lowe-Power <jason@lowepower.com>
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
Maintainer: Jason Lowe-Power <jason@lowepower.com>
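
A rough sketch of both ideas, using a simplified stand-in for RefCountingPtr (the real gem5 class differs; this only illustrates the move operations and the const-reference convention):

    template <class T>
    class CountingPtr {
        T *p = nullptr;
        void inc() { if (p) ++p->refcount; }
        void dec() { if (p && --p->refcount == 0) delete p; }
      public:
        CountingPtr() = default;
        CountingPtr(T *q) : p(q) { inc(); }
        CountingPtr(const CountingPtr &o) : p(o.p) { inc(); }
        // Move ctor/assignment: steal the pointer, no refcount traffic.
        CountingPtr(CountingPtr &&o) noexcept : p(o.p) { o.p = nullptr; }
        CountingPtr &operator=(CountingPtr &&o) noexcept {
            if (this != &o) { dec(); p = o.p; o.p = nullptr; }
            return *this;
        }
        ~CountingPtr() { dec(); }
        T *operator->() const { return p; }
    };

    // A convenience name for an instruction should be a reference, not a copy:
    //   void inspect(const CountingPtr<Inst> &inst);  // no refcount churn
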
2018-06-11  misc: Using smart pointers for memory Requests  Giacomo Travaglini
This patch changes the underlying type of RequestPtr from Request* to shared_ptr<Request>. Having memory requests managed by smart pointers simplifies the code; it also prevents memory leaks and dangling pointers.

Change-Id: I7749af38a11ac8eb4d53d8df1252951e0890fde3
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/10996
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Maintainer: Nikos Nikoleris <nikos.nikoleris@arm.com>
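
A minimal sketch of the aliasing change, with a stand-in Request type (RequestSketch and makeRequest are invented names for illustration):

    #include <memory>

    struct RequestSketch { /* addr, size, flags, ... */ };
    using RequestPtr = std::shared_ptr<RequestSketch>;

    RequestPtr makeRequest() {
        return std::make_shared<RequestSketch>();  // replaces: new Request(...)
    }
    // The request is freed automatically when the last RequestPtr goes away.
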
2018-06-11  misc: Substitute pointer to Request with aliased RequestPtr  Giacomo Travaglini
Every usage of Request* in the code has been replaced with the RequestPtr alias. This is a preparatory patch for when RequestPtr will be typedefed to a smart pointer to Request rather than a raw pointer.

Change-Id: I73cbaf2d96ea9313a590cdc731a25662950cd51a
Signed-off-by: Giacomo Travaglini <giacomo.travaglini@arm.com>
Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/10995
Reviewed-by: Anthony Gutierrez <anthony.gutierrez@amd.com>
Reviewed-by: Daniel Carvalho <odanrc@yahoo.com.br>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
2017-10-13  cpu-o3: Check predication before the SQ size for a debug print  Nikos Nikoleris
The size of the store entry in the LSQ is used to indicate a fault in the execution of the store. At the same time, a store that is predicated false will also have size 0 in the corresponding store queue entry. This changeset ensures that we check whether the store was predicated false before checking the size field, so we avoid printing stores as faulting when they are merely predicated false.

Change-Id: Ie07982197bd73d7b44d26a3257d54ecb103a952a
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/4821
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
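
A hypothetical sketch of the reordered check (names invented; the real code is a DPRINTF in the LSQ's store-writeback path):

    struct SQEntrySketch { bool predicatedFalse; unsigned size; };

    const char *storeState(const SQEntrySketch &e) {
        if (e.predicatedFalse)
            return "predicated false";  // checked first after the patch
        if (e.size == 0)
            return "faulting";          // size 0 now really means a fault
        return "normal";
    }
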
2017-10-13  cpu-o3: Avoid early checker verification for store conditionals  Nikos Nikoleris
The O3CPU allows stores to commit before they are completed, as soon as they enter the store queue. This is why stores are verified by the checker CPU separately, once they complete and after they are sent to memory. Store conditionals, on the other hand, have an additional writeback stage in the pipeline, as they return their result to a register, similarly to loads. This is why they do not commit before they receive a response from memory, which allows store conditionals to be verified by the checker CPU as soon as they commit, in the same way as all other non-store instructions. At the same time, the presence of a checker CPU should not require changes to the way we handle instructions. This change removes explicit calls to:

* incorrectly set the extra data of the request to 0 (a subsequent call to completeAcc already does this without making any ISA assumptions about the return value of the failed store conditional)
* complete failing store conditionals

Change-Id: If21d70b21caa55b35e9fdcc50f254c590465d3c3
Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
Reviewed-on: https://gem5-review.googlesource.com/4820
Maintainer: Andreas Sandberg <andreas.sandberg@arm.com>
2016-12-21  cpu: Clarify meaning of cachePorts variable in lsq_unit.hh of O3  Arthur Perais
cachePorts currently constrains the number of store packets written to the D-Cache each cycle, but loads also count against this variable. This leads to unexpected congestion (e.g., setting cachePorts to a realistic 1 will in fact allow a store to write back only if no loads have accessed the D-Cache this cycle). In the absence of arbitration, this patch decouples how many loads can be done per cycle from how many stores can be done per cycle.

Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
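
A hypothetical sketch of the decoupled budgeting (all member names invented for illustration):

    struct DCachePorts {
        int cachePorts = 1;   // store writebacks allowed per cycle
        int loadPorts  = 2;   // load accesses per cycle, now a separate budget
        int storesUsed = 0;
        int loadsUsed  = 0;

        bool canWritebackStore() const { return storesUsed < cachePorts; }
        bool canIssueLoad() const { return loadsUsed < loadPorts; }
        void newCycle() { storesUsed = loadsUsed = 0; }
    };
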
2015-08-10  mem, cpu: Add assertions to snoop invalidation logic  Stephan Diestelhorst
This patch adds assertions that enforce that only invalidating snoops will ever reach into the logic that tracks in-order load completion and also invalidation of LL/SC (and MONITOR / MWAIT) monitors. Also adds some comments to MSHR::replaceUpgrades().
2015-07-19  cpu: Fix LLSC atomic CPU wakeup  Krishnendra Nathella
Writes to locked memory addresses (LLSC) did not wake up the locking CPU, which can lead to deadlocks on multi-core runs. In AtomicSimpleCPU, recvAtomicSnoop only handled a locked snoop when the incoming packet was an invalidation (isInvalidate). But writes are seen instead of invalidates when running without caches (fast-forward configurations). As a simple fix, handleLockedSnoop is now also called when the incoming snoop packet is a write.
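
A simplified sketch of the fix (stand-in Packet type, not gem5's; the handleLockedSnoop call is elided to a comment):

    struct PacketSketch {
        bool inval = false, write = false;
        bool isInvalidate() const { return inval; }
        bool isWrite() const { return write; }
    };

    void recvAtomicSnoopSketch(const PacketSketch &pkt) {
        // previously only pkt.isInvalidate() triggered the locked-snoop path
        if (pkt.isInvalidate() || pkt.isWrite()) {
            // handleLockedSnoop(...): clear the LLSC monitor, wake the CPU
        }
    }
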
2015-12-04  cpu: fix uninitialized variable which may cause assertion failure  Pau Cabre
The assert in lsq_unit_impl.hh line 963 needs pktPending to be initialized to NULL (I got the assertion failure several times without the fix).

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-09-15  cpu, o3: consider split requests for LSQ checksnoop operations  Hongil Yoon
This patch enables instructions in the LSQ to track the two physical addresses of the corresponding two split requests. The information is later used in checkSnoop() to search for and invalidate the corresponding LD instructions. The previous implementation kept track of only the physical address referenced by the first split request; thus checkSnoop() never considered the line accessed by the second request, causing potential correctness issues.

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2015-05-05  mem, cpu: Add a separate flag for strictly ordered memory  Andreas Sandberg
The Request::UNCACHEABLE flag currently has two different functions. The first, and obvious, function is to prevent the memory system from caching data in the request. The second is to prevent reordering and speculation in CPU models. This changeset gives the order/speculation requirement a separate flag (Request::STRICT_ORDER). This flag prevents CPU models from doing the following optimizations:

* Speculation: CPU models are not allowed to issue speculative loads.
* Write combining: CPU models and caches are not allowed to merge writes to the same cache line.

Note: The memory system may still reorder accesses unless the UNCACHEABLE flag is set. It is therefore expected that the STRICT_ORDER flag is combined with the UNCACHEABLE flag to prevent this behavior.
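
An illustrative sketch of the split flags (flag values invented for the example; gem5's real flag encoding differs):

    #include <cstdint>

    enum : std::uint64_t {
        UNCACHEABLE  = 1ull << 0,  // memory system must not cache the data
        STRICT_ORDER = 1ull << 1,  // CPU must not speculate or combine writes
    };

    bool mayIssueSpeculatively(std::uint64_t flags) {
        return !(flags & STRICT_ORDER);  // ordering, not cacheability, gates this
    }
    // Device-like memory would typically set UNCACHEABLE | STRICT_ORDER.
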
2014-12-02  cpu, o3: Ignored invalidate causing same-address load reordering  Marco Elver
In case the memory subsystem sends a combined response with invalidate (e.g. ReadRespWithInvalidate), we cannot ignore the invalidate part of the response. If we were to ignore it, under certain circumstances this effectively leads to reordering of loads to the same address, which is not permitted under any memory consistency model implemented in gem5.

Consider the case where a later load's address is computed before an earlier load in program order, and the later load is therefore sent to the memory subsystem first. At some point the earlier load's address is computed, correctly marking the later load as a possibleLoadViolation. In the meantime some other node writes and sends invalidations to all other nodes. The invalidation races with the later load's ReadResp, arrives first, and is deferred. Upon receipt of the ReadResp, the response is changed to ReadRespWithInvalidate and sent to the CPU. If we ignore the invalidate part of the packet, we let the later load read the old value of the address. Eventually the earlier load's ReadResp arrives, but with new data. As there was no invalidate snoop (it was sunk into the ReadRespWithInvalidate), and if we did not process the invalidate of the ReadRespWithInvalidate, we obtain a load reordering.

A similar scenario can be constructed where the earlier load's address is computed after the ReadRespWithInvalidate arrives for the younger load. In this case hitExternalSnoop needs to be set to true on the ReadRespWithInvalidate, so that once the address of the earlier load is known, checkViolations will cause the later load to be squashed.

Finally, we must account for the case where both loads are sent to the memory subsystem (reordered), a snoop invalidate arrives and correctly sets the later load's fault to ReExec, but before the CPU processes the fault, the later load's ReadResp arrives and the writeback discards the outstanding fault. We must add a check to ensure that we do not skip any unprocessed faults.
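
A schematic sketch of the required handling (stand-in types and invented names; the real logic lives in the LSQ's load-completion path):

    #include <cstdint>

    struct RespSketch { bool withInvalidate; std::uint64_t lineAddr; };

    void completeLoadSketch(const RespSketch &resp) {
        if (resp.withInvalidate) {
            // First act on the invalidate half: set hitExternalSnoop and
            // squash younger loads to the same line so they re-execute.
        }
        // Then write back the returned data, and check for an unprocessed
        // ReExec fault before discarding any outstanding fault state.
    }
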
2014-12-02  cpu: Move packet deallocation to recvTimingResp in the O3 CPU  Stephan Diestelhorst
Move the packet deallocations in the O3 CPU so that the completeDataAccess deals only with the LSQ specific parts and the generic recvTimingResp frees the packet in all other cases.
2014-10-16  arch: Use shared_ptr for all Faults  Andreas Hansson
This patch takes quite a large step in transitioning from the ad-hoc RefCountingPtr to the c++11 shared_ptr by adopting its use for all Faults. There are no changes in behaviour, and the code modifications are mostly just replacing "new" with "make_shared".
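
A minimal sketch of the pattern, with a stand-in fault hierarchy (FaultBase, PageFault, and raisePageFault are invented names for illustration):

    #include <cstdint>
    #include <memory>

    struct FaultBase { virtual ~FaultBase() = default; };
    using Fault = std::shared_ptr<FaultBase>;

    struct PageFault : FaultBase {
        std::uint64_t vaddr;
        explicit PageFault(std::uint64_t v) : vaddr(v) {}
    };

    Fault raisePageFault(std::uint64_t vaddr) {
        return std::make_shared<PageFault>(vaddr);  // was: new PageFault(vaddr)
    }
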
2014-09-20  base: Clean up redundant string functions and use C++11  Andreas Hansson
This patch does a bit of housekeeping on the string helper functions and relies on the C++11 standard library where possible. It also does away with our custom string hash as an implementation is already part of the standard library.
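
For the string-hash part, the standard-library replacement looks like this (illustrative; the helper name is invented):

    #include <cstddef>
    #include <functional>
    #include <string>

    std::size_t hashString(const std::string &s) {
        return std::hash<std::string>{}(s);  // replaces the custom hash
    }
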
2014-05-13  mem: Refactor assignment of Packet types  Curtis Dunham
Put the packet type swizzling (that is currently done in a lot of places) into a refineCommand() member function.
2014-09-03  cpu: Fix cache blocked load behavior in o3 cpu  Mitch Hayenga
This patch fixes the load blocked/replay mechanism in the o3 cpu. Rather than flushing the entire pipeline, this patch replays loads once the cache becomes unblocked. Additionally, deferred memory instructions (loads that had conflicting stores), when replayed, did not respect the number of functional units (only issue width); this patch corrects that as well. Improvements of over 20% have been observed on a microbenchmark designed to exercise this behavior.
2014-09-03  cpu: Change writeback modeling for outstanding instructions  Mitch Hayenga
As highlighted on the mailing list, gem5's writeback modeling can impact performance. This patch removes the limitation on the maximum number of outstanding issued instructions; however, the number that can write back in a single cycle is still respected in instToCommit().
2014-06-21  o3: split load & store queue full cases in rename  Binh Pham
Check for free entries in Load Queue and Store Queue separately to avoid cases when load cannot be renamed due to full Store Queue and vice versa. This work was done while Binh was an intern at AMD Research.
2014-05-31  style: eliminate equality tests with true and false  Steve Reinhardt
Using '== true' in a boolean expression is totally redundant, and using '== false' is pretty verbose (and arguably less readable in most cases) compared to '!'. It's somewhat of a pet peeve, perhaps, but I had some time waiting for some tests to run and decided to clean these up. Unfortunately, SLICC appears not to have the '!' operator, so I had to leave the '== false' tests in the SLICC code.
2014-04-01  cpu: Fix case where o3 lsq could print out uninitialized data  Mitch Hayenga
In the O3 LSQ, the data read/written is printed out in DPRINTFs, but the data field is treated as a null-terminated character string even though it is not encoded that way. This patch removes that possibility by removing the data part of the print.
2014-03-25  cpu: o3: lsq: Fix TSO implementation  Marco Elver
This patch fixes a violation of TSO in the O3CPU, as all loads must be ordered with all other loads. In the LQ, if a snoop is observed, all subsequent loads need to be squashed if the system is TSO. Prior to this patch, the following case could be violated:

    P0         | P1          ;
    MOV [x],$1 | MOV EAX,[y] ;
    MOV [y],$1 | MOV EBX,[x] ;
    exists (1:EAX=1 /\ 1:EBX=0)   [is a violation]

The problem was found using litmus [http://diy.inria.fr].

Committed by: Nilay Vaish <nilay@cs.wisc.edu>
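
A simplified sketch of the TSO squash rule (stand-in load-queue types and invented names, not the LSQ's actual structures):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct LQEntry { std::uint64_t lineAddr; bool squashed; };

    void snoopLoadQueue(std::vector<LQEntry> &lq, std::size_t hitIdx, bool isTSO) {
        for (std::size_t i = hitIdx + 1; i < lq.size(); ++i) {
            // Under TSO, every younger load is squashed, not just loads
            // to the snooped line.
            if (isTSO || lq[i].lineAddr == lq[hitIdx].lineAddr)
                lq[i].squashed = true;  // forces in-order re-execution
        }
    }
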
2014-01-24  cpu: Add support for instructions that zero cache lines.  Ali Saidi
2014-01-24  cpu: Add CPU support for generating wake up events when LLSC addresses are snooped  Ali Saidi
This patch adds support for generating wake-up events in the CPU when an address that is currently in the exclusive state is hit by a snoop. This mechanism is required for ARMv8 multi-processor support.
2014-01-24  base: add support for probe points and common probes  Matt Horsnell
The probe patch is motivated by the desire to move analytical and trace code away from functional code. This is achieved by the probe interface, which is essentially a glorified observer model.

What this means to users:
* add a probe point and a "notify" call at the source of an "event"
* add an isolated module that carries out *your* analysis (e.g. generate a trace)
* register that module as a probe listener
Note: an example is given for reference in src/cpu/o3/simple_trace.[hh|cc] and src/cpu/SimpleTrace.py

What is happening under the hood:
* every SimObject maintains a ProbeManager.
* during initialization (src/python/m5/simulate.py), regProbePoints and then regProbeListeners are called on each SimObject. This hooks up the probe point notify calls with the listeners.

FAQs:

Why did you develop probe points?
* to move trace, stats-gathering, and analytical code out of the functional code.
* the belief that probes could be generically useful.

What is a probe point?
* a probe point is used to notify upon a given event (e.g. the cpu commits an instruction)

What is a probe listener?
* a class that handles whatever the user wishes to do when notified about an event.

What can be passed on notify?
* probe points are templates, so the user can generate probes that pass any type of argument (by const reference) to a listener.

What relationships can be generated (1:1, 1:N, N:M etc)?
* there isn't a restriction. You can hook probe points and listeners up in a 1:1, 1:N, or N:M relationship. They become useful when a number of modules listen to the same probe points. The idea is that you can add a small number of probes into the source code and develop a larger number of useful analysis modules that use the information passed by the probes.

Can you give examples?
* adding a probe point to the cpu's commit method lets you build a trace module (outputting assembler); you could re-use this to gather instruction-distribution (arithmetic, load/store, conditional, control flow) stats.

Why is the probe interface currently restricted to passing a const reference?
* the desire, initially at least, is to allow an interface to observe functionality, but not to change it.
* of course this can be subverted by const-casting.

What is the performance impact of adding probes?
* when nothing is actively listening, probes should have a relatively minor impact. Profiling suggests that even with a large number of probes (60) the impact when not active is very minimal (<1%).
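
A toy version of the observer model being described, purely illustrative (gem5's real ProbePoint/ProbeManager classes are templated SimObject machinery and differ in detail):

    #include <functional>
    #include <utility>
    #include <vector>

    template <class Arg>
    class ProbePointSketch {
        std::vector<std::function<void(const Arg &)>> listeners;
      public:
        void addListener(std::function<void(const Arg &)> l) {
            listeners.push_back(std::move(l));
        }
        void notify(const Arg &a) const {  // called at the event source
            for (const auto &l : listeners)
                l(a);
        }
    };

A trace module would register a callback on a "commit" probe and print each committed instruction, at near-zero cost when nothing listens.
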
2014-01-24  mem: track per-request latencies and access depths in the cache hierarchy  Matt Horsnell
Add some values and methods to the request object to track the translation and access latency for a request and which level of the cache hierarchy responded to the request.
2013-10-17  cpu: add consistent guarding to *_impl.hh files.  Matt Horsnell
2013-10-17  cpu: Put in assertions to check for maximum supported LQ/SQ size  Faissal Sleiman
LSQSenderState represents the LQ/SQ index using uint8_t, which supports up to 256 entries (including the sentinel entry). Sending packets to memory with a higher index than 255 truncates the index, such that the response matches the wrong entry. For instance, this can result in a deadlock if a store completion does not clear the head entry.
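
A simplified sketch of the size guard implied above (stand-in names; the actual assertions live in the LSQ unit):

    #include <cassert>
    #include <cstdint>
    #include <limits>

    struct SenderStateSketch { std::uint8_t idx; };  // 8-bit queue index

    void checkQueueSizes(unsigned lqEntries, unsigned sqEntries) {
        // Queues include one sentinel entry; every index must fit in uint8_t,
        // otherwise a response can match the wrong LQ/SQ entry.
        const unsigned maxIdx = std::numeric_limits<std::uint8_t>::max();  // 255
        assert(lqEntries + 1 <= maxIdx + 1);
        assert(sqEntries + 1 <= maxIdx + 1);
    }
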
2013-07-18  mem: Set the cache line size on a system level  Andreas Hansson
This patch removes the notion of a peer block size and instead sets the cache line size at the system level. Previously the size was set per cache and communicated through the interconnect, with plenty of checks to ensure that everyone specified the same size; these checks are now removed. Another benefit, not yet harnessed, is that the cache line size is now known at construction time rather than after port binding, so the block size can be stored locally and does not have to be queried every time it is used. A follow-on patch updates the configuration scripts accordingly.
2013-02-15  o3: fix tick used for renaming and issue with range selection  Matt Horsnell
Fixes the tick used for rename:
- previously this gathered the tick on leaving rename, which was always 1 less than the dispatch tick. This conflated the decode ticks when back pressure built up in the pipeline.
- now the tick is picked up on entry.

Added a --store_completions flag:
- additionally displays the store completion tail in the viewer.
- this highlights periods when large numbers of stores are outstanding (>16 LSQ blocking).

Allows selection by tick range (previously this caused an infinite loop).
2013-01-07  cpu: Rewrite O3 draining to avoid stopping in microcode  Andreas Sandberg
Previously, the O3 CPU could stop in the middle of a microcode sequence. This patch makes sure that the pipeline stops only when it has committed a normal instruction or exited from a microcode sequence. Additionally, it makes sure that the pipeline has no instructions in flight when it is drained, which should make draining more robust. Draining is controlled in the commit stage, which checks whether the next PC after a committed instruction is in microcode. If this isn't the case, it requests a squash of all instructions after the instruction that just committed and immediately signals a drain stall to the fetch stage. The CPU then continues to execute until the pipeline and all associated buffers are empty.