cpu, o3: Ignored invalidate causing same-address load reordering

In case the memory subsystem sends a combined response with invalidate (e.g. ReadRespWithInvalidate), we cannot ignore the invalidate part of the response. If we were to ignore the invalidate part, under certain circumstances this effectively leads to reordering of loads to the same address which is not permitted under any memory consistency model implemented in gem5. Consider the case where a later load's address is computed before an earlier load in program order, and is therefore sent to the memory subsystem first. At some point the earlier load's address is computed and in doing so correctly marks the later load as a possibleLoadViolation. In the meantime some other node writes and sends invalidations to all other nodes. The invalidation races with the later load's ReadResp, and arrives before ReadResp and is deferred. Upon receipt of the ReadResp, the response is changed to ReadRespWithInvalidate, and sent to the CPU. If we ignore the invalidate part of the packet, we let the later load read the old value of the address. Eventually the earlier load's ReadResp arrives, but with new data. As there was no invalidate snoop (sunk into the ReadRespWithInvalidate), and if we did not process the invalidate of the ReadRespWithInvalidate, we obtain a load reordering. A similar scenario can be constructed where the earlier load's address is computed after ReadRespWithInvalidate arrives for the younger load. In this case hitExternalSnoop needs to be set to true on the ReadRespWithInvalidate, so that upon knowing the address of the earlier load, checkViolations will cause the later load to be squashed. Finally we must account for the case where both loads are sent to the memory subsystem (reordered), a snoop invalidate arrives and correctly sets the later loads fault to ReExec. However, before the CPU processes the fault, the later load's ReadResp arrives and the writeback discards the outstanding fault. We must add a check to ensure that we do not skip any unprocessed faults.
author: Marco Elver <Marco.Elver@ARM.com> 2014-12-02 06:08:03 -0500
committer: Marco Elver <Marco.Elver@ARM.com> 2014-12-02 06:08:03 -0500
commit: 9649395f853616b337992ca01d3474c214a7f718 (patch)
tree: 862ab03e828409e6973e7c732575094360c25847 /src/cpu
parent: 74bbe20141aec3bd7639000fb6ad8ff5fe1a7237 (diff)
download: gem5-9649395f853616b337992ca01d3474c214a7f718.tar.xz
2 files changed, 37 insertions, 3 deletions
diff --git a/src/cpu/o3/lsq_impl.hh b/src/cpu/o3/lsq_impl.hh
index 2c9c6eb82..e0107e36a 100644
--- a/src/cpu/o3/lsq_impl.hh
+++ b/src/cpu/o3/lsq_impl.hh
@@ -346,7 +346,29 @@ LSQ<Impl>::recvTimingResp(PacketPtr pkt)
     if (pkt->isError())
         DPRINTF(LSQ, "Got error packet back for address: %#X\n",
                 pkt->getAddr());
+
     thread[pkt->req->threadId()].completeDataAccess(pkt);
+
+    if (pkt->isInvalidate()) {
+        // This response also contains an invalidate; e.g. this can be the case
+        // if cmd is ReadRespWithInvalidate.
+        //
+        // The calling order between completeDataAccess and checkSnoop matters.
+        // By calling checkSnoop after completeDataAccess, we ensure that the
+        // fault set by checkSnoop is not lost. Calling writeback (more
+        // specifically inst->completeAcc) in completeDataAccess overwrites
+        // fault, and in case this instruction requires squashing (as
+        // determined by checkSnoop), the ReExec fault set by checkSnoop would
+        // be lost otherwise.
+
+        DPRINTF(LSQ, "received invalidation with response for addr:%#x\n",
+                pkt->getAddr());
+
+        for (ThreadID tid = 0; tid < numThreads; tid++) {
+            thread[tid].checkSnoop(pkt);
+        }
+    }
+
     delete pkt->req;
     delete pkt;
     return true;
diff --git a/src/cpu/o3/lsq_unit_impl.hh b/src/cpu/o3/lsq_unit_impl.hh
index 0cc412811..9c500443e 100644
--- a/src/cpu/o3/lsq_unit_impl.hh
+++ b/src/cpu/o3/lsq_unit_impl.hh
@@ -562,7 +562,7 @@ LSQUnit<Impl>::checkViolations(int load_idx, DynInstPtr &inst)
                 // Otherwise, mark the load has a possible load violation
                 // and if we see a snoop before it's commited, we need to squash
                 ld_inst->possibleLoadViolation(true);
-                DPRINTF(LSQUnit, "Found possible load violaiton at addr: %#x"
+                DPRINTF(LSQUnit, "Found possible load violation at addr: %#x"
                         " between instructions [sn:%lli] and [sn:%lli]\n",
                         inst_eff_addr1, inst->seqNum, ld_inst->seqNum);
             } else {
@@ -1124,8 +1124,20 @@ LSQUnit<Impl>::writeback(DynInstPtr &inst, PacketPtr pkt)
     if (!inst->isExecuted()) {
         inst->setExecuted();
 
-        // Complete access to copy data to proper place.
-        inst->completeAcc(pkt);
+        if (inst->fault == NoFault) {
+            // Complete access to copy data to proper place.
+            inst->completeAcc(pkt);
+        } else {
+            // If the instruction has an outstanding fault, we cannot complete
+            // the access as this discards the current fault.
+
+            // If we have an outstanding fault, the fault should only be of
+            // type ReExec.
+            assert(dynamic_cast<ReExec*>(inst->fault.get()) != nullptr);
+
+            DPRINTF(LSQUnit, "Not completing instruction [sn:%lli] access "
+                    "due to pending fault.\n", inst->seqNum);
+        }
     }
 
     // Need to insert instruction into queue to commit
author	Marco Elver <Marco.Elver@ARM.com>	2014-12-02 06:08:03 -0500
committer	Marco Elver <Marco.Elver@ARM.com>	2014-12-02 06:08:03 -0500
commit	9649395f853616b337992ca01d3474c214a7f718 (patch)
tree	862ab03e828409e6973e7c732575094360c25847 /src/cpu
parent	74bbe20141aec3bd7639000fb6ad8ff5fe1a7237 (diff)
download	gem5-9649395f853616b337992ca01d3474c214a7f718.tar.xz