summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-12-02cpu, o3: Ignored invalidate causing same-address load reorderingMarco Elver
In case the memory subsystem sends a combined response with invalidate (e.g. ReadRespWithInvalidate), we cannot ignore the invalidate part of the response. If we were to ignore the invalidate part, under certain circumstances this effectively leads to reordering of loads to the same address which is not permitted under any memory consistency model implemented in gem5. Consider the case where a later load's address is computed before an earlier load in program order, and is therefore sent to the memory subsystem first. At some point the earlier load's address is computed and in doing so correctly marks the later load as a possibleLoadViolation. In the meantime some other node writes and sends invalidations to all other nodes. The invalidation races with the later load's ReadResp, and arrives before ReadResp and is deferred. Upon receipt of the ReadResp, the response is changed to ReadRespWithInvalidate, and sent to the CPU. If we ignore the invalidate part of the packet, we let the later load read the old value of the address. Eventually the earlier load's ReadResp arrives, but with new data. As there was no invalidate snoop (sunk into the ReadRespWithInvalidate), and if we did not process the invalidate of the ReadRespWithInvalidate, we obtain a load reordering. A similar scenario can be constructed where the earlier load's address is computed after ReadRespWithInvalidate arrives for the younger load. In this case hitExternalSnoop needs to be set to true on the ReadRespWithInvalidate, so that upon knowing the address of the earlier load, checkViolations will cause the later load to be squashed. Finally we must account for the case where both loads are sent to the memory subsystem (reordered), a snoop invalidate arrives and correctly sets the later loads fault to ReExec. However, before the CPU processes the fault, the later load's ReadResp arrives and the writeback discards the outstanding fault. We must add a check to ensure that we do not skip any unprocessed faults.
2014-12-02cpu: Always mask the snoop address when performing lock checkAndreas Hansson
Ensure the snoop address check is always using a cache-block aligned address. This patch updates Alpha and Mips to match the other ISAs.
2014-12-02cpu: Move packet deallocation to recvTimingResp in the O3 CPUStephan Diestelhorst
Move the packet deallocations in the O3 CPU so that the completeDataAccess deals only with the LSQ specific parts and the generic recvTimingResp frees the packet in all other cases.
2014-12-02mem: Relax packet src/dest check and shift onus to crossbarAndreas Hansson
This patch allows objects to get the src/dest of a packet even if it is not set to a valid port id. This simplifies (ab)using the bridge as a buffer and latency adapter in situations where the neighbouring MemObjects are not crossbars. The checks that were done in the packet are now shifted to the crossbar where the fields are used to index into the port arrays. Thus, the carrier of the information is not burdened with checking, and the crossbar can check not only that the destination is set, but also that the port index is within limits.
2014-12-02mem: Clean up packet data allocationAndreas Hansson
This patch attempts to make the rules for data allocation in the packet explicit, understandable, and easy to verify. The constructor that copies a packet is extended with an additional flag "alloc_data" to enable the call site to explicitly say whether the newly created packet is short-lived (a zero-time snoop), or has an unknown life-time and therefore should allocate its own data (or copy a static pointer in the case of static data). The tricky case is the static data. In essence this is a copy-avoidance scheme where the original source of the request (DMA, CPU etc) does not ask the memory system to return data as part of the packet, but instead provides a pointer, and then the memory system carries this pointer around, and copies the appropriate data to the location itself. Thus any derived packet actually never copies any data. As the original source does not copy any data from the response packet when arriving back at the source, we must maintain the copy of the original pointer to not break the system. We might want to revisit this one day and pay the price for a few extra memcpy invocations. All in all this patch should make it easier to grok what is going on in the memory system and how data is actually copied (or not).
2014-12-02mem: Cleanup Packet::checkFunctional and hasData usageAndreas Hansson
This patch cleans up the use of hasData and checkFunctional in the packet. The hasData function is unfortunately suggesting that it checks if the packet has a valid data pointer, when it does in fact only check if the specific packet type is specified to have a data payload. The confusion led to a bug in checkFunctional. The latter function is also tidied up to avoid name overloading.
2014-12-02mem: Make the requests carried by packets constAndreas Hansson
This adds a basic level of sanity checking to the packet by ensuring that a request is not modified once the packet is created. The only issue that had to be worked around is the relaying of software-prefetches in the cache. The specific situation is now solved by first copying the request, and then creating a new packet accordingly.
2014-12-02mem: Make Request getters constAndreas Hansson
This patch tidies up the Request class, making all getters const. The odd one out is incAccessDepth which is called by the memory system as packets carry the request around. This is also const to enable the packet to hold on to a const Request.
2014-12-02mem: Add checks and explanation for assertMemInhibit usageAndreas Hansson
2014-12-02mem: Assume all dynamic packet data is array allocatedAndreas Hansson
This patch simplifies how we deal with dynamically allocated data in the packet, always assuming that it is array allocated, and hence should be array deallocated (delete[] as opposed to delete). The only uses of dataDynamic was in the Ruby testers. The ARRAY_DATA flag in the packet is removed accordingly. No defragmentation of the flags is done at this point, leaving a gap in the bit masks. As the last part the patch, it renames dataDynamicArray to dataDynamic.
2014-12-02mem: Remove redundant Packet::allocate callsAndreas Hansson
This patch cleans up the packet memory allocation confusion. The data is always allocated at the requesting side, when a packet is created (or copied), and there is never a need for any device to allocate any space if it is merely responding to a paket. This behaviour is in line with how SystemC and TLM works as well, thus increasing interoperability, and matching established conventions. The redundant calls to Packet::allocate are removed, and the checks in the function are tightened up to make sure data is only ever allocated once. There are still some oddities in the packet copy constructor where we copy the data pointer if it is static (without ownership), and allocate new space if the data is dynamic (with ownership). The latter is being worked on further in a follow-on patch.
2014-12-02mem: Use const pointers for port proxy write functionsAndreas Hansson
This patch changes the various write functions in the port proxies to use const pointers for all sources (similar to how memcpy works). The one unfortunate aspect is the need for a const_cast in the packet, to avoid having to juggle a const and a non-const data pointer. This design decision can always be re-evaluated at a later stage.
2014-12-02mem: Add const getters for write packet dataAndreas Hansson
This patch takes a first step in tightening up how we use the data pointer in write packets. A const getter is added for the pointer itself (getConstPtr), and a number of member functions are also made const accordingly. In a range of places throughout the memory system the new member is used. The patch also removes the unused isReadWrite function.
2014-12-02mem: Remove null-check bypassing in Packet::getPtrAndreas Hansson
This patch removes the parameter that enables bypassing the null check in the Packet::getPtr method. A number of call sites assume the value to be non-null. The one odd case is the RubyTester, which issues zero-sized prefetches(!), and despite being reads they had no valid data pointer. This is now fixed, but the size oddity remains (unless anyone object or has any good suggestions). Finally, in the Ruby Sequencer, appropriate checks are made for flush packets as they have no valid data pointer.
2014-12-02mem: Add a GDDR5 DRAM configOmar Naji
This patch adds a first cut GDDR5 config to accommodate the users combining gem5 and GPUSim. The config is based on a SK Hynix datasheet, and the Nvidia GTX580 specification. Someone from the GPUSim user-camp should tweak the default page-policy and static frontend and backend latencies.
2014-11-24stats: Bump stats after static analysis fixesAndreas Hansson
Fixing up the uninitialised values changes two of the x86 Linux boot regressions slightly.
2014-11-24misc: Another round of static analysis fixupsAndreas Hansson
Mostly addressing uninitialised members.
2014-11-23mem: Page Table map api modificationAlexandru Dutu
This patch adds uncacheable/cacheable and read-only/read-write attributes to the map method of PageTableBase. It also modifies the constructor of TlbEntry structs for all architectures to consider the new attributes.
2014-11-23mem: Multi Level Page Table bug fixAlexandru Dutu
The multi level page table was giving false positives for already mapped translations. This patch fixes the bogus behavior.
2014-11-23mem: Page Table long linesAlexandru Dutu
Trimmed down all the lines greater than 78 characters.
2014-11-23config, kvm: Enabling KvmCPU in SE modeAlexandru Dutu
This patch modifies se.py such that it can now use kvm cpu model.
2014-11-23x86: Segment initialization to support KvmCPU in SEAlexandru Dutu
This patch sets up low and high privilege code and data segments and places them in the following order: cs low, ds low, ds, cs, in the GDT. Additionally, a syscall and page fault handler for KvmCPU in SE mode are defined. The order of the segment selectors in GDT is required in this manner for interrupt handling to work properly. Segment initialization is done for all the thread contexts.
2014-11-23kvm, x86: Adding support for SE mode executionAlexandru Dutu
This patch adds methods in KvmCPU model to handle KVM exits caused by syscall instructions and page faults. These types of exits will be encountered if KvmCPU is run in SE mode.
2014-11-23cpuid, x86: Enabling more features in CPUidAlexandru Dutu
Adding more features in the CPUid with the purpose of supporting running the KvmCPU in SE mode.
2014-11-23Backed out prior changeset f9fb64a72259Steve Reinhardt
Back out use of importlib to avoid implicitly creating dependency on Python 2.7.
2014-11-23config: ruby: Get rid of an "eval" and an "exec" operating on generated code.Gabe Black
We can get the same result using importlib.
2014-11-21x86: Update stats for the new Linux delay port.Gabe Black
2014-11-21x86: pc: Put a stub IO device at port 0xed which the kernel can use for delays.Gabe Black
There was already a stub device at 0x80, the port traditionally used for an IO delay. 0x80 is also the port used for POST codes sent by firmware, and that may have prompted adding this port as a second option.
2014-11-18configs: small fix to ruby portion of fs.py and se.pyNilay Vaish
In fs.py the io port controller was being attached to the iobus multiple times. This should be done only once. In se.py, the the option use_map was being set which no longer exists.
2014-11-18dev: Use fixed size member variables to describe fixed size PL111 registers.Gabe Black
2014-11-17vnc: Add a conversion function for bgr888.Gabe Black
2014-11-17x86: Fix setting segment bases in real mode.Gabe Black
The data size used for actually writing the base value for the segment was the default size, but really it should set the entire value without any possible truncation.
2014-11-17x86: Fix some bugs in the real mode far jmp instruction.Gabe Black
The far pointer should be shifted right to get the selector value, not left. Also, when calculating the width of the offset, the wrong register was used in one spot.
2014-11-17x86: APIC: Only set deliveryStatus if our IPI is going somewhere.Gabe Black
Otherwise the IPI which isn't sent will never arrive, and the deliveryStatus bit will never be cleared.
2014-11-17x86: APIC: Fix the getRegArrayBit function.Gabe Black
The getRegArrayBit function extracts a bit from a series of registers which are treated as a single large bit array. A previous change had modified the logic which figured out which bit to extract from ">> 5" to "% 5" which seems wrong, especially when other, similar functions were changed to use "% 32".
2014-11-17x86: Update the stats for the x86 FS o3 boot test.Gabe Black
2014-11-16x86: Fix the CPUID Long Mode Address Size function.Gabe Black
The value in EAX has an 8 bit field for the linear address size and one for the physical address size when calling that function. A recent change implemented it but returned 0xff for both of those fields. That implies that linear and physical addresses are 255 bits wide which is wrong. When using the KVM CPU model this causes an error, presumably because some of those bits are actually reserved, or the CPU or kernel realizes 255 bits is a bad value. This change makes those values 48.
2014-11-14config: Fix checkpoint restore in C++ config exampleAndrew Bardsley
This patch fixes the checkpoint restore option in the example of C++ configuration (util/cxx_config). The fix introduces a call to config_manager->startup() (which calls startup on all SimObjects managed by that manager) to replicate the loop of SimObject::startup calls in src/python/m5/simulate.py::simulate guarded by need_startup. As util/cxx_config/main.cc is a C++ analogue of src/python/mt/simulate.py, it should make a similar set of calls.
2014-11-14arm: Fixes based on UBSan and static analysisAndreas Hansson
Another churn to clean up undefined behaviour, mostly ARM, but some parts also touching the generic part of the code base. Most of the fixes are simply ensuring that proper intialisation. One of the more subtle changes is the return type of the sign-extension, which is changed to uint64_t. This is to avoid shifting negative values (undefined behaviour) in the ISA code.
2014-11-14mem: Clarify unit of DRAM controller buffer sizeAndreas Hansson
2014-11-12stats: Bump regressions to match latest changesAndreas Hansson
Updates after timezone hick-up and sorting of dictionary items in the SimObject.
2014-11-12mem: Delete unused variable in Garnet NetworkLinkMitch Hayenga
With recent changes OSX clang compilation fails due to an unused variable.
2014-11-12arm: Fix timing wakeup with LLSCAli Saidi
2014-11-12sim: Sort SimObject descendants and portsAndreas Hansson
This patch fixes a number of occurences where the sorting order of the objects was implementation defined.
2014-11-12base: Revert 9277177eccff and use getenv/setenv for UTC timeAndreas Hansson
This patch reverts changeset 9277177eccff which does not do what it was intended to do. In essence, we go back to implementing mkutctime much like the non-standard timegm extension.
2014-11-11stats: changes to x86 o3 fs and sparc fs regression tests.Nilay Vaish
2014-11-06x86 isa: This patch attempts an implementation at mwait.Marc Orr
Mwait works as follows: 1. A cpu monitors an address of interest (monitor instruction) 2. A cpu calls mwait - this loads the cache line into that cpu's cache. 3. The cpu goes to sleep. 4. When another processor requests write permission for the line, it is evicted from the sleeping cpu's cache. This eviction is forwarded to the sleeping cpu, which then wakes up. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-11-06tests: A test program for the new mwait implementation.Marc Orr
This is a simple test program for the new mwait implemenation. It is uses m5threads to create to threads of execution in syscall emulation mode that interact using the mwait instruction. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-11-06cpu: Minor Draining BugAndrew Lukefahr
Fixes a bug where Minor drains in the midst of committing a conditional store. While committing a conditional store, lastCommitWasEndOfMacroop is true (from the previous instruction) as we still haven't finished the conditional store. If a drain occurs before the cache response, Minor would check just lastCommitWasEndOfMacroop, which was true, and set drainState=DrainHaltFetch, which increases the streamSeqNum. This caused the conditional store to be squashed when the memory responded and it completed. However, to the memory the store succeeded, while to the instruction sequence it never occurred. In the case of an LLSC, the instruction sequence will replay the squashed STREX, which will fail as the cache is no longer in LLSC. Then the instruction sequence will loop back to a LDREX, which receives the updated (incorrect) value. Committed by: Nilay Vaish <nilay@cs.wisc.edu>
2014-11-06stats: updates due to changes to rubyNilay Vaish