summaryrefslogtreecommitdiff
path: root/src/mem
AgeCommit message (Collapse)Author
2017-01-19ruby: guard usage of GPUCoalescer code in ProfilerTony Gutierrez
the GPUCoalescer code is used in the ruby profiler regardless of whether or not the coalescer code has been compiled, which can lead to link/run time errors. here we add #ifdefs to guard the usage of GPUCoalescer code. eventually we should refactor this code to use probe points.
2017-01-19ruby: Check MessageBuffer space in garnet NetworkInterfaceMatthew Poremba
Garnet's NetworkInterface does not consider the size of MessageBuffers when ejecting a Message from the network. Add a size check for the MessageBuffer and only enqueue if space is available. If space is not available, the message if placed in a queue and the credit is held. A callback from the MessageBuffer is implemented to wake the NetworkInterface. If there are messages in the stalled queue, they are processed first, in a FIFO manner and if succesfully ejected, the credit is finally sent back upstream. The maximum size of the stall queue is equal to the number of valid VNETs with MessageBuffers attached.
2017-01-19ruby: Add occupancy stats to MessageBuffersMatthew Poremba
This patch is an updated version of /r/3297. "The most important statistic for measuring memory hierarchy performance is throughput, which is affected by independent variables, buffer sizing and communication latency. It is difficult/impossible to debug performance issues through series buffers without knowing which are the bottlenecks. For finite buffers, this patch adds statistics for the average number of messages in the buffer, the occupancy of the buffer slots, and number of message stalls."
2017-01-19ruby: Check all VNETs for injection in garnet NetworkInterfaceMatthew Poremba
The NetworkInterface wakeup currently iterates over all VNETs and breaks the loop if a VNET is unable to allocate a VC. This can cause a deadlock if a lower numbered VNET is unable to allocate a VC while a higher numbered VNET has idle VCs. This seems like a bug as Garnet 1.0 uses a while loop over an if-statement, suggesting the break was intended for this while loop. This patch removes the break statement, which allows up to one message to be dequeued from a VNET and injected into the network.
2016-11-09style: [patch 1/22] use /r/3648/ to reorganize includesBrandon Potter
2016-12-20ruby: Make MessageBuffers actually finite sizedJoel Hestness
When Ruby controllers stall messages in MessageBuffers, the buffer moves those messages off the priority heap and into a per-address stall map. When buffers are finite-sized, the test areNSlotsAvailable() only checks the size of the priority heap, but ignores the stall map, so the map is allowed to grow unbounded if the controller stalls numerous messages. This patch fixes the problem by tracking the stall map size and testing the total number of messages in the buffer appropriately.
2016-12-20ruby: fix typo in DMASequencer::ackCallback()Tony Gutierrez
2016-12-20ruby: fix issue with unused var in DMASequencerTony Gutierrez
the iterator declared in DMASequencer::ackCallback() is only used in an assert, this causes clang to fail when building fast. here we move the find call on the request table directly into the assert.
2016-12-19mem: Make the BaseXBar public to not confuse Python wrappersAndreas Sandberg
The Python wrappers generally assume that destructors are public. Make the BaseXBar destructor public to avoid confusing the Python wrapper. Change-Id: If958802409c0be74e875dd6e279742abfdb3ede1 Signed-off-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Curtis Dunham <curtis.dunham@arm.com>
2016-12-15ruby: Detect garnet network-level deadlock.Jieming Yin
This patch detects garnet network deadlock by monitoring network interfaces. If a network interface continuously fails to allocate virtual channels for a message, a possible deadlock is detected.
2016-12-05ruby: Remove RubyMemoryControl and associated filesAndreas Hansson
This patch removes the deprecated RubyMemoryControl. The DRAMCtrl module should be used instead.
2016-12-05mem: Respond to InvalidateReq when the block is (pending) dirtyNikos Nikoleris
Previously when an InvalidateReq snooped a cache with a dirty block or a pending modified MSHR, it would invalidate the block or set the postInv flag. The cache would not send an InvalidateResp. though, causing memory order violations. This patches changes this behavior, making the cache with the dirty block or pending modified MSHR the ordering point. Change-Id: Ib4c31012f4f6693ffb137cd77258b160fbc239ca Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
2016-12-05mem: Invalidate a blk when servicing the 1st invalidating targetNikos Nikoleris
Previously an MSHR with one or more invalidating targets would first service all targets in the MSHR TargetList and then invalidate the block. As a result any service snooping targets would lookup in the cache and incorrectly find the block. This patch forces the invalidation to happen when the first invalidating target is encountered. Change-Id: I9df15de24e1d351cd96f5a2c424d9a03d81c2cce Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
2016-12-05mem: Allow non invalidating snoops on an InvalidateReq MSHRNikos Nikoleris
This patch changes an assertion that previously assumed that a non invalidating snoop request should never be serviced by an InvalidateReq MSHR. The MSHR serves as the ordering point for the snooping packet. When the InvalidateResp reaches the cache the snooping packet snoops the caches above to find the requested block. One or more of the caches above will have the block since earlier it has seen a WriteLineReq. Change-Id: I0c147c8b5d5019e18bd34adf9af0fccfe431ae07 Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
2016-12-05mem: Don't use hasSharers in the snoopFilter for memory responsesNikos Nikoleris
When the snoopFilter receives a response, it updates its state using the hasSharers flag (indicates whether there are more than one copies of the block in the caches above). The hasSharers flag of the packet was previously populated when the request was traversing and snooping the caches looking for the block. 1) When the response is coming from the memory-side port, its order with respect to other responses is not necessarily preserved (e.g., a request that arrived second to the xbar can get its response first). As a result the snoopFilter might process responses out of order updating its residency information using the non valid hasSharers flag which was populated much earlier. 2) When the response is from an on-chip, the MSHRs preserve a well defined order and the hasSharers flag should contain valid information. This patch changes the snoopFilter by avoiding the hasSharers flag when the response is from the memory-side port. Change-Id: Ib2d22a5b7bf3eccac64445127d2ea20ee74bb25b Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
2016-12-05mem: Always use InvalidateReq to service WriteLineReq missesNikos Nikoleris
Previously, a WriteLineReq that missed in a cache would send out an InvalidateReq if the block lookup failed or an UpgradeReq if the block lookup succeeded but the block had sharers. This changes ensures that a WriteLineReq always sends an InvalidateReq to invalidate all copies of the block and satisfy the WriteLineReq. Change-Id: I207ff5b267663abf02bc0b08aeadde69ad81be61 Reviewed-by: Andreas Hansson <andreas.hansson@arm.com>
2016-12-05mem: Assert that the responderHadWritable is set only onceNikos Nikoleris
Change-Id: Ie3beeef25331f84a0a5bcc17f7a791f4a829695b Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
2016-12-05mem: Ensure InvalidateReq is considered isForward by MSHRsAndreas Hansson
This patch fixes an issue where an MSHR would incorrectly be perceived to provide data to targets arriving after an InvalidateReq. To address this the InvalidateReq is now treated as isForward, much like an UpgradeReq that did not hit in the cache. Change-Id: Ia878444d949539b5c33fd19f3e12b0b8a872275e Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
2016-12-05mem: Make packet debug printing more uniformNikos Nikoleris
Previously DPRINTFs printing information about a packet would use ad hoc formats. This patch changes all DPRINTFs to use the print function defined by the packet class, making the packet printing format more uniform and easier to change. Change-Id: Idd436a9758d4bf70c86a574d524648b2a2580970 Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
2016-12-05mem: Service only the 1st FromCPU MSHR target on ReadRespWithInvNikos Nikoleris
A response to a ReadReq can either be a ReadResp or a ReadRespWithInvalidate. As we add targets to an MSHR for a ReadReq we assume that the response will be a ReadResp. When the response is invalidating (ReadRespWithInvalidate) servicing more than one targets can potentially violate the memory ordering. This change fixes the way we handle a ReadRespWithInvalidate. When a cache receives a ReadRespWithInvalidate we service only the first FromCPU target and all the FromSnoop targets from the MSHR target list. The rest of the FromCPU targets are deferred and serviced by a new request. Change-Id: I75c30c268851987ee5f8644acb46f440b4eeeec2 Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
2016-12-05mem: Keep track of allocOnFill in the TargetListNikos Nikoleris
Previously the information of whether a response was allocating or not was a property of the MSHR. This change makes this flag a property of the TargetList. Differernt TargetLists, e.g. the targets and the deferred targets lists might have different values. Additionally, the information about whether each of the target expects an allocating response is stored inside the TargetList container. This allows for repopulating the flag in case some of the targets are removed. Change-Id: If3ec2516992f42a6d9da907009ffe3ab8d0d2021 Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
2016-12-05mem: Add support for repopulating the flags of an MSHR TargetListNikos Nikoleris
This patch adds support for repopulating the flags of an MSHR TargetList. The added functionality makes it possible to remove targets from a TargetList without leaving it in an inconsistent state. Change-Id: I3f7a8e97bfd3e2e49bebad056d11bbfb087aad91 Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com>
2016-12-02ruby: Fix overflow reported by ASAN in MessageBuffer.Matthew Poremba
In MessageBuffer the m_not_avail_count member is incremented but not used. This causes an overflow reported by ASAN. This patch changes from an int to Stats::Scalar, since the count is useful in debugging finite MessageBuffers.
2016-11-30mem: Split the hit_latency into tag_latency and data_latencySophiane Senni
If the cache access mode is parallel, i.e. "sequential_access" parameter is set to "False", tags and data are accessed in parallel. Therefore, the hit_latency is the maximum latency between tag_latency and data_latency. On the other hand, if the cache access mode is sequential, i.e. "sequential_access" parameter is set to "True", tags and data are accessed sequentially. Therefore, the hit_latency is the sum of tag_latency plus data_latency. Signed-off-by: Jason Lowe-Power <jason@lowepower.com>
2016-11-21ruby: Fix potential bugs in garnet2.0Jieming Yin
1. Delete unused variable from struct LinkEntry 2. Correct GarnetExtLink and GarnetIntLink inheritance
2016-11-21ruby: add default ctor for MachineID typeTony Gutierrez
not all uses of MachineID initialize its fields, so here we add a default ctor.
2016-11-19ruby: init MessageSizeType of SequencerMsg to Request_ControlSooraj Puthoor
SequencerMsg is autogenerated by slicc scripts and the MessageSizeType is initialized to the max enume value by default. The DMASequencer pushes this message to the mandatory queue and since the MessageSizeType is unitialized, string_to_MessageSizeType() function used by traces to print the message fails with a panic. This patch avoids this problem by initializing MessageSizeType of SequencerMsg to Request_Control.
2016-10-26hsail,gpu-compute: fixes to appease clang++Tony Gutierrez
fixes to appease clang++. tested on: Ubuntu clang version 3.5.0-4ubuntu2~trusty2 (tags/RELEASE_350/final) (based on LLVM 3.5.0) Ubuntu clang version 3.6.0-2ubuntu1~trusty1 (tags/RELEASE_360/final) (based on LLVM 3.6.0) the fixes address the following five issues: 1) the exec continuations in gpu_static_inst.hh were marked as protected when they should be public. here we mark them as public 2) the Abs instruction uses std::abs() in its execute method. because Abs is templated, it can also operate on U32 and U64, types, which cause Abs::execute() to pass uint32_t and uint64_t types to std::abs() respectively. this triggers a warning because std::abs() has no effect in this case. to rememdy this we add template specialization for the execute() method of Abs when its template paramter is U32 or U64. 3) Some potocols that utilize the code in cprintf.hh were missing includes to BoolVec.hh, which defines operator<< for the BoolVec type. This would cause issues when the generated code would try to pass a BoolVec type to a method in cprintf.hh that used operator<< on an instance of a BoolVec. 4) Surprise, clang doesn't like it when you clobber all the bits in a newly allocated object. I.e., this code: tlb = new GpuTlbEntry\[size\]; std::memset(tlb, 0, sizeof(GpuTlbEntry) \* size); Let's use std::vector to track the TLB entries in the GpuTlb now... 5) There were a few variables used only in DPRINTFs, so we mark them with M5_VAR_USED.
2016-10-26ruby: Allow multiple outstanding DMA requestsMichael LeBeane
DMA sequencers and protocols can currently only issue one DMA access at a time. This patch implements the necessary functionality to support multiple outstanding DMA requests in Ruby.
2016-10-26ruby: make a RequestDesc class instead of std::pairTony Gutierrez
the RequestDesc was previously implemented as a std::pair, which made the implementation overly complex and error prone. here we encapsulate the packet, primary, and secondary types all in a single data structure with all members properly intialized in a ctor
2016-10-13mem: add DRAM powerdown currentOmar Naji
Change-Id: I763cffe0c69f5ebbbf6a6eb12bec5c13d5d0161d Reviewed-by: Andreas Hansson <andreas.hansson@arm.com> Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13mem: Add DRAM low-power functionalityWendy Elsasser
Added power-down state transitions to the DRAM controller model. Added per rank parameter, outstandingEvents, which tracks the number of outstanding command events and is used to determine when the controller should transition to a low power state. The controller will only transition when there are no outstanding events scheduled and the number of command entries for the given rank is 0. The outstandingEvents parameter is incremented for every RD/WR burst, PRE, and REF event scheduled. ACT is implicitly covered by RD/WR since burst will always issue and complete after a required ACT. The parameter is decremented when the event is serviced (completed). The controller will automatically transition to ACT power down, PRE power down, or SREF. Transition to ACT power down state scheduled from: 1) The RespondEvent, where read data is received from the memory. ACT power-down entry will be scheduled when one or more banks is open, all commands for the rank have completed (no more commands scheduled), and there are no commands in queue for the rank Transition to PRE power down scheduled from: 1) respondEvent, when all banks are closed, all commands have completed, and there are no commands in queue for the rank 2) prechargeEvent when all banks are closed, all commands have completed, and there are no commands in queue for the rank 3) refreshEvent, after the refresh is complete when the previous state was ACT power-down 4) refreshEvent, after the refresh is complete when the previous state was PRE power-down and there are commands in the queue. Transition to SREF will be scheduled from: 1) refreshEvent, after the refresh is completes when the previous state was PRE power-down with no commands in queue Power-down exit commands are scheduled from: 1) The refreshEvent, prior to issuing a refresh 2) doDRAMAccess, to wake-up the rank for RD/WR command issue. Self-refresh exit commands are scheduled from: 1) The next request event, when the queue has commands for the rank in the readQueue or there are commands for the rank in the writeQueue and the bus state is WRITE. Change-Id: I6103f660776e36c686655e71d92ec7b5b752050a Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13mem: Add callback to compute stats prior to dump eventWendy Elsasser
The per rank statistics are periodically updated based on state transition and refresh events. Add a method to update these when a dump event occurs to ensure they reflect accurate values. Specifically, need to ensure that the low-power state durations, power, and energy are logged correctly. Change-Id: Ib642a6668340de8f494a608bb34982e58ba7f1eb Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13mem: Modify drain to ensure banks and power are idledWendy Elsasser
Add constraint that all ranks have to be in PWR_IDLE before signaling drain complete This will ensure that the banks are all closed and the rank has exited any low-power states. On suspend, update the power stats to sync the DRAM power logic The logic maintains the location of the signalDrainDone method, which is still triggered from either: 1) Read response event 2) Next request event This ensures that the drain will complete in the READ bus state and minimizes the changes required. Change-Id: If1476e631ea7d5999fe50a0c9379c5967a90e3d1 Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13mem: Sort memory commands and update DRAMPowerWendy Elsasser
Add local variable to stores commands to be issued. These commands are in order within a single bank but will be out of order across banks & ranks. A new procedure, flushCmdList, sorts commands across banks / ranks, and flushes the sorted list, up to curTick() to DRAMPower. This is currently called in refresh, once all previous commands are guaranteed to have completed. Could be called in other events like the powerEvent as well. By only flushing commands up to curTick(), will not get out of sync when flushed at a periodic stats dump (done in subsequent patch). Change-Id: I4ac65a52407f64270db1e16a1fb04cfe7f638851 Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13mem: update DDR3 die revisionOmar Naji
Change-Id: I8992ddc1664c3ed4b2d36d8a34e4ce8be113b9de Reviewed-by: Radhika Jagtap <radhika.jagtap@arm.com>
2016-10-13mem: add DRAM powerdown timingOmar Naji
2016-10-13mem: make DDR4 x16Omar Naji
2016-10-06ruby: Add M5_VAR_USED before variables used only inside assert in garnet2.0.Tushar Krishna
This removes errors when building gem5.fast
2016-10-06ruby: garnet2.0Tushar Krishna
Revamped version of garnet with more optimized single-cycle routers, more configurability, and cleaner code.
2016-10-06ruby: remove the original garnet code.Tushar Krishna
Only garnet2.0 will be supported henceforth.
2016-10-06config: add port directions and per-router delay in topology.Tushar Krishna
This patch adds port direction names to the links during topology creation, which can be used for better printed names for the links or for users to code up their own adaptive routing algorithms. It also adds support for every router to have an independent latency value to support heterogeneous topologies with the subsequent garnet2.0 patch.
2016-10-06config: make internal links in network topology unidirectional.Tushar Krishna
This patch makes the internal links within the network topology unidirectional, thus allowing any deadlock-free routing algorithms to be specified from the topology itself using weights. This patch also renames Mesh.py and MeshDirCorners.py to Mesh_XY.py and MeshDirCorners_XY.py (Mesh with XY routing). It also adds a Mesh_westfirst.py and CrossbarGarnet.py topologies.
2016-10-06ruby: rename ALPHA_Network_test protocol to Garnet_standalone.Tushar Krishna
Over the past 6 years, we realized that the protocol is essentially used to run the garnet network in a standalone manner, and feed standard synthetic traffic patterns through it.
2016-09-29ruby: correct size for partial memory writesBrad Beckmann
Fixed AbstractController::queueMemoryWritePartial to specify the correct size for partial memory writes.
2016-09-29mem: minor dprintf fix to abstract memBrad Beckmann
print number of bytes written as a decimal number, not hex
2016-08-22cpu, mem, sim: Change how KVM maps memoryDavid Hashe
Only map memories into the KVM guest address space that are marked as usable by KVM. Create BackingStoreEntry class containing flags for is_conf_reported, in_addr_map, and kvm_map.
2016-08-15mem: Print an MSHR without triggering any assertionsNikos Nikoleris
Previously printing an mshr would trigger an assertion if the MSHR was not in service or if the targets list was empty. This patch changes the print function to bypasses the accessor functions for postInvalidate and postDowngrade and avoid the relevant assertions. It also checks if the targets list is empty before calling print on it. Change-Id: Ic18bee6cb088f63976112eba40e89501237cfe62 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com>
2016-08-12mem: Add support for secure packets in the snoop filterNikos Nikoleris
Secure and non-secure data can coexist in the cache and therefore the snoop filter should treat differently packets with secure and non secure accesses. This patch uses the lower bits of the line address to keep track of whether the packet is addressing secure memory or not. Change-Id: I54a5e614dad566a5083582bede86c86896f2c2c1 Reviewed-by: Andreas Sandberg <andreas.sandberg@arm.com> Reviewed-by: Stephan Diestelhorst <stephan.diestelhorst@arm.com> Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com>
2016-08-12mem: Add snoop filter to SystemXBar by defaultAndreas Hansson
This patch changes the default behaviour of the SystemXBar, adding a snoop filter. With the recent updates to the snoop filter allocation behaviour this change no longer causes problems for the regressions without caches. Change-Id: Ibe0cd437b71b2ede9002384126553679acc69cc1 Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com> Reviewed-by: Jason Lowe-Power <jason@lowepower.com> Reviewed-by: Tony Gutierrez <anthony.gutierrez@amd.com>