summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-06-30mem: DRAMPower trace formatting scriptAndreas Hansson
This patch adds a first version of a script that processes the debug output and generates a command trace for DRAMPower. This is work in progress and is intended as a snapshot of ongoing work at this point. The longer term plan is to link in DRAMPower as a library and have one instance of the model per rank, and instantiate it based on a struct passed from gem5. Each command will then be a call to the model and no parsing of traces will be necessary.
2014-06-30mem: DRAMPower trace outputAndreas Hansson
This patch adds a DRAMPower flag to enable off-line DRAM power analysis using the DRAMPower tool. A new DRAMPower flag is added and a follow-on patch adds a Python script to post-process the output and order it based on time stamps. The long-term goal is to link DRAMPower as a library and provide the commands through function calls to the model rather than first printing and then parsing the commands. At the moment it is also up to the user to ensure that the same DRAM configuration is used by the gem5 controller model and DRAMPower.
2014-06-30mem: Add bank and rank indices as fields to the DRAM bankAndreas Hansson
This patch adds the index of the bank and rank as a field so that we can determine the identity of a given bank (reference or pointer) for the power tracing. We also grab the opportunity of cleaning up the arguments used for identifying the bank when activating.
2014-06-30mem: Extend DRAM row bits from 16 to 32 for larger densitiesAndreas Hansson
This patch extends the DRAM row bits to 32 to support larger density memories. Additional checks are also added to ensure the row fits in the 32 bits.
2014-06-30cpu: implement a bi-mode branch predictorAnthony Gutierrez
2014-06-30arm: make the bi-mode predictor the default for O3_ARM_v7a_BPAnthony Gutierrez
the branch predictor used in the Cortex-A15 is a bi-mode style predictor, see: http://arm.com/files/pdf/at-exploring_the_design_of_the_cortex-a15.pdf and http://nvidia.com/docs/IO/116757/NVIDIA_Quad_a15_whitepaper_FINALv2.pdf this patch makes the bi-mode predictor the default for the ARM O3 CPU.
2014-06-22stats: update for O3 changesSteve Reinhardt
Mostly small differences in total ticks, but O3 stall causes shifted significantly. 30.eon does speed up by ~6% on Alpha and ARM, and 50.vortex by 4.5% on ARM. At the other extreme, X86 70.twolf is 0.8% slower.
2014-06-21x86: fix table walker assertionBinh Pham
In a cycle, we could see a R and W requests corresponding to the same page walk being sent to the memory. During the cycle that assertion happens, we have 2 responses corresponding to the R and W above. We also have a 'read' variable to keep track of the inflight Read request, this gets reset to NULL right after we send out any R request; and gets set to the next R in the page walk when a response comes back. The issue we are seeing here is when we get a response for W request, assert(!read) fires because we got a response for R request right before this, hence we set 'read' to NOT NULL value, pointing to the next R request in the pagewalk! This work was done while Binh was an intern at AMD Research.
2014-06-21o3: make dispatch LSQ full check more selectiveBinh Pham
Dispatch should not check LSQ size/LSQ stall for non load/store instructions. This work was done while Binh was an intern at AMD Research.
2014-06-21o3: split load & store queue full cases in renameBinh Pham
Check for free entries in Load Queue and Store Queue separately to avoid cases when load cannot be renamed due to full Store Queue and vice versa. This work was done while Binh was an intern at AMD Research.
2014-06-10scons: Bump the compiler version to gcc 4.6 and clang 3.0Andreas Hansson
This patch bumps the supported version of gcc from 4.4 to 4.6, and clang from 2.9 to 3.0. This enables, amongst other things, range-based for loops, lambda expressions, etc. The STL implementation shipping with 4.6 also has a full functional implementation of unique_ptr and shared_ptr.
2014-06-09Util: Do not style check symlinksJoel Hestness
The style checker used to traverse symlinks if they pointed to files, which can result in style checker failure if the pointed-to file doesn't exist. This style check is actually unnecessary, since symlinks either point to other files that are already style checked, or files outside gem5, which shouldn't be checked. Skip symlinks.
2014-06-09sim: More rigorous clocking commentsJoel Hestness
The language describing the clockEdge and nextCycle functions were ambiguous, and so were prone to misinterpretation/misuse. Clear up the comments to more rigorously describe their functionality.
2014-06-04ext: Add a McPAT regression testerYasuko Eckert
Add a regression tester to McPAT. Joel Hestness wrote these tests and Yasuko Eckert modified them to reflect the new McPAT interface and other changes the previous patch made.
2014-06-03ext: McPAT interface changes and fixesYasuko Eckert
This patch includes software engineering changes and some generic bug fixes Joel Hestness and Yasuko Eckert made to McPAT 0.8. There are still known issues/concernts we did not have a chance to address in this patch. High-level changes in this patch include: 1) Making XML parsing modular and hierarchical: - Shift parsing responsibility into the components - Read XML in a (mostly) context-free recursive manner so that McPAT input files can contain arbitrary component hierarchies 2) Making power, energy, and area calculations a hierarchical and recursive process - Components track their subcomponents and recursively call compute functions in stages - Make C++ object hierarchy reflect inheritance of classes of components with similar structures - Simplify computeArea() and computeEnergy() functions to eliminate successive calls to calculate separate TDP vs. runtime energy - Remove Processor component (now unnecessary) and introduce a more abstract System component 3) Standardizing McPAT output across all components - Use a single, common data structure for storing and printing McPAT output - Recursively call print functions through component hierarchy 4) For caches, allow splitting data array and tag array reads and writes for better accuracy 5) Improving the usability of CACTI by printing more helpful warning and error messages 6) Minor: Impose more rigorous code style for clarity (more work still to be done) Overall, these changes greatly reduce the amount of replicated code, and they improve McPAT runtime and decrease memory footprint.
2014-06-03ext: change McPAT to not force compile in 32-bit mode.Yasuko Eckert
2014-06-03ext: Redirect McPAT object filesYasuko Eckert
All object files and McPAT binaries are moved to directory gem5/build/mcpat/ rather than creating them locally.
2014-05-31style: eliminate equality tests with true and falseSteve Reinhardt
Using '== true' in a boolean expression is totally redundant, and using '== false' is pretty verbose (and arguably less readable in most cases) compared to '!'. It's somewhat of a pet peeve, perhaps, but I had some time waiting for some tests to run and decided to clean these up. Unfortunately, SLICC appears not to have the '!' operator, so I had to leave the '== false' tests in the SLICC code.
2014-05-24stats: changes due to recent o3 patch.Nilay Vaish
2014-05-23stats: changes due to o3 cpu and ruby message buffer patchesNilay Vaish
2014-05-23ruby: slicc: remove unused ids DNUCA*Nilay Vaish
2014-05-23ruby: remove old protocol documentationNilay Vaish
2014-05-23ruby: message buffer: drop dequeue_getDelayCycles()Nilay Vaish
The functionality of updating and returning the delay cycles would now be performed by the dequeue() function itself.
2014-05-23cpu: o3: remove stat totalCommittedInstsNilay Vaish
This patch removes the stat totalCommittedInsts. This variable was used for recording the total number of instructions committed across all the threads of a core. The instructions committed by each thread are recorded invidually. The total would now be generated by summing these individual counts.
2014-05-15config: remove unecessary assignment of etherlink interfacesAnthony Gutierrez
in makeDualRoot() the etherlink interfaces are set using the tsunami interface however, they are set again a few lines later based on whether or not the system is a realview or tsunami system; the original assignment is always overwritten or there will be a fatal. this seems like an artifact from when tsunami was the only type of system capable of running with the dual option.
2014-05-12syscall emulation: clean up & comment SyscallReturnSteve Reinhardt
2014-05-12tests: update t1000 & pc-switcheroo-full statsSteve Reinhardt
committed reference config.json files too
2014-05-10tests: update eio ref outputs for new statsSteve Reinhardt
Also committed reference config.json files for the eio tests.
2014-05-09stats: Bump stats for the fixes, and mostly DRAM controller changesAndreas Hansson
2014-05-09config: Bump DRAM sweep bus speed to match DDR4 configAndreas Hansson
This patch bumps the bus clock speed such that the interconnect does not become a bottleneck with a DDR4-2400-x64 DRAM delivering 19.2 GByte/s theoretical max.
2014-05-09tests: Reflect name change in DRAM testsAndreas Hansson
This patch reflects the recent name change in the DRAM TrafficGen tests and also tidies up the test directory. --HG-- rename : tests/configs/tgen-simple-dram.py => tests/configs/tgen-dram-ctrl.py rename : tests/quick/se/70.tgen/ref/null/none/tgen-simple-dram/config.ini => tests/quick/se/70.tgen/ref/null/none/tgen-dram-ctrl/config.ini rename : tests/quick/se/70.tgen/ref/null/none/tgen-simple-dram/simerr => tests/quick/se/70.tgen/ref/null/none/tgen-dram-ctrl/simerr rename : tests/quick/se/70.tgen/ref/null/none/tgen-simple-dram/simout => tests/quick/se/70.tgen/ref/null/none/tgen-dram-ctrl/simout rename : tests/quick/se/70.tgen/ref/null/none/tgen-simple-dram/stats.txt => tests/quick/se/70.tgen/ref/null/none/tgen-dram-ctrl/stats.txt rename : tests/quick/se/70.tgen/tgen-simple-dram.cfg => tests/quick/se/70.tgen/tgen-dram-ctrl.cfg
2014-05-09mem: Update DDR3 and DDR4 based on datasheetsAndreas Hansson
This patch makes a more firm connection between the DDR3-1600 configuration and the corresponding datasheet, and also adds a DDR3-2133 and a DDR4-2400 configuration. At the moment there is also an ongoing effort to align the choice of datasheets to what is available in DRAMPower.
2014-05-09mem: Add DRAM cycle timeAndreas Hansson
This patch extends the current timing parameters with the DRAM cycle time. This is needed as the DRAMPower tool expects timestamps in DRAM cycles. At the moment we could get away with doing this in a post-processing step as the DRAMPower execution is separate from the simulation run. However, in the long run we want the tool to be called during the simulation, and then the cycle time is needed.
2014-05-09mem: Simplify DRAM response schedulingAndreas Hansson
This patch simplifies the DRAM response scheduling based on the assumption that they are always returned in order.
2014-05-09mem: Add precharge all (PREA) to the DRAM controllerAndreas Hansson
This patch adds the basic ingredients for a precharge all operation, to be used in conjunction with DRAM power modelling. Currently we do not try and apply any cleverness when precharging all banks, thus even if only a single bank is open we use PREA as opposed to PRE. At the moment we only have a single tRP (tRPpb), and do not model the slightly longer all-bank precharge constraint (tRPab).
2014-05-09mem: Remove printing of DRAM paramsAndreas Hansson
This patch removes the redundant printing of DRAM params.
2014-05-09mem: Add tRTP to the DRAM controllerAndreas Hansson
This patch adds the tRTP timing constraint, governing the minimum time between a read command and a precharge. Default values are provided for the existing DRAM types.
2014-05-09mem: Merge DRAM latency calculation and bank state updateAndreas Hansson
This patch merges the two control paths used to estimate the latency and update the bank state. As a result of this merging the computation is now in one place only, and should be easier to follow as it is all done in absolute (rather than relative) time. As part of this change, the scheduling is also refined to ensure that we look at a sensible estimate of the bank ready time in choosing the next request. The bank latency stat is removed as it ends up being misleading when the DRAM access code gets evaluated ahead of time (due to the eagerness of waking the model up for scheduling the next request).
2014-05-09mem: Add tWR to DRAM activate and precharge constraintsAndreas Hansson
This patch adds the write recovery time to the DRAM timing constraints, and changes the current tRASDoneAt to a more generic preAllowedAt, capturing when a precharge is allowed to take place. The part of the DRAM access code that accounts for the precharge and activate constraints is updated accordingly.
2014-05-09mem: Merge DRAM page-management calculationsAndreas Hansson
This patch treats the closed page policy as yet another case of auto-precharging, and thus merges the code with that used for the other policies.
2014-05-09mem: Add DRAM power states to the controllerAndreas Hansson
This patch adds power states to the controller. These states and the transitions can be used together with the Micron power model. As a more elaborate use-case, the transitions can be used to drive the DRAMPower tool. At the moment, the power-down modes are not used, and this patch simply serves to capture the idle, auto refresh and active modes. The patch adds a third state machine that interacts with the refresh state machine.
2014-05-09mem: Ensure DRAM refresh respects timingsAndreas Hansson
This patch adds a state machine for the refresh scheduling to ensure that no accesses are allowed while the refresh is in progress, and that all banks are propely precharged. As part of this change, the precharging of banks of broken out into a method of its own, making is similar to how activations are dealt with. The idle accounting is also updated to ensure that the refresh duration is not added to the time that the DRAM is in the idle state with all banks precharged.
2014-05-09mem: Make DRAM read/write switching less conservativeAndreas Hansson
This patch changes the read/write event loop to use a single event (nextReqEvent), along with a state variable, thus joining the two control flows. This change makes it easier to follow the state transitions, and control what happens when. With the new loop we modify the overly conservative switching times such that the write-to-read switch allows bank preparation to happen in parallel with the bus turn around. Similarly, the read-to-write switch uses the introduced tRTW constraint.
2014-04-17arm: Make sure UndefinedInstructions are properly initializedAli Saidi
2014-04-17arm: allow DC instructions by default so SE mode worksAli Saidi
2014-04-17sim, arm: implement more of the at variety syscallsAli Saidi
Needed for new AArch64 binaries
2014-05-09cpu: Useful getters for ActivityRecorderAndrew Bardsley
Add some useful getters to ActivityRecorder
2014-05-09cpu: Add flag name printing to StaticInstAndrew Bardsley
This patch adds a the member function StaticInst::printFlags to allow all of an instruction's flags to be printed without using the individual is... member functions or resorting to exposing the 'flags' vector It also replaces the enum definition StaticInst::Flags with a Python-generated enumeration and adds to the enum generation mechanism in src/python/m5/params.py to allow Enums to be placed in namespaces other than Enums or, alternatively, in wrapper structs allowing them to be inherited by other classes (so populating that class's name-space with the enumeration element names).
2014-05-09cpu: Timebuf const accessorsAndrew Bardsley
Add const accessors for timebuf elements.
2014-05-09arm: Add branch flags onto macroopsAndrew Bardsley
Mark branch flags onto macroops to allow branch prediction before microop decomposition