summaryrefslogtreecommitdiff
path: root/ext/mcpat/README
diff options
context:
space:
mode:
Diffstat (limited to 'ext/mcpat/README')
-rw-r--r--ext/mcpat/README226
1 files changed, 226 insertions, 0 deletions
diff --git a/ext/mcpat/README b/ext/mcpat/README
new file mode 100644
index 000000000..4887b1037
--- /dev/null
+++ b/ext/mcpat/README
@@ -0,0 +1,226 @@
+ __ __ ____ _ _____ ____ _
+| \/ | ___| _ \ / \|_ _| | __ ) ___| |_ __ _
+| |\/| |/ __| |_) / _ \ | | | _ \ / _ \ __|/ _` |
+| | | | (__| __/ ___ \| | | |_) | __/ |_| (_| |
+|_| |_|\___|_| /_/ \_\_| |____/ \___|\__|\__,_|
+
+McPAT: Multicore Power, Area, and Timing
+Current version 0.8Beta
+===============================
+
+McPAT is an architectural modeling tool for chip multiprocessors (CMP)
+The main focus of McPAT is accurate power and area
+modeling, and a target clock rate is used as a design constraint.
+McPAT performs automatic extensive search to find optimal designs
+that satisfy the target clock frequency.
+
+For complete documentation of the McPAT, please refer McPAT 1.0
+technical report and the following paper,
+"McPAT: An Integrated Power, Area, and Timing Modeling
+ Framework for Multicore and Manycore Architectures",
+that appears in MICRO 2009. Please cite the paper, if you use
+McPAT in your work. The bibtex entry is provided below for your convenience.
+
+ @inproceedings{mcpat:micro,
+ author = {Sheng Li and Jung Ho Ahn and Richard D. Strong and Jay B. Brockman and Dean M. Tullsen and Norman P. Jouppi},
+ title = "{McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures}",
+ booktitle = {MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture},
+ year = {2009},
+ pages = {469--480},
+ }
+
+Current McPAT is in its beta release.
+List of features of beta release
+===============================
+The following are the list of features supported by the tool.
+
+* Power, area, and timing models for CMPs with:
+ Inorder cores both single and multithreaded
+ OOO cores both single and multithreaded
+ Shared/coherent caches with directory hardware:
+ including directory cache, shadowed tag directory
+ and static bank mapped tag directory
+ Network-on-Chip
+ On-chip memory controllers
+
+* Internal models are based on real modern processors:
+ Inorder models are based on Sun Niagara family
+ OOO models are based on Intel P6 for reservation
+ station based OOO cores, and on Intel Netburst and
+ Alpha 21264 for physical register file based OOO cores.
+
+* Leakage power modeling considers both sub-threshold leakage
+ and gate leakage power. The impact of operating temperature
+ on both leakage power are considered. Longer channel devices
+ that can reduce leakage significantly with modest performance
+ penalty are also modeled.
+
+* McPAT supports automatic extensive search to find optimal designs
+ that satisfy the target clock frequency. The timing constraint
+ include both throughput and latency.
+
+* Interconnect model with different delay, power, and area
+ properties, as well as both the aggressive and conservative
+ interconnect projections on wire technologies.
+
+* All process specific values used by the McPAT are obtained
+ from ITRS and currently, the McPAT supports 90nm, 65nm, 45nm,
+ 32nm, and 22nm technology nodes. At 32nm and 22nm nodes, SOI
+ and DG devices are used. After 45nm, Hi-K metal gates are used.
+
+How to use the tool?
+====================
+
+McPAT takes input parameters from an XML-based interface,
+then it computes area and peak power of the
+Please note that the peak power is the absolute worst case power,
+which could be even higher than TDP.
+
+1. Steps to run McPAT:
+ -> define the target processor using inorder.xml or OOO.xml
+ -> run the "mcpat" binary:
+ ./mcpat -infile <*.xml> -print_level < level of detailed output>
+ ./mcpat -h (or mcpat --help) will show the quick help message.
+
+ Rather than being hardwired to certain simulators, McPAT
+ uses an XML-based interface to enable easy integration
+ with various performance simulators. Our collaborator,
+ Richard Strong, at University of California, San Diego,
+ designed an experimental parser for the M5 simulator, aiming for
+ streamlining the integration of McPAT and M5. Please check the M5
+ repository/ for the latest version of the parser.
+
+2. Optimize:
+ McPAT will try its best to satisfy the target clock rate.
+ When it cannot find a valid solution, it gives out warnings,
+ while still giving a solution that is closest to the timing
+ constraints and calculate power based on it. The optimization
+ will lead to larger power/area numbers for target higher clock
+ rate. McPAT also provides the option "-opt_for_clk" to turn on
+ ("-opt_for_clk 1") and off this strict optimization for the
+ timing constraint. When it is off, McPAT always optimize
+ component for ED^2P without worrying about meeting the
+ target clock frequency. By turning it off, the computation time
+ can be reduced, which suites for situations where target clock rate
+ is conservative.
+
+3. The output:
+ McPAT outputs results in a hierarchical manner. Increasing
+ the "-print_level" will show detailed results inside each
+ component. For each component, major parts are shown, and associated
+ pipeline registers/control logic are added up in total area/power of each
+ components. In general, McPAT does not model the area/overhead of the pad
+ frame used in a processor die.
+
+4. How to use the XML interface for McPAT
+ 4.1 Set up the parameters
+ Parameters of target designs need to be set in the *.xml file for
+ entries taged as "param". McPAT have very detailed parameter settings.
+ please remove the structure parameter from the file if you want
+ to use the default values. Otherwise, the parameters in the xml file
+ will override the default values.
+
+ 4.2 Pass the statistics
+ There are two options to get the correct stats: a) the performance
+ simulator can capture all the stats in detail and pass them to McPAT;
+ b). Performance simulator can only capture partial stats and pass
+ them to McPAT, while McPAT can reason about the complete stats using
+ the partial information and the configuration. Therefore, there are
+ some overlap for the stats.
+
+ 4.3 Interface XML file structures (PLEASE READ!)
+ The XML is hierarchical from processor level to micro-architecture
+ level. McPAT support both heterogeneous and homogeneous manycore processors.
+
+ 1). For heterogeneous processor setup, each component (core, NoC, cache,
+ and etc) must have its own instantiations (core0, core1, ..., coreN).
+ Each instantiation will have different parameters as well as its stats.
+ Thus, the XML file must have multiple "instantiation" of each type of
+ heterogeneous components and the corresponding hetero flags must be set
+ in the XML file. Then state in the XML should be the stats of "a" instantiation
+ (e.g. "a" cores). The reported runtime dynamic is of a single instantiation
+ (e.g. "a" cores). Since the stats for each (e.g. "a" cores) may be different,
+ we will see a whole list of (e.g. "a" cores) with different dynamic power,
+ and total power is just a sum of them.
+
+ 2). For homogeneous processors, the same method for heterogeneous can
+ also be used by treating all homogeneous instantiations as heterogeneous.
+ However, a preferred approach is to use a single representative for all
+ the same components (e.g. core0 to represent all cores) and set the
+ processor to have homogeneous components (e.g. <param name="homogeneous_cores
+ " value="1"/> ). Thus, the XML file only has one instantiation to represent
+ all others with the same architectural parameters. The corresponding homo
+ flags must be set in the XML file. Then, the stats in the XML should be
+ the aggregated stats of the sum of all instantiations (e.g. aggregated stats
+ of all cores). In the final results, McPAT will only report a single
+ instantiation of each type of component, and the reported runtime dynamic power
+ is the sum of all instantiations of the same type. This approach can run fast
+ and use much less memory.
+
+5. Guide for integrating McPAT into performance simulators and bypassing the XML interface
+ The detailed work flow of McPAT has two phases: the initialization phase and
+ the computation phase. Specifically, in order to start the initialization phase a
+ user specifies static configurations, including parameters at all three levels,
+ namely, architectural, circuit, and technology levels. During the initialization
+ phase, McPAT will generate the internal chip representation using the configurations
+ set by the user.
+ The computation phase of McPAT is called by McPAT or the performance simulator
+ during simulation to generate runtime power numbers. Before calling McPAT to
+ compute runtime power numbers, the performance simulator needs to pass the
+ statistics, namely, the activity factors of each individual components to McPAT
+ via the XML interface.
+ The initialization phase is very time-consuming, since it will repeat many
+ times until valid configurations are found or the possible configurations are
+ exhausted. To reduce the overhead, a user can let the simulator to call McPAT
+ directly for computation phase and only call initialization phase once at the
+ beginning of simulation. In this case, the XML interface file is bypassed,
+ please refer to processor.cc to see how the two phases are called.
+
+6. Sample input files:
+ This package provide sample XML files for validating target processors. Please find the
+ enclosed Niagara1.xml (for the Sun Niagara1 processor), Niagara2.xml (for the Sun Niagara2
+ processor), Alpha21364.xml (for the Alpha21364 processor), and Xeon.xml (for the Intel
+ Xeon Tulsa processor).
+
+ Special instructions for using Xeon.xml:
+ McPAT uses ITRS device types including HP, LSTP, and LOP. Although most
+ designs follow ITRS projections, there are designs with special technologies.
+ For example, the 65nm Xeon Tulsa processor uses 1.25 V rather than 1.1V
+ for the core voltage domain, which results in the changes in threshold voltage,
+ leakage current density, saturation current, and etc, besides the different
+ supply voltage. We use MASTAR to match the special technology as used in Xeon
+ core domain. Therefore, in order to generate accurate results of Xeon
+ Tulsa cores, users need to do make TAR=mcpatXeonCore and use the generated
+ special executable. The L3 cache and buses must be computed using standard
+ ITRS technology.
+
+
+====================
+McPAT is in its beginning stage. We are still improving
+the tool and refining the code. Please come back to its website
+for newer versions. If you have any comments,
+questions, or suggestions, please write to us.
+
+Version history and roadmap
+
+McPAT Alpha: released Sep. 2009 Experimental release
+McPAT Beta (0.6): released Nov. 2009 New code base and technology base
+McPAT Beta (0.7): released May. 2010 Added various new models,
+ including long channel devices, buses model; together
+ with bug fixes and extensive code optimization to reduce
+ memory usage.
+McPAT Beta (0.8): released Aug. 2010 Added various new models,
+ including on-chip 10Gb ethernet units, PCIe, and flash controllers.
+Next major release:
+McPAT 1.0: including advance power-saving states
+
+Future releases may include the modeling of embedded low-power
+processors as well as vector processors and GPGPUs.
+
+
+Sheng Li
+sheng.li@hp.com
+
+
+
+