path: root/src/doc/inside-minor.doxygen
# Copyright (c) 2014 ARM Limited
# All rights reserved
#
# The license below extends only to copyright in the software and shall
# not be construed as granting a license to any other intellectual
# property including but not limited to intellectual property relating
# to a hardware implementation of the functionality of the software
# licensed hereunder.  You may use the software subject to the license
# terms below provided that you ensure that this notice is replicated
# unmodified and in its entirety in all distributions of the software,
# modified or unmodified, in source code or in binary form.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# Authors: Andrew Bardsley

namespace Minor
{

/*!

\page minor Inside the Minor CPU model

\tableofcontents

This document contains a description of the structure and function of the
Minor gem5 in-order processor model.  It is recommended reading for anyone who
wants to understand Minor's internal organisation, design decisions, C++
implementation and Python configuration.  A familiarity with gem5 and some of
its internal structures is assumed.  This document is meant to be read
alongside the Minor source code and to explain its general structure without
being too slavish about naming every function and data type.

\section whatis What is Minor?

Minor is an in-order processor model with a fixed pipeline but configurable
data structures and execute behaviour.  It is intended to be used to model
processors with strict in-order execution behaviour and allows visualisation
of an instruction's position in the pipeline through the
MinorTrace/minorview.py format/tool.  The intention is to provide a framework
for micro-architecturally correlating the model with a particular, chosen
processor with similar capabilities.

\section philo Design philosophy

\subsection mt Multithreading

The model isn't currently capable of multithreading but there are THREAD
comments in key places where stage data needs to be arrayed to support
multithreading.

\subsection structs Data structures

Decorating data structures with large amounts of life-cycle information is
avoided.  Only instructions (MinorDynInst) contain a significant proportion of
their data content whose values are not set at construction.

All internal structures have fixed sizes on construction.  Data held in queues
and FIFOs (MinorBuffer, FUPipeline) should have a BubbleIF interface to
allow a distinct 'bubble'/no data value option for each type.

Inter-stage 'struct' data is packaged in structures which are passed by value.
Only MinorDynInst, the line data in ForwardLineData and the memory-interfacing
objects Fetch1::FetchRequest and LSQ::LSQRequest are '::new' allocated while
running the model.

\section model Model structure

Objects of class MinorCPU are provided by the model to gem5.  MinorCPU
implements the standard gem5 CPU interfaces (cpu.hh) and can provide data and
instruction interfaces for connection to a cache system.  The model is
configured in a similar way to other gem5 models through Python.  That
configuration is passed on to MinorCPU::pipeline (of class Pipeline) which
actually implements the processor pipeline.

The hierarchy of major unit ownership from MinorCPU down looks like this:

<ul>
<li>MinorCPU</li>
<ul>
    <li>Pipeline - container for the pipeline, owns the cyclic 'tick'
    event mechanism and the idling (cycle skipping) mechanism.</li>
    <ul>
        <li>Fetch1 - instruction fetch unit responsible for fetching cache
            lines (or parts of lines from the I-cache interface)</li>
        <ul>
            <li>Fetch1::IcachePort - interface to the I-cache from
                Fetch1</li>
            </ul>
            <li>Fetch2 - line to instruction decomposition</li>
            <li>Decode - instruction to micro-op decomposition</li>
            <li>Execute - instruction execution and data memory
                interface</li>
            <ul>
                <li>LSQ - load store queue for memory ref. instructions</li>
                <li>LSQ::DcachePort - interface to the D-cache from
                    Execute</li>
            </ul>
        </ul>
    </ul>
</ul>

\section keystruct Key data structures

\subsection ids Instruction and line identity: InstId (dyn_inst.hh)

An InstId contains the sequence numbers and thread numbers that describe the
life cycle and instruction stream affiliations of individual fetched cache
lines and instructions.

An InstId is printed in one of the following forms:

    - T/S.P/L - for fetched cache lines
    - T/S.P/L/F - for instructions before Decode
    - T/S.P/L/F.E - for instructions from Decode onwards

for example:

    - 0/10.12/5/6.7

InstId's fields are:

<table>
<tr>
    <td><b>Field</b></td>
    <td><b>Symbol</b></td>
    <td><b>Generated by</b></td>
    <td><b>Checked by</b></td>
    <td><b>Function</b></td>
</tr>

<tr>
    <td>InstId::threadId</td>
    <td>T</td>
    <td>Fetch1</td>
    <td>Everywhere the thread number is needed</td>
    <td>Thread number (currently always 0).</td>
</tr>

<tr>
    <td>InstId::streamSeqNum</td>
    <td>S</td>
    <td>Execute</td>
    <td>Fetch1, Fetch2, Execute (to discard lines/insts)</td>
    <td>Stream sequence number as chosen by Execute.  Stream
        sequence numbers change after changes of PC (branches, exceptions) in
        Execute and are used to separate pre and post branch instruction
        streams.</td>
</tr>

<tr>
    <td>InstId::predictionSeqNum</td>
    <td>P</td>
    <td>Fetch2</td>
    <td>Fetch2 (while discarding lines after prediction)</td>
    <td>Prediction sequence numbers represent branch prediction decisions.
    This is used by Fetch2 to mark lines/instructions according to the last
    followed branch prediction made by Fetch2.  Fetch2 can signal to Fetch1
    that it should change its fetch address and mark lines with a new
    prediction sequence number (which it will only do if the stream sequence
    number Fetch1 expects matches that of the request).  </td> </tr>

<tr>
<td>InstId::lineSeqNum</td>
<td>L</td>
<td>Fetch1</td>
<td>(Just for debugging)</td>
<td>Line fetch sequence number of this cache line or the line
    this instruction was extracted from.
    </td>
</tr>

<tr>
<td>InstId::fetchSeqNum</td>
<td>F</td>
<td>Fetch2</td>
<td>Fetch2 (as the inst. sequence number for branches)</td>
<td>Instruction fetch order assigned by Fetch2 when lines
    are decomposed into instructions.
    </td>
</tr>

<tr>
<td>InstId::execSeqNum</td>
<td>E</td>
<td>Decode</td>
<td>Execute (to check instruction identity in queues/FUs/LSQ)</td>
<td>Instruction order after micro-op decomposition.</td>
</tr>

</table>

The sequence number fields are all independent of each other and although, for
instance, InstId::execSeqNum for an instruction will always be >=
InstId::fetchSeqNum, the comparison is not useful.

The originating stage of each sequence number field keeps a counter for that
field which can be incremented in order to generate new, unique numbers.
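The printed forms above can be sketched in a few lines of Python.  This is
purely an illustration of the T/S.P/L[/F[.E]] formatting, not gem5 code; the
field names follow the table, but the class itself is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class InstId:
    threadId: int
    streamSeqNum: int
    predictionSeqNum: int
    lineSeqNum: int
    fetchSeqNum: int = None   # assigned by Fetch2; None for bare lines
    execSeqNum: int = None    # assigned by Decode; None before Decode

    def __str__(self):
        # T/S.P/L for lines, then /F and .E as the later numbers appear
        s = "%d/%d.%d/%d" % (self.threadId, self.streamSeqNum,
                             self.predictionSeqNum, self.lineSeqNum)
        if self.fetchSeqNum is not None:
            s += "/%d" % self.fetchSeqNum
            if self.execSeqNum is not None:
                s += ".%d" % self.execSeqNum
        return s

print(InstId(0, 10, 12, 5, 6, 7))  # 0/10.12/5/6.7
```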

\subsection insts Instructions: MinorDynInst (dyn_inst.hh)

MinorDynInst represents an instruction's progression through the pipeline.  An
instruction can be three things:

<table>
<tr>
    <td><b>Thing</b></td>
    <td><b>Predicate</b></td>
    <td><b>Explanation</b></td>
</tr>
<tr>
    <td>A bubble</td>
    <td>MinorDynInst::isBubble()</td>
    <td>no instruction at all, just a space-filler</td>
</tr>
<tr>
    <td>A fault</td>
    <td>MinorDynInst::isFault()</td>
    <td>a fault to pass down the pipeline in an instruction's clothing</td>
</tr>
<tr>
    <td>A decoded instruction</td>
    <td>MinorDynInst::isInst()</td>
    <td>instructions are actually passed to the gem5 decoder in Fetch2 and so
    are created fully decoded.  MinorDynInst::staticInst is the decoded
    instruction form.</td>
</tr>
</table>

Instructions are reference counted using the gem5 RefCountingPtr
(base/refcnt.hh) wrapper.  They therefore usually appear as MinorDynInstPtr in
code.  Note that as RefCountingPtr initialises as nullptr rather than an
object that supports BubbleIF::isBubble, passing raw MinorDynInstPtrs to
Queue%s and other similar structures from stage.hh without boxing is
dangerous.
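The danger can be seen in a small sketch.  A bubble-aware queue asks each
element isBubble(); a null-initialising pointer cannot answer that question,
so it must be boxed behind something that can.  The classes here are
illustrative analogues, not gem5 types.

```python
class DynInst:
    """Stand-in for an instruction that supports the BubbleIF idea."""
    def __init__(self, bubble=False):
        self.bubble = bubble
    def isBubble(self):
        return self.bubble

class InstBox:
    """Boxes a possibly-null instruction reference behind isBubble()."""
    def __init__(self, inst=None):
        self.inst = inst
    def isBubble(self):
        return self.inst is None or self.inst.isBubble()

raw = None
# raw.isBubble() would raise AttributeError here, the Python analogue
# of dereferencing a null MinorDynInstPtr; the box answers safely.
print(InstBox(raw).isBubble())  # True
```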

\subsection fld ForwardLineData (pipe_data.hh)

ForwardLineData is used to pass cache lines from Fetch1 to Fetch2.  Like
MinorDynInst%s, they can be bubbles (ForwardLineData::isBubble()),
fault-carrying or can contain a line (or partial line) fetched by Fetch1.  The
data carried by ForwardLineData is owned by a Packet object returned from
memory.  It is explicitly memory managed and so must be deleted once processed
(by Fetch2 deleting the Packet).

\subsection fid ForwardInstData (pipe_data.hh)

ForwardInstData can contain up to ForwardInstData::width() instructions in its
ForwardInstData::insts vector.  This structure is used to carry instructions
between Fetch2, Decode and Execute and to store input buffer vectors in Decode
and Execute.

\subsection fr Fetch1::FetchRequest (fetch1.hh)

FetchRequests represent I-cache line fetch requests.  They are used in the
memory queues of Fetch1 and are pushed into/popped from Packet::senderState
while traversing the memory system.

FetchRequests contain a memory system Request (mem/request.hh) for that fetch
access, a packet (Packet, mem/packet.hh), if the request gets to memory, and a
fault field that can be populated with a TLB-sourced prefetch fault (if any).

\subsection lsqr LSQ::LSQRequest (execute.hh)

LSQRequests are similar to FetchRequests but for D-cache accesses.  They carry
the instruction associated with a memory access.

\section pipeline The pipeline

\verbatim
------------------------------------------------------------------------------
    Key:

    [] : inter-stage MinorBuffer
    ,--.
    |  | : pipeline stage
    `--'
    ---> : forward communication
    <--- : backward communication

    rv : reservation information for input buffers

                ,------.     ,------.     ,------.     ,-------.
 (from  --[]-v->|Fetch1|-[]->|Fetch2|-[]->|Decode|-[]->|Execute|--> (to Fetch1
 Execute)    |  |      |<-[]-|      |<-rv-|      |<-rv-|       |     & Fetch2)
             |  `------'<-rv-|      |     |      |     |       |
             `-------------->|      |     |      |     |       |
                             `------'     `------'     `-------'
------------------------------------------------------------------------------
\endverbatim

The four pipeline stages are connected together by MinorBuffer FIFO
(stage.hh, derived ultimately from TimeBuffer) structures which allow
inter-stage delays to be modelled.  There is a MinorBuffer between adjacent
stages in the forward direction (for example: passing lines from Fetch1 to
Fetch2) and, between Fetch2 and Fetch1, a buffer in the backwards direction
carrying branch predictions.

Stages Fetch2, Decode and Execute have input buffers which, each cycle, can
accept input data from the previous stage and can hold that data if the stage
is not ready to process it.  Input buffers store data in the same form as it
is received and so Decode and Execute's input buffers contain the output
instruction vector (ForwardInstData (pipe_data.hh)) from their previous stages
with the instructions and bubbles in the same positions as a single buffer
entry.

Stage input buffers provide a Reservable (stage.hh) interface to their
previous stages, to allow slots to be reserved in their input buffers, and
communicate their input buffer occupancy backwards to allow the previous stage
to plan whether it should make an output in a given cycle.
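The reservation idea can be sketched as follows, under simplifying
assumptions: a stage reserves a slot before its predecessor sends, so data
can never arrive without space.  The class and method names here are
illustrative, not gem5's.

```python
from collections import deque

class InputBuffer:
    """Fixed-capacity input buffer with a Reservable-style interface."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.reserved = 0

    def canReserve(self):
        # Occupancy plus outstanding reservations must fit
        return len(self.queue) + self.reserved < self.capacity

    def reserve(self):
        assert self.canReserve()
        self.reserved += 1

    def push(self, data):
        # A push consumes a previously made reservation
        assert self.reserved > 0, "push without a reservation"
        self.reserved -= 1
        self.queue.append(data)

buf = InputBuffer(capacity=2)
buf.reserve()          # previous stage plans an output this cycle...
buf.push("line 0")     # ...and delivers it, knowing space exists
```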

\subsection events Event handling: MinorActivityRecorder (activity.hh,
pipeline.hh)

Minor is essentially a cycle-callable model with some ability to skip cycles
based on pipeline activity.  External events are mostly received by callbacks
(e.g. Fetch1::IcachePort::recvTimingResp) and cause the pipeline to be woken
up to service advancing request queues.

Ticked (sim/ticked.hh) is a base class bringing together an evaluate
member function and a provided SimObject.  It provides a Ticked::start/stop
interface to start and pause clock events from being periodically issued.
Pipeline is a derived class of Ticked.

During evaluate calls, stages can signal that they still have work to do in
the next cycle by calling either MinorCPU::activityRecorder->activity() (for
non-callable related activity) or MinorCPU::wakeupOnEvent(<stageId>) (for
stage callback-related 'wakeup' activity).

Pipeline::evaluate contains calls to evaluate for each unit and a test for
pipeline idling which can turn off the clock tick if no unit has signalled
that it may become active next cycle.
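A minimal sketch of this activity/idling scheme: each cycle, stages may
signal work for the next cycle, and the tick loop pauses when no stage has
done so.  The names and structure are illustrative assumptions, not the gem5
API.

```python
class ActivityRecorder:
    """Records whether any stage signalled activity this cycle."""
    def __init__(self):
        self.active = False
    def activity(self):
        self.active = True

def run(stages, recorder, max_cycles):
    """Tick until no stage signals activity (the idling test)."""
    cycles = 0
    running = True
    while running and cycles < max_cycles:
        recorder.active = False
        for stage in reversed(stages):   # stages evaluated in reverse order
            stage(recorder)
        cycles += 1
        running = recorder.active        # idle: stop issuing clock ticks
    return cycles

# A stage with three cycles of work left signals activity three times,
# then the pipeline ticks once more, finds it idle, and stops.
work = {"count": 3}
def execute(recorder):
    if work["count"] > 0:
        work["count"] -= 1
        recorder.activity()

print(run([execute], ActivityRecorder(), max_cycles=100))  # 4
```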

Within Pipeline (pipeline.hh), the stages are evaluated in reverse order
(their ::evaluate member functions are called from Execute back to Fetch1)
and their backwards data can be read immediately after being written in each
cycle, allowing output decisions to be 'perfect' (allowing synchronous
stalling of the whole pipeline).  Branch
predictions from Fetch2 to Fetch1 can also be transported in 0 cycles making
fetch1ToFetch2BackwardDelay the only configurable delay which can be set as
low as 0 cycles.

The MinorCPU::activateContext and MinorCPU::suspendContext interface can be
called to start and pause threads (threads in the MT sense) and to start and
pause the pipeline.  Executing instructions can call this interface
(indirectly through the ThreadContext) to idle the CPU/their threads.

\subsection stages Each pipeline stage

In general, the behaviour of a stage (each cycle) is:

\verbatim
    evaluate:
        push input to inputBuffer
        setup references to input/output data slots

        do 'every cycle' 'step' tasks

        if there is input and there is space in the next stage:
            process and generate a new output
            maybe re-activate the stage

        send backwards data

        if the stage generated output to the following FIFO:
            signal pipe activity

        if the stage has more processable input and space in the next stage:
            re-activate the stage for the next cycle

        commit the push to the inputBuffer if that data hasn't all been used
\endverbatim

The Execute stage differs from this model as its forward output (branch) data
is unconditionally sent to Fetch1 and Fetch2.  To allow this behaviour, Fetch1
and Fetch2 must be unconditionally receptive to that data.

\subsection fetch1 Fetch1 stage

Fetch1 is responsible for fetching cache lines or partial cache lines from the
I-cache and passing them on to Fetch2 to be decomposed into instructions.  It
can receive 'change of stream' indications from both Execute and Fetch2 to
signal that it should change its internal fetch address and tag newly fetched
lines with new stream or prediction sequence numbers.  When both Execute and
Fetch2 signal changes of stream at the same time, Fetch1 takes Execute's
change.

Every line issued by Fetch1 will bear a unique line sequence number which can
be used for debugging stream changes.

When fetching from the I-cache, Fetch1 will ask for data from the current
fetch address (Fetch1::pc) up to the end of the 'data snap' size set in the
parameter fetch1LineSnapWidth.  Subsequent autonomous line fetches will fetch
whole lines at a snap boundary and of size fetch1LineWidth.
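The first fetch from a new PC therefore runs only to the next snap boundary.
That arithmetic can be sketched as below; the parameter name comes from the
text, but the helper function is hypothetical.

```python
def first_fetch_size(pc, fetch1LineSnapWidth):
    """Bytes fetched from pc up to the end of its 'snap' region.
    Subsequent autonomous fetches are whole aligned lines."""
    return fetch1LineSnapWidth - (pc % fetch1LineSnapWidth)

print(first_fetch_size(0x1008, 64))  # 56: up to the next 64-byte boundary
print(first_fetch_size(0x1000, 64))  # 64: already aligned, a whole region
```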

Fetch1 will only initiate a memory fetch if it can reserve space in Fetch2's
input buffer.  That input buffer serves as the fetch queue/LFL for the system.

Fetch1 contains two queues: requests and transfers to handle the stages of
translating the address of a line fetch (via the TLB) and accommodating the
request/response of fetches to/from memory.

Fetch requests from Fetch1 are pushed into the requests queue as newly
allocated FetchRequest objects once they have been sent to the ITLB with a
call to itb->translateTiming.

A response from the TLB moves the request from the requests queue to the
transfers queue.  If there is more than one entry in each queue, it is
possible to get a TLB response for a request which is not at the head of the
requests queue.  In that case, the TLB response is marked up as a state change
to Translated in the request object, and advancing the request to transfers
(and the memory system) is left to calls to Fetch1::stepQueues, which is
called in the cycle following the receipt of any event.

Fetch1::tryToSendToTransfers is responsible for moving requests between the
two queues and issuing requests to memory.  Failed TLB lookups (prefetch
aborts) continue to occupy space in the queues until they are recovered at the
head of transfers.

Responses from memory change the request object state to Complete and
Fetch1::evaluate can pick up response data, package it in the ForwardLineData
object, and forward it to Fetch2%'s input buffer.

As space is always reserved in Fetch2::inputBuffer, setting the input buffer's
size to 1 results in non-prefetching behaviour.

When a change of stream occurs, translated requests queue members and
completed transfers queue members can be unconditionally discarded to make way
for new transfers.
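The requests/transfers flow described above can be sketched as a small state
machine.  The state and queue names follow the text; the code itself is an
illustration of the ordering rule, not gem5's implementation.

```python
from collections import deque

REQUESTED, TRANSLATED, COMPLETE = "Requested", "Translated", "Complete"

class FetchRequest:
    def __init__(self, line_seq):
        self.line_seq = line_seq
        self.state = REQUESTED

requests, transfers = deque(), deque()

def tlb_response(req):
    # May arrive for a request which is not at the head of 'requests'
    req.state = TRANSLATED

def try_to_send_to_transfers():
    # Only the head may advance, preserving line fetch order
    while requests and requests[0].state == TRANSLATED:
        transfers.append(requests.popleft())  # issue to memory here

r0, r1 = FetchRequest(0), FetchRequest(1)
requests.extend([r0, r1])
tlb_response(r1)              # out-of-order TLB response for r1
try_to_send_to_transfers()    # head r0 not yet translated: nothing moves
tlb_response(r0)
try_to_send_to_transfers()    # now both advance, still in order
```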

\subsection fetch2 Fetch2 stage

Fetch2 receives a line from Fetch1 into its input buffer.  The data in the
head line in that buffer is iterated over and separated into individual
instructions which are packed into a vector of instructions which can be
passed to Decode.  Packing instructions can be aborted early if a fault is
found in either the input line as a whole or a decomposed instruction.

\subsubsection bp Branch prediction

Fetch2 contains the branch prediction mechanism.  This is a wrapper around the
branch predictor interface provided by gem5 (cpu/pred/...).

Branches are predicted for any control instructions found.  If prediction is
attempted for an instruction, the MinorDynInst::triedToPredict flag is set on
that instruction.

When a branch is predicted to be taken, the MinorDynInst::predictedTaken flag
is
set and MinorDynInst::predictedTarget is set to the predicted target PC value.
The predicted branch instruction is then packed into Fetch2%'s output vector,
the prediction sequence number is incremented, and the branch is communicated
to Fetch1.

After signalling a prediction, Fetch2 will discard its input buffer contents
and will reject any new lines which have the same stream sequence number as
that branch but have a different prediction sequence number.  This allows
following sequentially fetched lines to be rejected without ignoring new lines
generated by a change of stream indicated from a 'real' branch from Execute
(which will have a new stream sequence number).
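The discard rule just described can be written as a one-line predicate: drop
a line only if it belongs to the same stream as the followed prediction but
carries a stale prediction sequence number.  The function below is an
illustrative sketch, not gem5 code.

```python
def discard_line(line_stream, line_pred, expected_stream, expected_pred):
    """True if Fetch2 should reject this line after following a prediction."""
    return (line_stream == expected_stream and
            line_pred != expected_pred)

# Sequentially fetched line bearing the old prediction number: dropped.
print(discard_line(5, 3, 5, 4))   # True
# Line from a new stream (a 'real' branch from Execute): kept.
print(discard_line(6, 0, 5, 4))   # False
```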

The program counter value provided to Fetch2 by Fetch1 packets is only updated
when there is a change of stream.  Fetch2::havePC indicates whether the PC
will be picked up from the next processed input line.  Fetch2::havePC is
necessary to allow line-wrapping instructions to be tracked through decode.

Branches (and instructions predicted to branch) which are processed by Execute
will generate BranchData (pipe_data.hh) data explaining the outcome of the
branch which is sent forwards to Fetch1 and Fetch2.  Fetch1 uses this data to
change stream (and update its stream sequence number and address for new
lines).  Fetch2 uses it to update the branch predictor.   Minor does not
communicate branch data to the branch predictor for instructions which are
discarded on the way to commit.

BranchData::BranchReason (pipe_data.hh) encodes the possible branch scenarios:

<table>
<tr>
    <td>Branch enum val.</td>
    <td>In Execute</td>
    <td>Fetch1 reaction</td>
    <td>Fetch2 reaction</td>
</tr>
<tr>
    <td>NoBranch</td>
    <td>(output bubble data)</td>
    <td>-</td>
    <td>-</td>
</tr>
<tr>
    <td>CorrectlyPredictedBranch</td>
    <td>Predicted, taken</td>
    <td>-</td>
    <td>Update BP as taken branch</td>
</tr>
<tr>
    <td>UnpredictedBranch</td>
    <td>Not predicted, taken and was taken</td>
    <td>New stream</td>
    <td>Update BP as taken branch</td>
</tr>
<tr>
    <td>BadlyPredictedBranch</td>
    <td>Predicted, not taken</td>
    <td>New stream to restore to old inst. source</td>
    <td>Update BP as not taken branch</td>
</tr>
<tr>
    <td>BadlyPredictedBranchTarget</td>
    <td>Predicted, taken, but to a different target than predicted one</td>
    <td>New stream</td>
    <td>Update BTB to new target</td>
</tr>
<tr>
    <td>SuspendThread</td>
    <td>Hint to suspend fetching</td>
    <td>Suspend fetch for this thread (branch to next inst. as wakeup
        fetch addr)</td>
    <td>-</td>
</tr>
<tr>
    <td>Interrupt</td>
    <td>Interrupt detected</td>
    <td>New stream</td>
    <td>-</td>
</tr>
</table>

The parameter decodeInputWidth sets the number of instructions which can be
packed into Fetch2's output per cycle.  If the parameter fetch2CycleInput is
true, Fetch2 can try to take instructions from more than one entry in its
input buffer per cycle.

\subsection decode Decode stage

Decode takes a vector of instructions from Fetch2 (via its input buffer) and
decomposes those instructions into micro-ops (if necessary) and packs them
into its output instruction vector.

The parameter executeInputWidth sets the number of instructions which can be
packed into the output per cycle.  If the parameter decodeCycleInput is true,
Decode can try to take instructions from more than one entry in its input
buffer per cycle.

\subsection execute Execute stage

Execute provides all the instruction execution and memory access mechanisms.
An instruction's passage through Execute can take multiple cycles with its
precise timing modelled by a functional unit pipeline FIFO.

A vector of instructions (possibly including fault 'instructions') is provided
to Execute by Decode and can be queued in the Execute input buffer before
being issued.  Setting the parameter executeCycleInput allows Execute to
examine more than one input buffer entry (more than one instruction vector).
The number of instructions in the input vector can be set with
executeInputWidth and the depth of the input buffer can be set with parameter
executeInputBufferSize.

\subsubsection fus Functional units

The Execute stage contains pipelines for each functional unit comprising the
computational core of the CPU.  Functional units are configured via the
executeFuncUnits parameter.  Each functional unit has a number of instruction
classes it supports, a stated delay between instruction issues, a delay from
instruction issue to (possible) commit, and an optional timing annotation
capable of expressing more complicated timing.

Each active cycle, Execute::evaluate performs this action:

\verbatim
    Execute::evaluate:
        push input to inputBuffer
        setup references to input/output data slots and branch output slot

        step D-cache interface queues (similar to Fetch1)

        if interrupt posted:
            take interrupt (signalling branch to Fetch1/Fetch2)
        else
            commit instructions
            issue new instructions

        advance functional unit pipelines

        reactivate Execute if the unit is still active

        commit the push to the inputBuffer if that data hasn't all been used
\endverbatim

\subsubsection fifos Functional unit FIFOs

Functional units are implemented as SelfStallingPipelines (stage.hh).  These
are TimeBuffer FIFOs with two distinct 'push' and 'pop' wires.  They respond
to SelfStallingPipeline::advance in the same way as TimeBuffers <b>unless</b>
there is data at the far, 'pop', end of the FIFO.  A 'stalled' flag is
provided for signalling stalling and to allow a stall to be cleared.  The
intention is to provide a pipeline for each functional unit which will never
advance an instruction out of that pipeline until it has been processed and
the pipeline is explicitly unstalled.
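The behaviour can be sketched as below: a fixed-depth FIFO whose advance is
refused while data sits at the 'pop' end.  Bubbles are modelled as None, and
the class is an illustrative analogue of SelfStallingPipeline, not gem5 code.

```python
class SelfStallingPipeline:
    """Fixed-depth FIFO that stalls rather than drop its pop-end data."""
    def __init__(self, depth):
        self.slots = [None] * depth   # index 0 is the 'pop' end
        self.stalled = False

    def push(self, inst):
        self.slots[-1] = inst         # the 'push' end

    def front(self):
        return self.slots[0]

    def pop(self):
        inst, self.slots[0] = self.slots[0], None
        self.stalled = False          # explicit unstall once processed
        return inst

    def advance(self):
        if self.slots[0] is not None:
            self.stalled = True       # unprocessed data at the pop end
            return
        self.slots = self.slots[1:] + [None]

fu = SelfStallingPipeline(depth=2)
fu.push("mul r1, r2")
fu.advance()                   # instruction reaches the pop end
fu.advance()                   # nothing was popped: the pipeline stalls
print(fu.stalled, fu.front())  # True mul r1, r2
```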

The actions 'issue', 'commit', and 'advance' act on the functional units.

\subsubsection issue Issue

Issuing instructions involves iterating over both the input buffer
instructions and the heads of the functional units to try and issue
instructions in order.  The number of instructions which can be issued each
cycle is limited by the parameter executeIssueLimit, how executeCycleInput is
set, the availability of pipeline space and the policy used to choose a
pipeline in which the instruction can be issued.

At present, the only issue policy is strict round-robin visiting of each
pipeline with the given instructions in sequence.  For greater flexibility,
better (and more specific) policies will need to be possible.
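A round-robin issue scan of this flavour can be sketched as follows.  The
capability model (a set of supported instruction classes per FU) and all the
names here are illustrative assumptions, not Minor's actual data structures.

```python
def issue(insts, fus, start_fu=0):
    """Offer each instruction, in order, to FUs from a rotating index;
    issue to the first FU that supports its class and has space."""
    issued = []
    fu_idx = start_fu
    for inst in insts:
        chosen = None
        for i in range(len(fus)):
            fu = fus[(fu_idx + i) % len(fus)]
            if inst["class"] in fu["supports"] and fu["free"]:
                chosen = fu
                break
        if chosen is None:
            break                    # in-order: later insts must wait
        chosen["free"] = False
        issued.append((inst["id"], chosen["name"]))
        fu_idx = (fu_idx + 1) % len(fus)
    return issued

fus = [{"name": "IntAlu0", "supports": {"int"}, "free": True},
       {"name": "Mem", "supports": {"mem"}, "free": True}]
insts = [{"id": 1, "class": "int"}, {"id": 2, "class": "mem"}]
print(issue(insts, fus))  # [(1, 'IntAlu0'), (2, 'Mem')]
```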

Memory operation instructions traverse their functional units to perform their
EA calculations.  On 'commit', the ExecContext::initiateAcc execution phase is
performed and any memory access is issued (via ExecContext::{read,write}Mem
calling LSQ::pushRequest) to the LSQ.

Note that faults are issued as if they are instructions and can (currently) be
issued to *any* functional unit.

Every issued instruction is also pushed into the Execute::inFlightInsts queue.
Memory ref. instructions are also pushed into the Execute::inFUMemInsts queue.

\subsubsection commit Commit

Instructions are committed by examining the head of the Execute::inFlightInsts
queue (which is decorated with the functional unit number to which the
instruction was issued).  Instructions which can then be found in their
functional units are executed and popped from Execute::inFlightInsts.

Memory operation instructions are committed into the memory queues (as
described above) and exit their functional unit pipeline but are not popped
from the Execute::inFlightInsts queue.  The Execute::inFUMemInsts queue
provides ordering to memory operations as they pass through the functional
units (maintaining issue order).  On entering the LSQ, instructions are popped
from Execute::inFUMemInsts.

If the parameter executeAllowEarlyMemoryIssue is set, memory operations can be
sent from their FU to the LSQ before reaching the head of
Execute::inFlightInsts but after their dependencies are met.
MinorDynInst::instToWaitFor is marked up with the latest dependent instruction
execSeqNum required to be committed for a memory operation to progress to the
LSQ.
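The early-issue check can be summarised as a predicate: a memory operation
may leave its FU for the LSQ once every instruction up to its
MinorDynInst::instToWaitFor sequence number has committed.  The helper below
is a hypothetical illustration with assumed names, not gem5's logic verbatim.

```python
def can_send_to_lsq(inst_to_wait_for, last_committed_exec_seq,
                    allow_early, at_head_of_in_flight):
    """True if a memory op may progress from its FU to the LSQ."""
    if at_head_of_in_flight:
        return True                 # normal, strictly in-order path
    # executeAllowEarlyMemoryIssue: go early once dependencies commit
    return allow_early and last_committed_exec_seq >= inst_to_wait_for

print(can_send_to_lsq(12, 11, True, False))   # False: dependency uncommitted
print(can_send_to_lsq(12, 12, True, False))   # True: dependency committed
```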

Once a memory response is available (by testing the head of
Execute::inFlightInsts against LSQ::findResponse), commit will process that
response (ExecContext::completeAcc) and pop the instruction from
Execute::inFlightInsts.
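Putting these rules together, the in-order commit step can be sketched as
follows (a simplified illustration; the callbacks `fu_done`,
`lsq_find_response` and `complete_acc` are hypothetical stand-ins for the
model's mechanisms):

```python
def try_commit_head(in_flight, fu_done, lsq_find_response, complete_acc):
    """Attempt to commit the instruction at the head of the
    inFlightInsts queue; only commit pops instructions from it."""
    if not in_flight:
        return False
    head = in_flight[0]
    if head.is_memory_op:
        # Memory ops commit only once the LSQ has a response for them
        response = lsq_find_response(head)
        if response is None:
            return False
        complete_acc(head, response)
    elif not fu_done(head):
        return False  # still traversing its functional unit pipeline
    in_flight.pop(0)
    return True
```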

Any branch, fault or interrupt will cause a stream sequence number change and
signal a branch to Fetch1/Fetch2.  Only instructions with the current stream
sequence number will be issued and/or committed.

\subsubsection advance Advance

All non-stalled pipelines are advanced and may, thereafter, become stalled.
Potential activity in the next cycle is signalled if there are any
instructions remaining in any pipeline.

\subsubsection sb Scoreboard

The scoreboard (Scoreboard) is used to control instruction issue.  It contains
a count of the number of in flight instructions which will write each general
purpose CPU integer or float register.  Instructions will only be issued where
the scoreboard contains a count of 0 instructions which will write to one of
the instruction's source registers.

Once an instruction is issued, the scoreboard count for each of its
destination registers is incremented.

The estimated delivery time of the instruction's result is marked up in the
scoreboard by adding the length of the issued-to FU to the current time.  The
timings parameter on each FU provides a list of additional rules for
calculating the delivery time.  These are documented in the parameter comments
in MinorCPU.py.

On commit (for memory operations, at memory response commit) the scoreboard
counters for an instruction's destination registers are decremented.
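In outline, the scoreboard's bookkeeping behaves like this sketch (a minimal
stand-in that ignores the FU-timing mark-up; the class and method names are
invented, not the model's):

```python
from collections import defaultdict

class SimpleScoreboard:
    """Counts in-flight instructions that will write each register."""
    def __init__(self):
        self.num_writers = defaultdict(int)

    def can_issue(self, src_regs):
        # Issue only if no in-flight instruction will write any source
        return all(self.num_writers[r] == 0 for r in src_regs)

    def mark_issue(self, dest_regs):
        for r in dest_regs:
            self.num_writers[r] += 1

    def clear_dests(self, dest_regs):
        # Called at commit (or at memory-response commit for memory ops)
        for r in dest_regs:
            self.num_writers[r] -= 1
```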

\subsubsection ifi Execute::inFlightInsts

The Execute::inFlightInsts queue will always contain all instructions in
flight in Execute in the correct issue order.  Execute::issue is the only
process which will push an instruction into the queue.  Execute::commit is the
only process that can pop an instruction.

\subsubsection lsq LSQ

The LSQ can support multiple outstanding transactions to memory in a number of
conservative cases.

There are three queues to contain requests: requests, transfers and the store
buffer.  The requests and transfers queues operate in a similar manner to the
queues in Fetch1.  The store buffer is used to decouple the delay of
completing store operations from following loads.

Requests are issued to the DTLB as their instructions leave their functional
unit.  At the head of requests, cacheable load requests can be sent to memory
and on to the transfers queue.  Cacheable stores will be passed to transfers
unprocessed and progress through that queue, maintaining order with other
transactions.

The conditions in LSQ::tryToSendToTransfers dictate when requests can
be sent to memory.

All uncacheable transactions, split transactions and locked transactions are
processed in order at the head of requests.  Additionally, store results
residing in the store buffer can have their data forwarded to cacheable loads
(removing the need to perform a read from memory) but no cacheable load can be
issued to the transfers queue until that queue's stores have drained into the
store buffer.
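The forwarding check can be sketched as follows (a simplified, hypothetical
version; the real LSQ must also handle partial overlaps, request flags and
ordering constraints):

```python
def forward_from_store_buffer(store_buffer, load_addr, load_size):
    """Search the store buffer from youngest to oldest for a store
    that wholly covers the load; return its data, or None if the
    load must be sent to memory instead."""
    for store in reversed(store_buffer):  # youngest store first
        covers = (store['addr'] <= load_addr and
                  load_addr + load_size <=
                  store['addr'] + len(store['data']))
        if covers:
            offset = load_addr - store['addr']
            return store['data'][offset:offset + load_size]
    return None  # no covering store in the buffer
```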

At the end of transfers, requests which are LSQ::LSQRequest::Complete (are
faulting, are cacheable stores, or have been sent to memory and received a
response) can be picked off by Execute, committed (ExecContext::completeAcc)
and, for stores, sent to the store buffer.

Barrier instructions do not prevent cacheable loads from progressing to memory
but do cause a stream change which will discard that load.  Stores will not be
committed to the store buffer if they are in the shadow of the barrier but
before the new instruction stream has arrived at Execute.  As all other memory
transactions are delayed at the end of the requests queue until they are at
the head of Execute::inFlightInsts, they will be discarded by any barrier
stream change.

After commit, LSQ::BarrierDataRequest requests are inserted into the
store buffer to track each barrier until all preceding memory transactions
have drained from the store buffer.  No further memory transactions will be
issued from the ends of FUs until after the barrier has drained.

\subsubsection drain Draining

Draining is mostly handled by the Execute stage.  When initiated by calling
MinorCPU::drain, Pipeline::evaluate checks the draining status of each unit
each cycle and keeps the pipeline active until draining is complete.  It is
Pipeline that signals the completion of draining.  Execute is triggered by
MinorCPU::drain and starts stepping through its Execute::DrainState state
machine, starting from state Execute::NotDraining, in this order:

<table>
<tr>
    <td><b>State</b></td>
    <td><b>Meaning</b></td>
</tr>
<tr>
    <td>Execute::NotDraining</td>
    <td>Not trying to drain, normal execution</td>
</tr>
<tr>
    <td>Execute::DrainCurrentInst</td>
    <td>Draining micro-ops to complete inst.</td>
</tr>
<tr>
    <td>Execute::DrainHaltFetch</td>
    <td>Halt fetching instructions</td>
</tr>
<tr>
    <td>Execute::DrainAllInsts</td>
    <td>Discarding all instructions presented</td>
</tr>
</table>

When complete, a drained Execute unit will be in the Execute::DrainAllInsts
state where it will continue to discard instructions but has no knowledge of
the drained state of the rest of the model.
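The state ordering above can be sketched as a simple state machine
(illustrative only; the `mid_macroop` flag is invented here and stands for
"the current instruction's micro-ops are not yet all committed"):

```python
from enum import Enum

class DrainState(Enum):
    NOT_DRAINING = 0
    DRAIN_CURRENT_INST = 1
    DRAIN_HALT_FETCH = 2
    DRAIN_ALL_INSTS = 3

def next_drain_state(state, mid_macroop):
    """Advance one drain step; stay in DRAIN_CURRENT_INST until the
    current macro-op's micro-ops have all been committed."""
    if state == DrainState.NOT_DRAINING:
        return DrainState.DRAIN_CURRENT_INST
    if state == DrainState.DRAIN_CURRENT_INST:
        return state if mid_macroop else DrainState.DRAIN_HALT_FETCH
    if state == DrainState.DRAIN_HALT_FETCH:
        return DrainState.DRAIN_ALL_INSTS
    # DRAIN_ALL_INSTS is terminal until the drain is resumed
    return DrainState.DRAIN_ALL_INSTS
```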

\section debug Debug options

The model provides a number of debug flags which can be passed to gem5 with
the --debug-flags option.

The available flags are:

<table>
<tr>
    <td><b>Debug flag</b></td>
    <td><b>Unit which will generate debugging output</b></td>
</tr>
<tr>
    <td>Activity</td>
    <td>Debug ActivityMonitor actions</td>
</tr>
<tr>
    <td>Branch</td>
    <td>Fetch2 and Execute branch prediction decisions</td>
</tr>
<tr>
    <td>MinorCPU</td>
    <td>CPU global actions such as wakeup/thread suspension</td>
</tr>
<tr>
    <td>Decode</td>
    <td>Decode</td>
</tr>
<tr>
    <td>MinorExec</td>
    <td>Execute behaviour</td>
</tr>
<tr>
    <td>Fetch</td>
    <td>Fetch1 and Fetch2</td>
</tr>
<tr>
    <td>MinorInterrupt</td>
    <td>Execute interrupt handling</td>
</tr>
<tr>
    <td>MinorMem</td>
    <td>Execute memory interactions</td>
</tr>
<tr>
    <td>MinorScoreboard</td>
    <td>Execute scoreboard activity</td>
</tr>
<tr>
    <td>MinorTrace</td>
    <td>Generate MinorTrace cyclic state trace output (see below)</td>
</tr>
<tr>
    <td>MinorTiming</td>
    <td>MinorTiming instruction timing modification operations</td>
</tr>
</table>

The group flag Minor enables all of the flags beginning with Minor.

\section trace MinorTrace and minorview.py

The debug flag MinorTrace causes cycle-by-cycle state data to be printed which
can then be processed and viewed by the minorview.py tool.  This output is
very verbose and so it is recommended that it only be used for small examples.

\subsection traceformat MinorTrace format

There are three types of line output by MinorTrace:

\subsubsection state MinorTrace - Ticked unit cycle state

For example:

\verbatim
 110000: system.cpu.dcachePort: MinorTrace: state=MemoryRunning in_tlb_mem=0/0
\endverbatim

For each time step, the MinorTrace flag will cause one MinorTrace line to be
printed for every named element in the model.

\subsubsection traceunit MinorInst - summaries of instructions issued by \
    Decode

For example:

\verbatim
 140000: system.cpu.execute: MinorInst: id=0/1.1/1/1.1 addr=0x5c \
                             inst="  mov r0, #0" class=IntAlu
\endverbatim

MinorInst lines are currently only generated for instructions which are
committed.

\subsubsection tracefetch1 MinorLine - summaries of line fetches issued by \
    Fetch1

For example:

\verbatim
  92000: system.cpu.icachePort: MinorLine: id=0/1.1/1 size=36 \
                                vaddr=0x5c paddr=0x5c
\endverbatim

\subsection minorview minorview.py

Minorview (util/minorview.py) can be used to visualise the data created by
MinorTrace.

\verbatim
usage: minorview.py [-h] [--picture picture-file] [--prefix name]
                   [--start-time time] [--end-time time] [--mini-views]
                   event-file

Minor visualiser

positional arguments:
  event-file

optional arguments:
  -h, --help            show this help message and exit
  --picture picture-file
                        markup file containing blob information (default:
                        <minorview-path>/minor.pic)
  --prefix name         name prefix in trace for CPU to be visualised
                        (default: system.cpu)
  --start-time time     time of first event to load from file
  --end-time time       time of last event to load from file
  --mini-views          show tiny views of the next 10 time steps
\endverbatim

Raw debugging output can be passed to minorview.py as the event-file.  It
will pick out the MinorTrace lines; other lines which name units in the
simulation (such as system.cpu.dcachePort in the above example) will appear
as 'comments' when those units are clicked on in the visualiser.
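For scripted processing outside minorview.py, the MinorTrace-family lines can
be picked out of a raw debug log with a sketch like this (a hypothetical
helper, not part of the tool):

```python
import re

# Matches e.g. " 110000: system.cpu.dcachePort: MinorTrace: state=..."
MINORTRACE_RE = re.compile(
    r'^\s*(\d+):\s+(\S+):\s+(MinorTrace|MinorInst|MinorLine):\s+(.*)$')

def minortrace_events(lines):
    """Yield (tick, unit, kind, payload) for each MinorTrace-family
    line, skipping all other debug output."""
    for line in lines:
        m = MINORTRACE_RE.match(line)
        if m:
            yield int(m.group(1)), m.group(2), m.group(3), m.group(4)
```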

Clicking on a unit which contains instructions or lines will bring up a speech
bubble giving extra information derived from the MinorInst/MinorLine lines.

--start-time and --end-time allow only sections of debug files to be loaded.

--prefix allows the name prefix of the CPU to be inspected to be supplied.
This defaults to 'system.cpu'.

In the visualiser, the buttons Start, End, Back, Forward, Play and Stop can be
used to control the displayed simulation time.

The diagonally striped coloured blocks show the InstId of the
instruction or line they represent.  Note that lines in Fetch1 and f1ToF2.F
only show the id fields of a line and that instructions in Fetch2, f2ToD, and
decode.inputBuffer do not yet have execute sequence numbers.  The T/S.P/L/F.E
buttons can be used to toggle parts of InstId on and off to make it easier to
understand the display.  Useful combinations are:

<table>
<tr>
    <td><b>Combination</b></td>
    <td><b>Reason</b></td>
</tr>
<tr>
    <td>E</td>
    <td>just show the final execute sequence number</td>
</tr>
<tr>
    <td>F/E</td>
    <td>show the instruction-related numbers</td>
</tr>
<tr>
    <td>S/P</td>
    <td>show just the stream-related numbers (watch the stream sequence
        change with branches and not change with predicted branches)</td>
</tr>
<tr>
    <td>S/E</td>
    <td>show instructions and their stream</td>
</tr>
</table>

The key to the right shows all the displayable colours (some of the colour
choices are quite bad!):

<table>
<tr>
    <td><b>Symbol</b></td>
    <td><b>Meaning</b></td>
</tr>
<tr>
    <td>U</td>
    <td>Unknown data</td>
</tr>
<tr>
    <td>B</td>
    <td>Blocked stage</td>
</tr>
<tr>
    <td>-</td>
    <td>Bubble</td>
</tr>
<tr>
    <td>E</td>
    <td>Empty queue slot</td>
</tr>
<tr>
    <td>R</td>
    <td>Reserved queue slot</td>
</tr>
<tr>
    <td>F</td>
    <td>Fault</td>
</tr>
<tr>
    <td>r</td>
    <td>Read (used as the leftmost stripe on data in the dcachePort)</td>
</tr>
<tr>
    <td>w</td>
    <td>Write (as for Read above)</td>
</tr>
<tr>
    <td>0 to 9</td>
    <td>last decimal digit of the corresponding data</td>
</tr>
</table>

\verbatim

    ,---------------.         .--------------.  *U
    | |=|->|=|->|=| |         ||=|||->||->|| |  *-  <- Fetch queues/LSQ
    `---------------'         `--------------'  *R
    === ======                                  *w  <- Activity/Stage activity
                              ,--------------.  *1
    ,--.      ,.      ,.      | ============ |  *3  <- Scoreboard
    |  |-\[]-\||-\[]-\||-\[]-\| ============ |  *5  <- Execute::inFlightInsts
    |  | :[] :||-/[]-/||-/[]-/| -. --------  |  *7
    |  |-/[]-/||  ^   ||      |  | --------- |  *9
    |  |      ||  |   ||      |  | ------    |
[]->|  |    ->||  |   ||      |  | ----      |
    |  |<-[]<-||<-+-<-||<-[]<-|  | ------    |->[] <- Execute to Fetch1,
    '--`      `'  ^   `'      | -' ------    |        Fetch2 branch data
             ---. |  ---.     `--------------'
             ---' |  ---'       ^       ^
                  |   ^         |       `------------ Execute
  MinorBuffer ----' input       `-------------------- Execute input buffer
                    buffer
\endverbatim

Stages show the colours of the instructions currently being
generated/processed.

Forward FIFOs between stages show the data being pushed into them at the
current tick (to the left), the data in transit, and the data available at
their outputs (to the right).

The backwards FIFO between Fetch2 and Fetch1 shows branch prediction data.

In general, all displayed data is correct at the end of a cycle's activity at
the time indicated but before the inter-stage FIFOs are ticked.  Each FIFO
has, therefore, an extra slot to show the asserted new input data, and all the
data currently within the FIFO.

Input buffers for each stage are shown below the corresponding stage and show
the contents of those buffers as horizontal strips.  Strips marked as reserved
(cyan by default) are reserved to be filled by the previous stage.  An input
buffer with all reserved or occupied slots will, therefore, block the previous
stage from generating output.

Fetch queues and LSQ show the lines/instructions in the queues of each
interface and show the number of lines/instructions in TLB and memory in the
two striped colours of the top of their frames.

Inside Execute, the horizontal bars represent the individual FU pipelines.
The vertical bar to the left is the input buffer and the bar to the right, the
instructions committed this cycle.  The background of Execute shows
instructions which are being committed this cycle in their original FU
pipeline positions.

The strip at the top of the Execute block shows the current streamSeqNum that
Execute is committing.  A similar stripe at the top of Fetch1 shows that
stage's expected streamSeqNum and the stripe at the top of Fetch2 shows its
issuing predictionSeqNum.

The scoreboard shows the number of instructions in flight which will commit a
result to the register in the position shown.  The scoreboard contains slots
for each integer and floating point register.

The Execute::inFlightInsts queue shows all the instructions in flight in
Execute with the oldest instruction (the next instruction to be committed) to
the right.

'Stage activity' shows the signalled activity (as E/1) for each stage (with
CPU miscellaneous activity to the left).

'Activity' shows a count of stage and pipe activity.

\subsection picformat minor.pic format

The minor.pic file (src/minor/minor.pic) describes the layout of the
model's blocks on the visualiser.  Its format is described in the supplied
minor.pic file.

*/

}