Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------

2  -  GPU INSTANCE RAM (RAMIN)
==============================

     A GPU contains a block called "XVE" that manages the interface with PCI, a
block called "Host" that fetches graphics instructions, blocks called "engines"
that execute graphics instructions, and blocks that manage the interface with
memory.

               .-----.                    .------.
               |     |<------------------>|      |
               |     |                    |      |
               |     |     .---------.    |      |
               |     |<--->| Engine1 |<---|      |
               |     |     `---------'    |      |
.---------.    |     |                    |      |
|   GPU   |    |     |     .---------.    | Host |
|  Local  |<-->|  FB |<--->| Engine2 |<---|      |
| Memory  |    | MMU |     `---------'    |      |
`---------'    | Hub |         ...        |      |   .--------.
               |     |     .---------.    |      |   | System |
               |     |<--->| EngineN |<---|      |   | Memory |
               |     |     `---------'    `------'   `--------'
               |     |                       ^           ^
               |     |                       |           |
.---------.    |     |                    .--V--. PCI .--V--.     .-----.
| Display |<-->|     |<------------------>| XVE |<--->| NB  |<--->| CPU |
`---------'    `-----'                    `-----'     `-----'     `-----'

     A GPU context is a virtualization of the GPU for a particular software
application.  A GPU instance block is a block of memory that contains the state
for a GPU context.  A GPU context's instance block consists of Host state,
pointers to each engine's state, and memory management state.  A GPU instance
block also contains a pointer to a block of memory that contains that part of a
GPU context's state that a user-level driver may access.  A GPU instance block
fits within a single 4K-byte page of memory.

       Run List             Channel-Map RAM
     .----------.  Ch Id   .----------------.
     | RL Entry0 |----.    |Ch0 Inst Blk Ptr|
     | RL Entry1 |    |    |Ch1 Inst Blk Ptr|
     | RL Entry2 |    |    |       ...      |
     |    ...    |    `--->|ChI Inst Blk Ptr|----.
     | RL EntryN |         |       ...      |    |
     `-----------'         |ChN Inst Blk Ptr|    |
                           `----------------'    |
                                                 |
 .-----------------------------------------------'
 |
 |    GPU Instance Block                                 GPFIFO
 `-->.-----------------.                        GP_GET .--------.     PB Seg
     |                 |------------------------------>|GP Entry|    .--------.
     |   Host State    |                               |GP Entry|--->|PB Entry|
     |     (RAMFC)     |          User-Driver State    |        |    |PB Entry|
     |                 |              .-------.        |GP Entry|    |   ...  |
     |                 |------------->|(USERD)| GP_PUT |GP Entry|    |PB Entry|
     |                 |              |       |------->`--------'    `--------'
     |                 |              |       |
     +-----------------+              |       |
     |     Memory      |              `-------'
     |   Management    |----------.  Page Directory    Page Table
     |     State       |          |   .-------.        .-------.
     +-----------------+          `-->|  PDE  |        |  PTE  |
     |   Pointer to    |              |  PDE  |------->|  PTE  |
     |     Engine0     |--------.     |  ...  |        |  ...  |
     |      State      |        |     |  PDE  |        |  PTE  |
     +-----------------+        |     `-------'        `-------'
     |   Pointer to    |        |
     |     Engine1     |-----.  |   Engine0 State
     |      State      |     |  |     .-------.
     +-----------------+     |  `---->|       |
            ...              |        `-------'
     +-----------------+     |
     |   Pointer to    |     |      Engine1 State
     |     EngineN     |--.  |        .-------.
     |      State      |  |  `------->|       |
     `-----------------'  |           `-------'
                          |               ...
                          |
                          |         EngineN State
                          |           .-------.
                          `---------->|       |
                                      `-------'

     The GPU context's Host state occupies the first 128 double words of an
instance block.  A GPU context's Host state is called "RAMFC". Please see
the NV_RAMFC section below for a description of Host state.

     The GPU context's memory-management state defines the virtual address space
that the GPU context uses.  Memory-management state consists of page and
directory tables (which specify the mapping between virtual addresses and
physical addresses, and the attributes of memory pages), and the limit of the
virtual address space.  The NV_RAMIN_PAGE_DIR_BASE entry contains the address of
the base of the GPU context's page directory table (PDB).  NV_RAMIN_PAGE_DIR_BASE
is 4K-byte aligned.

     The NV_RAMIN_ENG*_WFI_PTR entry contains the address of a block of memory
for storing an engine's context state.  Blocks of memory that contain engine state
are 4K-byte aligned.  Only one engine context is supported per instance block.

     The NV_RAMIN_ENG*_CS field is deprecated; it was used to indicate whether
GPU state should be restored from the FGCS pointer or from the WFI CS pointer.
Engines only need/support one CTXSW pointer, and all state is stored there
whether a WFI CS or another form of preemption was performed.  This field must
always be set to WFI for legacy reasons, and will eventually be deleted.


#define NV_RAMIN                                                    /* ----G */

// The instance block must be 4k-aligned.
#define NV_RAMIN_BASE_SHIFT                                      12 /*       */

// The instance block size fits within a single 4k block.
#define NV_RAMIN_ALLOC_SIZE                                    4096 /*       */

// Host State
#define NV_RAMIN_RAMFC                         (127*32+31):(0*32+0) /* RWXUF */
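
    The bit ranges in the definitions above and below are counted from bit 0 of
    the instance block: dividing a bit index by 32 gives the dword index within
    the block, and the remainder gives the bit position within that dword.  As a
    minimal illustrative sketch (not part of this manual), a CPU-side helper
    that writes such a single-dword field into a mapping of the instance block
    might look like the following; the mapping name 'inst' and the helper name
    are assumptions:

    #include <stdint.h>

    /* Hypothetical helper: write a field whose bit range is (hi_bit):(lo_bit),
     * counted from bit 0 of the instance block, into a CPU-visible dword
     * mapping of that block.  For example, NV_RAMIN_PAGE_DIR_BASE_TARGET is
     * (128*32+1):(128*32+0), i.e. dword 128, bits 1:0.  Only fields contained
     * within a single dword are handled (NV_RAMIN_RAMFC spans 128 dwords and
     * is not a candidate). */
    static void ramin_write_field(uint32_t *inst, uint32_t hi_bit,
                                  uint32_t lo_bit, uint32_t value)
    {
        uint32_t dword = lo_bit / 32;          /* dword index in the block */
        uint32_t shift = lo_bit % 32;          /* bit offset within dword  */
        uint32_t width = hi_bit - lo_bit + 1;  /* field width in bits      */
        uint32_t mask  = (width == 32) ? 0xFFFFFFFFu
                                       : (((1u << width) - 1u) << shift);

        inst[dword] = (inst[dword] & ~mask) | ((value << shift) & mask);
    }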

// Memory-Management State

    The following fields are used for non-VEID engines.  The NV_RAMIN_SC_* fields
    described later are used for VEID engines.

    NV_RAMIN_PAGE_DIR_BASE_TARGET determines whether the top level of the page
    tables is in video memory or system memory (peer is not allowed), and the
    CPU cache coherency for system memory.
    Using INVALID unbinds the selected engine.

#define NV_RAMIN_PAGE_DIR_BASE_TARGET               (128*32+1):(128*32+0) /* RWXUF */
#define NV_RAMIN_PAGE_DIR_BASE_TARGET_VID_MEM                  0x00000000 /* RW--V */
#define NV_RAMIN_PAGE_DIR_BASE_TARGET_INVALID                  0x00000001 /* RW--V */
#define NV_RAMIN_PAGE_DIR_BASE_TARGET_SYS_MEM_COHERENT         0x00000002 /* RW--V */
#define NV_RAMIN_PAGE_DIR_BASE_TARGET_SYS_MEM_NONCOHERENT      0x00000003 /* RW--V */

    NV_RAMIN_PAGE_DIR_BASE_VOL identifies the volatile behavior
    of the top level of the page table (whether local L2 can cache it or not).

#define NV_RAMIN_PAGE_DIR_BASE_VOL                  (128*32+2):(128*32+2) /* RWXUF */
#define NV_RAMIN_PAGE_DIR_BASE_VOL_TRUE                        0x00000001 /* RW--V */
#define NV_RAMIN_PAGE_DIR_BASE_VOL_FALSE                       0x00000000 /* RW--V */


    These bits specify whether the MMU will treat faults as replayable or not.
    The engine will send these bits to the MMU as part of the instance bind.

#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_TEX     (128*32+4):(128*32+4) /* RWXUF */
#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_TEX_DISABLED       0x00000000 /* RW--V */
#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_TEX_ENABLED        0x00000001 /* RW--V */
#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_GCC     (128*32+5):(128*32+5) /* RWXUF */
#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_GCC_DISABLED       0x00000000 /* RW--V */
#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_GCC_ENABLED        0x00000001 /* RW--V */

    NV_RAMIN_USE_VER2_PT_FORMAT determines which page table format to use.
    When NV_RAMIN_USE_VER2_PT_FORMAT is FALSE, the page table uses the old format.
    When NV_RAMIN_USE_VER2_PT_FORMAT is TRUE, the page table uses the new format.

    Volta only supports the new format.  Selecting the old format results in an UNBOUND_INSTANCE fault.


#define NV_RAMIN_USE_VER2_PT_FORMAT             (128*32+10):(128*32+10) /*       */
#define NV_RAMIN_USE_VER2_PT_FORMAT_FALSE                      0x00000000 /*       */
#define NV_RAMIN_USE_VER2_PT_FORMAT_TRUE                       0x00000001 /*       */

    When NV_PFB_PRI_MMU_CTRL_USE_PDB_BIG_PAGE_SIZE is TRUE, this bit selects the big page size.
    When NV_PFB_PRI_MMU_CTRL_USE_PDB_BIG_PAGE_SIZE is FALSE, NV_PFB_PRI_MMU_CTRL_VM_PG_SIZE selects the big page size.

    Volta only supports 64KB for big pages.  Selecting 128KB for big pages results in an UNBOUND_INSTANCE fault.

#define NV_RAMIN_BIG_PAGE_SIZE                    (128*32+11):(128*32+11) /* RWXUF */
#define NV_RAMIN_BIG_PAGE_SIZE_128KB                           0x00000000 /* RW--V */
#define NV_RAMIN_BIG_PAGE_SIZE_64KB                            0x00000001 /* RW--V */

    NV_RAMIN_PAGE_DIR_BASE_LO and NV_RAMIN_PAGE_DIR_BASE_HI
    identify the page directory base (start of the page table)
    location for this context.

#define NV_RAMIN_PAGE_DIR_BASE_LO                 (128*32+31):(128*32+12) /* RWXUF */
#define NV_RAMIN_PAGE_DIR_BASE_HI                  (129*32+31):(129*32+0) /* RWXUF */
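
    As a minimal sketch of how the fields above combine (using the hypothetical
    ramin_write_field helper sketched earlier; the function name and the choice
    of target/format values are illustrative assumptions):

    #include <stdint.h>

    /* Sketch: bind a 4K-aligned page directory base 'pdb' (a physical byte
     * address) into an instance block mapped at 'inst', placing the top level
     * in video memory and selecting the version-2 page table format and 64KB
     * big pages, as Volta requires. */
    static void ramin_set_page_dir_base(uint32_t *inst, uint64_t pdb)
    {
        ramin_write_field(inst, 128*32+1,  128*32+0,  0x0);  /* TARGET_VID_MEM     */
        ramin_write_field(inst, 128*32+2,  128*32+2,  0x0);  /* VOL_FALSE          */
        ramin_write_field(inst, 128*32+10, 128*32+10, 0x1);  /* USE_VER2_PT_FORMAT */
        ramin_write_field(inst, 128*32+11, 128*32+11, 0x1);  /* BIG_PAGE_SIZE_64KB */
        /* PAGE_DIR_BASE_LO is assumed to hold address bits 31:12 of the
         * 4K-aligned PDB; PAGE_DIR_BASE_HI holds bits 63:32. */
        ramin_write_field(inst, 128*32+31, 128*32+12, (uint32_t)(pdb >> 12));
        ramin_write_field(inst, 129*32+31, 129*32+0,  (uint32_t)(pdb >> 32));
    }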

// Single engine pointer channels cannot support multiple
// engines with CTXSW pointers
#define NV_RAMIN_ENGINE_CS                          (132*32+3):(132*32+3) /*       */
#define NV_RAMIN_ENGINE_CS_WFI                                 0x00000000 /*       */
#define NV_RAMIN_ENGINE_CS_FG                                  0x00000001 /*       */
#define NV_RAMIN_ENGINE_WFI_TARGET                  (132*32+1):(132*32+0) /*       */
#define NV_RAMIN_ENGINE_WFI_TARGET_LOCAL_MEM                   0x00000000 /*       */
#define NV_RAMIN_ENGINE_WFI_TARGET_SYS_MEM_COHERENT            0x00000002 /*       */
#define NV_RAMIN_ENGINE_WFI_TARGET_SYS_MEM_NONCOHERENT         0x00000003 /*       */
#define NV_RAMIN_ENGINE_WFI_MODE                    (132*32+2):(132*32+2) /*       */
#define NV_RAMIN_ENGINE_WFI_MODE_PHYSICAL                      0x00000000 /*       */
#define NV_RAMIN_ENGINE_WFI_MODE_VIRTUAL                       0x00000001 /*       */
#define NV_RAMIN_ENGINE_WFI_PTR_LO                (132*32+31):(132*32+12) /*       */
#define NV_RAMIN_ENGINE_WFI_PTR_HI                  (133*32+7):(133*32+0) /*       */

#define NV_RAMIN_ENGINE_WFI_VEID             (134*32+(6-1)):(134*32+0) /*       */
#define NV_RAMIN_ENABLE_ATS                        (135*32+31):(135*32+31) /* RWXUF */
#define NV_RAMIN_ENABLE_ATS_TRUE                                0x00000001 /* RW--V */
#define NV_RAMIN_ENABLE_ATS_FALSE                               0x00000000 /* RW--V */
#define NV_RAMIN_PASID                 (135*32+(20-1)):(135*32+0) /* RWXUF */


     Pointer to a method buffer in BAR2 memory where a faulted engine can save
out methods. BAR2 accesses are assumed to be virtual, so the address saved here
is a virtual address.

#define NV_RAMIN_ENG_METHOD_BUFFER_ADDR_LO                   (136*32+31):(136*32+0)  /* RWXUF */
#define NV_RAMIN_ENG_METHOD_BUFFER_ADDR_HI                   (137*32+(((49-1)-32))):(137*32+0)  /* RWXUF */



    These entries are used to inform FECS which of the PDBs in the array below
    are valid/filled in and need to subsequently be bound.

    This field needs to reserve at least NV_LITTER_NUM_SUBCTX entries.  Currently
    there is enough space reserved for 64 subcontexts.

#define NV_RAMIN_SC_PDB_VALID(i)             (166*32+i):(166*32+i) /* RWXUF */
#define NV_RAMIN_SC_PDB_VALID__SIZE_1         64 /*       */
#define NV_RAMIN_SC_PDB_VALID_FALSE                     0x00000000 /* RW--V */
#define NV_RAMIN_SC_PDB_VALID_TRUE                      0x00000001 /* RW--V */
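
    A minimal sketch of setting the valid bit for a given subcontext, using the
    hypothetical ramin_write_field helper from earlier (the helper and the
    mapping name are assumptions):

    /* Sketch: mark subcontext 'veid' (0..63) as having a valid PDB.
     * NV_RAMIN_SC_PDB_VALID(i) is bit (166*32 + i), so dword 166 covers
     * VEIDs 0..31 and dword 167 covers VEIDs 32..63. */
    static void ramin_sc_pdb_set_valid(uint32_t *inst, unsigned veid)
    {
        ramin_write_field(inst, 166*32 + veid, 166*32 + veid, 0x1);  /* TRUE */
    }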

// Memory-Management VEID array

    The NV_RAMIN_SC_PAGE_DIR_BASE_* entries are an array of page table settings
    for each subcontext. When a context supports subcontexts, the page table
    information for a given VEID/Subcontext needs to be filled in or else page
    faults will result on access.

    These properties for the page table must be filled in for all channels
    sharing the same context as any channel's NV_RAMIN may be used to load the
    context.

    The non-subcontext page table information, such as NV_RAMIN_PAGE_DIR_BASE*,
    is used by non-subcontext engines and clients such as Host, CE, or the
    video engines.

    NV_RAMIN_SC_PAGE_DIR_BASE_TARGET(i) determines whether the top level of the
    page tables is in video memory or system memory (peer is not allowed), and
    the CPU cache coherency for system memory.
    Using INVALID unbinds the selected subcontext.

#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET(i)             ((168+(i)*4)*32+1):((168+(i)*4)*32+0) /* RWXUF */
#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET__SIZE_1                         64 /*       */
#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_VID_MEM                  0x00000000 /* RW--V */
#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_INVALID                  0x00000001 /* RW--V */ // Note: INVALID should match PEER
#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_SYS_MEM_COHERENT         0x00000002 /* RW--V */
#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_SYS_MEM_NONCOHERENT      0x00000003 /* RW--V */

    NV_RAMIN_SC_PAGE_DIR_BASE_VOL(i) identifies the volatile behavior
    of the top level of the page table (whether local L2 can cache it or not).

#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL(i)                  ((168+(i)*4)*32+2):((168+(i)*4)*32+2) /* RWXUF */
#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL__SIZE_1                         64 /*       */
#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL_TRUE                        0x00000001 /* RW--V */
#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL_FALSE                       0x00000000 /* RW--V */

    The NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX(i) and
    NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC(i) bits specify whether
    the MMU will treat faults from TEX and GCC as replayable or
    not.  Based on that, fault packets are written into the replayable fault
    buffer (or not) and faulting requests are put into the replay request
    buffer (or not).
    The last bind that does not unbind a sub-context determines the REPLAY_TEX and REPLAY_GCC for all sub-contexts.

#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX(i)     ((168+(i)*4)*32+4):((168+(i)*4)*32+4) /* RWXUF */
#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX__SIZE_1                         64 /*       */
#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX_DISABLED       0x00000000 /* RW--V */
#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX_ENABLED        0x00000001 /* RW--V */

#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC(i)     ((168+(i)*4)*32+5):((168+(i)*4)*32+5) /* RWXUF */
#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC__SIZE_1                         64 /*       */
#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC_DISABLED       0x00000000 /* RW--V */
#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC_ENABLED        0x00000001 /* RW--V */

    NV_RAMIN_SC_USE_VER2_PT_FORMAT determines which page table format to use.
    When NV_RAMIN_SC_USE_VER2_PT_FORMAT is FALSE, the page table uses
    the old format (2-level page table).  When
    NV_RAMIN_SC_USE_VER2_PT_FORMAT is TRUE, the page table uses the
    new format (5-level, 49-bit VA format).
    The last bind that does not unbind a sub-context determines the page table format for all sub-contexts.
    Volta only supports the new format.  Selecting the old format results in an UNBOUND_INSTANCE fault.

#define NV_RAMIN_SC_USE_VER2_PT_FORMAT(i)          ((168+(i)*4)*32+10):((168+(i)*4)*32+10) /* RWXUF */
#define NV_RAMIN_SC_USE_VER2_PT_FORMAT__SIZE_1                   64 /*       */
#define NV_RAMIN_SC_USE_VER2_PT_FORMAT_FALSE                       0x00000000 /* RW--V */
#define NV_RAMIN_SC_USE_VER2_PT_FORMAT_TRUE                        0x00000001 /* RW--V */

    The last bind that does not unbind a sub-context determines the big page size for all sub-contexts.
    Volta only supports 64KB for big pages.

#define NV_RAMIN_SC_BIG_PAGE_SIZE(i)                    ((168+(i)*4)*32+11):((168+(i)*4)*32+11) /* RWXUF */
#define NV_RAMIN_SC_BIG_PAGE_SIZE__SIZE_1                   64 /*       */
#define NV_RAMIN_SC_BIG_PAGE_SIZE_64KB                            0x00000001 /* RW--V */

    NV_RAMIN_SC_PAGE_DIR_BASE_LO(i) and NV_RAMIN_SC_PAGE_DIR_BASE_HI(i)
    identify the page directory base (start of the page table)
    location for subcontext i.

#define NV_RAMIN_SC_PAGE_DIR_BASE_LO(i)                ((168+(i)*4)*32+31):((168+(i)*4)*32+12) /* RWXUF */
#define NV_RAMIN_SC_PAGE_DIR_BASE_LO__SIZE_1                   64 /*       */
#define NV_RAMIN_SC_PAGE_DIR_BASE_HI(i)                 ((169+(i)*4)*32+31):((169+(i)*4)*32+0) /* RWXUF */
#define NV_RAMIN_SC_PAGE_DIR_BASE_HI__SIZE_1                   64 /*       */
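
    A minimal sketch of programming the per-subcontext page directory base for
    VEID i, following the dword arithmetic in the definitions above (again using
    the hypothetical ramin_write_field and ramin_sc_pdb_set_valid helpers; the
    chosen target and format values are illustrative):

    #include <stdint.h>

    /* Sketch: program the page directory base for subcontext 'veid'.  The
     * per-VEID fields start at dword 168 + veid*4:
     *   dword 168 + veid*4 : TARGET/VOL/replay/format/page-size, PDB bits 31:12
     *   dword 169 + veid*4 : PDB bits 63:32 */
    static void ramin_sc_set_page_dir_base(uint32_t *inst, unsigned veid,
                                           uint64_t pdb)
    {
        uint32_t w = 168 + veid * 4;   /* base dword for this subcontext */

        ramin_write_field(inst, w*32+1,  w*32+0,  0x0);  /* TARGET_VID_MEM     */
        ramin_write_field(inst, w*32+10, w*32+10, 0x1);  /* USE_VER2_PT_FORMAT */
        ramin_write_field(inst, w*32+11, w*32+11, 0x1);  /* BIG_PAGE_SIZE_64KB */
        ramin_write_field(inst, w*32+31, w*32+12, (uint32_t)(pdb >> 12));
        ramin_write_field(inst, (w+1)*32+31, (w+1)*32+0, (uint32_t)(pdb >> 32));

        ramin_sc_pdb_set_valid(inst, veid);    /* see the earlier sketch */
    }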





    NV_RAMIN_SC_ENABLE_ATS(i) indicates whether subcontext i is ATS
    enabled or not.  When set to TRUE, the GMMU will look for VA->PA
    translations in both the GMMU and ATS page tables.
    ATS can be enabled or disabled per subcontext.

#define NV_RAMIN_SC_ENABLE_ATS(i)                       ((170+(i)*4)*32+31):((170+(i)*4)*32+31) /* RWXUF */

    NV_RAMIN_SC_PASID(i) identifies the PASID (process address space
    ID) on the CPU for subcontext i.  The PASID is used to get an ATS
    translation when an ATS page table lookup is needed.  During an ATS TLB
    shootdown, the PASID is also matched against the one coming with the
    shootdown request.

#define NV_RAMIN_SC_PASID(i)                       ((170+(i)*4)*32+(20-1)):((170+(i)*4)*32+0) /* RWXUF */




3  -  FIFO CONTEXT RAM (RAMFC)
==============================


     The NV_RAMFC part of a GPU-instance block contains Host's part of a virtual
GPU's state.  Host is referred to as "FIFO". "FC" stands for FIFO Context.
When Host switches from serving one GPU context to serving a second, Host saves
state for the first GPU context to the first GPU context's RAMFC area, and loads
state for the second GPU context from the second GPU context's RAMFC area.

     RAMFC is located at NV_RAMIN_RAMFC within the GPU instance block.  In
Kepler, this is at the start of the block.  RAMFC is 4KB aligned.

     Every Host word entry in RAMFC directly corresponds to a PRI-accessible
register.  For a description of the contents of a RAMFC entry, please see the
description of the corresponding register in "manuals/dev_pbdma.ref".  The
offsets of the fields within each entry in RAMFC match those of the
corresponding register in the associated PBDMA unit's PRI space.


    RAMFC Entry                     PBDMA Register
    ------------------------------- ----------------------------------
    NV_RAMFC_SIGNATURE               NV_PPBDMA_SIGNATURE(i)
    NV_RAMFC_GP_BASE                 NV_PPBDMA_GP_BASE(i)
    NV_RAMFC_GP_BASE_HI              NV_PPBDMA_GP_BASE_HI(i)
    NV_RAMFC_GP_FETCH                NV_PPBDMA_GP_FETCH(i)
    NV_RAMFC_GP_GET                  NV_PPBDMA_GP_GET(i)
    NV_RAMFC_GP_PUT                  NV_PPBDMA_GP_PUT(i)
    NV_RAMFC_PB_FETCH                NV_PPBDMA_PB_FETCH(i)
    NV_RAMFC_PB_FETCH_HI             NV_PPBDMA_PB_FETCH_HI(i)
    NV_RAMFC_PB_GET                  NV_PPBDMA_GET(i)
    NV_RAMFC_PB_GET_HI               NV_PPBDMA_GET_HI(i)
    NV_RAMFC_PB_PUT                  NV_PPBDMA_PUT(i)
    NV_RAMFC_PB_PUT_HI               NV_PPBDMA_PUT_HI(i)
    NV_RAMFC_PB_TOP_LEVEL_GET        NV_PPBDMA_TOP_LEVEL_GET(i)
    NV_RAMFC_PB_TOP_LEVEL_GET_HI     NV_PPBDMA_TOP_LEVEL_GET_HI(i)
    NV_RAMFC_GP_CRC                  NV_PPBDMA_GP_CRC(i)
    NV_RAMFC_PB_HEADER               NV_PPBDMA_PB_HEADER(i)
    NV_RAMFC_PB_COUNT                NV_PPBDMA_PB_COUNT(i)
    NV_RAMFC_PB_CRC                  NV_PPBDMA_PB_CRC(i)
    NV_RAMFC_SUBDEVICE               NV_PPBDMA_SUBDEVICE(i)
    NV_RAMFC_METHOD0                 NV_PPBDMA_METHOD0(i)
    NV_RAMFC_METHOD1                 NV_PPBDMA_METHOD1(i)
    NV_RAMFC_METHOD2                 NV_PPBDMA_METHOD2(i)
    NV_RAMFC_METHOD3                 NV_PPBDMA_METHOD3(i)
    NV_RAMFC_DATA0                   NV_PPBDMA_DATA0(i)
    NV_RAMFC_DATA1                   NV_PPBDMA_DATA1(i)
    NV_RAMFC_DATA2                   NV_PPBDMA_DATA2(i)
    NV_RAMFC_DATA3                   NV_PPBDMA_DATA3(i)
    NV_RAMFC_TARGET                  NV_PPBDMA_TARGET(i)
    NV_RAMFC_METHOD_CRC              NV_PPBDMA_METHOD_CRC(i)
    NV_RAMFC_REF                     NV_PPBDMA_REF(i)
    NV_RAMFC_RUNTIME                 NV_PPBDMA_RUNTIME(i)
    NV_RAMFC_SEM_ADDR_LO             NV_PPBDMA_SEM_ADDR_LO(i)
    NV_RAMFC_SEM_ADDR_HI             NV_PPBDMA_SEM_ADDR_HI(i)
    NV_RAMFC_SEM_PAYLOAD_LO          NV_PPBDMA_SEM_PAYLOAD_LO(i)
    NV_RAMFC_SEM_PAYLOAD_HI          NV_PPBDMA_SEM_PAYLOAD_HI(i)
    NV_RAMFC_SEM_EXECUTE             NV_PPBDMA_SEM_EXECUTE(i)
    NV_RAMFC_ACQUIRE_DEADLINE        NV_PPBDMA_ACQUIRE_DEADLINE(i)
    NV_RAMFC_ACQUIRE                 NV_PPBDMA_ACQUIRE(i)
    NV_RAMFC_MEM_OP_A                NV_PPBDMA_MEM_OP_A(i)
    NV_RAMFC_MEM_OP_B                NV_PPBDMA_MEM_OP_B(i)
    NV_RAMFC_MEM_OP_C                NV_PPBDMA_MEM_OP_C(i)
    NV_RAMFC_USERD                   NV_PPBDMA_USERD(i)
    NV_RAMFC_USERD_HI                NV_PPBDMA_USERD_HI(i)
    NV_RAMFC_HCE_CTRL                NV_PPBDMA_HCE_CTRL(i)
    NV_RAMFC_CONFIG                  NV_PPBDMA_CONFIG(i)
    NV_RAMFC_SET_CHANNEL_INFO        NV_PPBDMA_SET_CHANNEL_INFO(i)
    ------------------------------- ----------------------------------

#define NV_RAMFC                                                    /* ----G */
#define NV_RAMFC_GP_PUT                          (0*32+31):(0*32+0) /* RWXUF */
#define NV_RAMFC_MEM_OP_A                        (1*32+31):(1*32+0) /* RWXUF */
#define NV_RAMFC_USERD                           (2*32+31):(2*32+0) /* RWXUF */
#define NV_RAMFC_USERD_HI                        (3*32+31):(3*32+0) /* RWXUF */
#define NV_RAMFC_SIGNATURE                       (4*32+31):(4*32+0) /* RWXUF */
#define NV_RAMFC_GP_GET                          (5*32+31):(5*32+0) /* RWXUF */
#define NV_RAMFC_PB_GET                          (6*32+31):(6*32+0) /* RWXUF */
#define NV_RAMFC_PB_GET_HI                       (7*32+31):(7*32+0) /* RWXUF */
#define NV_RAMFC_PB_TOP_LEVEL_GET                (8*32+31):(8*32+0) /* RWXUF */
#define NV_RAMFC_PB_TOP_LEVEL_GET_HI             (9*32+31):(9*32+0) /* RWXUF */
#define NV_RAMFC_REF                           (10*32+31):(10*32+0) /* RWXUF */
#define NV_RAMFC_RUNTIME                       (11*32+31):(11*32+0) /* RWXUF */
#define NV_RAMFC_ACQUIRE                       (12*32+31):(12*32+0) /* RWXUF */
#define NV_RAMFC_ACQUIRE_DEADLINE              (13*32+31):(13*32+0) /* RWXUF */
#define NV_RAMFC_SEM_ADDR_HI                   (14*32+31):(14*32+0) /* RWXUF */
#define NV_RAMFC_SEM_ADDR_LO                   (15*32+31):(15*32+0) /* RWXUF */
#define NV_RAMFC_SEM_PAYLOAD_LO                (16*32+31):(16*32+0) /* RWXUF */
#define NV_RAMFC_SEM_EXECUTE                   (17*32+31):(17*32+0) /* RWXUF */
#define NV_RAMFC_GP_BASE                       (18*32+31):(18*32+0) /* RWXUF */
#define NV_RAMFC_GP_BASE_HI                    (19*32+31):(19*32+0) /* RWXUF */
#define NV_RAMFC_GP_FETCH                      (20*32+31):(20*32+0) /* RWXUF */
#define NV_RAMFC_PB_FETCH                      (21*32+31):(21*32+0) /* RWXUF */
#define NV_RAMFC_PB_FETCH_HI                   (22*32+31):(22*32+0) /* RWXUF */
#define NV_RAMFC_PB_PUT                        (23*32+31):(23*32+0) /* RWXUF */
#define NV_RAMFC_PB_PUT_HI                     (24*32+31):(24*32+0) /* RWXUF */
#define NV_RAMFC_MEM_OP_B                      (25*32+31):(25*32+0) /* RWXUF */
#define NV_RAMFC_RESERVED26                    (26*32+31):(26*32+0) /* RWXUF */
#define NV_RAMFC_RESERVED27                    (27*32+31):(27*32+0) /* RWXUF */
#define NV_RAMFC_RESERVED28                    (28*32+31):(28*32+0) /* RWXUF */
#define NV_RAMFC_GP_CRC                        (29*32+31):(29*32+0) /* RWXUF */
#define NV_RAMFC_PB_HEADER                     (33*32+31):(33*32+0) /* RWXUF */
#define NV_RAMFC_PB_COUNT                      (34*32+31):(34*32+0) /* RWXUF */
#define NV_RAMFC_SUBDEVICE                     (37*32+31):(37*32+0) /* RWXUF */
#define NV_RAMFC_PB_CRC                        (38*32+31):(38*32+0) /* RWXUF */
#define NV_RAMFC_SEM_PAYLOAD_HI                (39*32+31):(39*32+0) /* RWXUF */
#define NV_RAMFC_MEM_OP_C                      (40*32+31):(40*32+0) /* RWXUF */
#define NV_RAMFC_RESERVED20                    (41*32+31):(41*32+0) /* RWXUF */
#define NV_RAMFC_RESERVED21                    (42*32+31):(42*32+0) /* RWXUF */
#define NV_RAMFC_TARGET                        (43*32+31):(43*32+0) /* RWXUF */
#define NV_RAMFC_METHOD_CRC                    (44*32+31):(44*32+0) /* RWXUF */
#define NV_RAMFC_METHOD0                       (48*32+31):(48*32+0) /* RWXUF */
#define NV_RAMFC_DATA0                         (49*32+31):(49*32+0) /* RWXUF */
#define NV_RAMFC_METHOD1                       (50*32+31):(50*32+0) /* RWXUF */
#define NV_RAMFC_DATA1                         (51*32+31):(51*32+0) /* RWXUF */
#define NV_RAMFC_METHOD2                       (52*32+31):(52*32+0) /* RWXUF */
#define NV_RAMFC_DATA2                         (53*32+31):(53*32+0) /* RWXUF */
#define NV_RAMFC_METHOD3                       (54*32+31):(54*32+0) /* RWXUF */
#define NV_RAMFC_DATA3                         (55*32+31):(55*32+0) /* RWXUF */
#define NV_RAMFC_HCE_CTRL                      (57*32+31):(57*32+0) /* RWXUF */
#define NV_RAMFC_CONFIG                        (61*32+31):(61*32+0) /* RWXUF */
#define NV_RAMFC_SET_CHANNEL_INFO              (63*32+31):(63*32+0) /* RWXUF */

#define NV_RAMFC_BASE_SHIFT                                      12 /*       */

    Size of the full range of RAMFC in bytes.
#define NV_RAMFC_SIZE_VAL                                0x00000200 /* ----C */

4 - USER-DRIVER ACCESSIBLE RAM (RAMUSERD)
=========================================

     A user-level driver is allowed to access only a small portion of a GPU
context's state.  The portion of a GPU context's state that a user-level driver
can access is stored in a block of memory called NV_RAMUSERD.  NV_RAMUSERD is a
user-level driver's window into NV_RAMFC.  The NV_RAMUSERD state for each GPU
context is stored in an aligned NV_RAMUSERD_CHAN_SIZE-byte block of memory.

     To submit more methods, a user driver writes a PB segment to
memory, writes a GP entry that points to the PB segment, updates GP_PUT in
RAMUSERD, and writes the channel's handle to the
NV_USERMODE_NOTIFY_CHANNEL_PENDING register (see dev_usermode.ref).
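
     As a minimal sketch of that sequence (illustrative only: 'userd' is assumed
to be a CPU mapping of the channel's RAMUSERD block, 'gpfifo' a mapping of the
GPFIFO ring, and gp_entry_encode()/ring_doorbell() are placeholders for the GP
entry encoding and the NV_USERMODE_NOTIFY_CHANNEL_PENDING write, which are
defined in dev_pbdma.ref and dev_usermode.ref rather than here):

    #include <stdint.h>

    /* Placeholders; the real encodings live in dev_pbdma.ref / dev_usermode.ref. */
    uint64_t gp_entry_encode(uint64_t pb_addr, uint32_t pb_len_dwords);
    void     ring_doorbell(uint32_t channel_handle);

    static void submit_pb_segment(volatile uint32_t *userd, uint64_t *gpfifo,
                                  uint32_t gp_entries, uint64_t pb_addr,
                                  uint32_t pb_len_dwords, uint32_t channel_handle)
    {
        uint32_t gp_put = userd[35];     /* NV_RAMUSERD_GP_PUT is dword 35 */

        /* 1. The PB segment itself is assumed to already be written at pb_addr. */
        /* 2. Write a GP entry that points at the new PB segment.                */
        gpfifo[gp_put] = gp_entry_encode(pb_addr, pb_len_dwords);

        /* 3. Advance GP_PUT in RAMUSERD (the GPFIFO ring wraps at gp_entries).  */
        userd[35] = (gp_put + 1) % gp_entries;

        /* 4. Notify Host that this channel has pending work.                    */
        ring_doorbell(channel_handle);
    }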

     The RAMUSERD data structure is updated at regular intervals as controlled
by the NV_PFIFO_USERD_WRITEBACK setting (see dev_fifo.ref).  For a particular
channel, RAMUSERD writeback can be disabled; in that case it is recommended that
SW track pushbuffer and channel progress via Host WFI_DIS semaphores rather than
reading the RAMUSERD data structure.

     When write-back is enabled, a user driver can check the GPU's progress in
executing a channel's PB segments.  The driver can use:
    * GP_GET to monitor the index of the next GP entry the GPU will process
    * PB_GET to monitor the address of the next PB entry the GPU will process
    * TOP_LEVEL_GET (see NV_PPBDMA_TOP_LEVEL_GET) to monitor the address of the
      next "top-level" (non-SUBROUTINE) PB entry the GPU will process
    * REF to monitor the current "reference count" value (see NV_PPBDMA_REF).

     Each entry in RAMUSERD corresponds to a PRI-accessible PBDMA register in Host.
For a description of the behavior and contents of a RAMUSERD entry, please see
the description of the corresponding register in "manuals/dev_pbdma.ref".

    RAMUSERD Entry                   PBDMA Register                 Access
    -------------------------------  -----------------------------  ----------
    NV_RAMUSERD_GP_PUT               NV_PPBDMA_GP_PUT(i)            Read/Write
    NV_RAMUSERD_GP_GET               NV_PPBDMA_GP_GET(i)            Read-only
    NV_RAMUSERD_GET                  NV_PPBDMA_GET(i)               Read-only
    NV_RAMUSERD_GET_HI               NV_PPBDMA_GET_HI(i)            Read-only
    NV_RAMUSERD_PUT                  NV_PPBDMA_PUT(i)               Read-only
    NV_RAMUSERD_PUT_HI               NV_PPBDMA_PUT_HI(i)            Read-only
    NV_RAMUSERD_TOP_LEVEL_GET        NV_PPBDMA_TOP_LEVEL_GET(i)     Read-only
    NV_RAMUSERD_TOP_LEVEL_GET_HI     NV_PPBDMA_TOP_LEVEL_GET_HI(i)  Read-only
    NV_RAMUSERD_REF                  NV_PPBDMA_REF(i)               Read-only
    -------------------------------  -----------------------------  ----------

     A user driver may write to NV_RAMUSERD_GP_PUT to kick off more work in a
channel.  Although writes to the other, read-only, entries can alter memory,
writes to those entries will not affect the operation of the GPU, and can be
overwritten by the GPU.

     When Host loads its part of a GPU context's state from RAMFC memory, it
may not immediately read RAMUSERD_GP_PUT.  Host can use the GP_PUT value
directly from RAMFC while waiting for RAMUSERD_GP_PUT to synchronize.
Because reads of RAMUSERD_GP_PUT can be delayed, the value in NV_PPBDMA_GP_PUT
can be older than the value in NV_RAMUSERD_GP_PUT.

     When Host saves a GPU context's state to NV_RAMFC, it also writes to
NV_RAMUSERD the values of the entries other than GP_PUT.
Because Host does not continuously write the read-only RAMFC entries, the
read-only values in USERD memory can be older than the values in the Host PBDMA
unit.

#define NV_RAMUSERD                                                 /* ----G */
#define NV_RAMUSERD_PUT                        (16*32+31):(16*32+0) /* RWXUF */
#define NV_RAMUSERD_GET                        (17*32+31):(17*32+0) /* RWXUF */
#define NV_RAMUSERD_REF                        (18*32+31):(18*32+0) /* RWXUF */
#define NV_RAMUSERD_PUT_HI                     (19*32+31):(19*32+0) /* RWXUF */
#define NV_RAMUSERD_TOP_LEVEL_GET              (22*32+31):(22*32+0) /* RWXUF */
#define NV_RAMUSERD_TOP_LEVEL_GET_HI           (23*32+31):(23*32+0) /* RWXUF */
#define NV_RAMUSERD_GET_HI                     (24*32+31):(24*32+0) /* RWXUF */
#define NV_RAMUSERD_GP_GET                     (34*32+31):(34*32+0) /* RWXUF */
#define NV_RAMUSERD_GP_PUT                     (35*32+31):(35*32+0) /* RWXUF */
#define NV_RAMUSERD_BASE_SHIFT             9 /*       */
#define NV_RAMUSERD_CHAN_SIZE               512 /*       */
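
     When write-back is enabled, the read-only entries above can be polled from
a CPU mapping of RAMUSERD.  A minimal sketch (the mapping name is an assumption;
note that GP_GET reaching GP_PUT only means all submitted GP entries have been
fetched, not that all of their methods have completed):

    #include <stdint.h>

    /* Sketch: check whether the PBDMA has fetched every GP entry submitted so
     * far by comparing GP_GET (dword 34) against GP_PUT (dword 35) in the
     * channel's 512-byte RAMUSERD block. */
    static int all_gp_entries_fetched(volatile const uint32_t *userd)
    {
        uint32_t gp_get = userd[34];   /* NV_RAMUSERD_GP_GET */
        uint32_t gp_put = userd[35];   /* NV_RAMUSERD_GP_PUT */
        return gp_get == gp_put;
    }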




5 - RUN-LIST RAM (RAMRL)
========================

     Software specifies the GPU contexts that hardware should "run" by writing a
list of entries (known as a "runlist") to a 4k-aligned area of memory (beginning
at NV_PFIFO_RUNLIST_BASE), and by notifying Host that a new list is available
(by writing to NV_PFIFO_RUNLIST).
     Submission of a new runlist causes Host to expire the timeslice of all work
scheduled by the previous runlist, allowing it to schedule the channels present
in the new runlist once they are fetched.  SW can check the status of the runlist
by polling NV_PFIFO_ENG_RUNLIST_PENDING (see dev_fifo.ref NV_PFIFO_RUNLIST for
a full description of the runlist submit mechanism).
     Runlists can be stored in system memory or video memory (as specified by
NV_PFIFO_RUNLIST_BASE_TARGET).  If a runlist is stored in video memory, software
will have to execute a flush or read back the last entry written before submitting
the runlist to Host to guarantee coherency.
     The size of a runlist entry data structure is 16 bytes.  Each entry
specifies either a channel entry or a TSG header; the type is determined by the
NV_RAMRL_ENTRY_TYPE field.


Runlist Channel Entry Type:

     A runlist entry of type NV_RAMRL_ENTRY_TYPE_CHAN specifies a channel to
run.  All such entries must occur within the span of some TSG as specified by
the NV_RAMRL_ENTRY_TYPE_TSG described below.  If a channel entry is encountered
outside a TSG, Host will raise the NV_PFIFO_INTR_SCHED_ERROR_CODE_BAD_TSG
interrupt.

     The fields available in a channel runlist entry are as follows (Fig 5.1):

  ENTRY_TYPE (T)        : type of this entry: ENTRY_TYPE_CHAN
  CHID (ID)             : identifier of the channel to run (overlays ENTRY_ID)
  RUNQUEUE_SELECTOR (Q) : selects which PBDMA should run this channel if
                          more than one PBDMA is supported by the runlist

  INST_PTR_LO           : lower 20 bits of the 4k-aligned instance block pointer
  INST_PTR_HI           : upper 32 bits of the instance block pointer
  INST_TARGET (TGI)     : aperture of the instance block

  USERD_PTR_LO          : upper 24 bits of the low 32 bits of the 512-byte-aligned USERD pointer
  USERD_PTR_HI          : upper 32 bits of USERD pointer
  USERD_TARGET (TGU)    : aperture of the USERD data structure

     CHID is a channel identifier that uniquely specifies the channel described
by this runlist entry to the scheduling hardware and is reported in various
status registers.
     RUNQUEUE_SELECTOR determines to which runqueue the channel belongs, and
thereby which PBDMA will run the channel.  Increasing values select increasingly
numbered PBDMA IDs serving the runlist.  If the selector value exceeds the
number of PBDMAs on the runlist, the hardware will silently reassign the channel
to run on the first PBDMA as though RUNQUEUE_SELECTOR had been set to 0.  (In
current hardware, this is used by SCG on the graphics runlist only to determine
which FE pipe should service a given channel.  A value of 0 targets the first FE
pipe, which can process all FE driven engines: Graphics, Compute, Inline2Memory,
and TwoD.  A value of 1 targets the second FE pipe, which can only process
Compute work.  Note that GRCE work is allowed on either runqueue.)
     The INST fields specify the physical address of the channel's instance
block, the in-memory data structure that stores the context state.
The target aperture of the instance block is given by INST_TARGET, and the byte
offset within that aperture is calculated as

 (INST_PTR_HI << 32) | (INST_PTR_LO  << NV_RAMRL_ENTRY_CHAN_INST_PTR_ALIGN_SHIFT)

This address should match the one specified in the channel RAM's
NV_PCCSR_CHANNEL_INST register; see NV_RAMIN and NV_RAMFC for the format of the
instance block.  The hardware ignores the RAMRL INST fields, but in future
chips the instance pointer may be removed from the channel RAM and the RAMRL
INST fields used instead, resulting in smaller hardware.
     The USERD fields specify the physical address of the USERD memory region
used by software to submit additional work to the channel.  The target aperture
of the USERD region is given by USERD_TARGET, and the byte offset within that
aperture is calculated as

 (USERD_PTR_HI << 32) | (USERD_PTR_LO  << NV_RAMRL_ENTRY_CHAN_USERD_PTR_ALIGN_SHIFT)


SW uses the NV_RAMUSERD_CHAN_SIZE define to allocate and align a channel's
RAMUSERD data structure.  See the documentation for NV_RAMUSERD for a
description of the use of USERD and its format.  This address and its
alignment must match the one specified in the RAMFC's NV_RAMFC_USERD and
NV_RAMFC_USERD_HI fields, which are backed by NV_PPBDMA_USERD in dev_pbdma.ref.
The hardware ignores the RAMRL USERD fields, but in future chips the USERD
pointer may be read from these fields in the runlist entry instead of the RAMFC
to avoid the extra level of indirection in fetching the USERD data that
currently results in a dependent read.
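
     As a minimal sketch of the address arithmetic above (the field values are
the raw INST_PTR_LO/HI and USERD_PTR_LO/HI values extracted from a channel
runlist entry; the function names are assumptions):

    #include <stdint.h>

    /* Sketch: reconstruct the byte address of the instance block.
     * NV_RAMRL_ENTRY_CHAN_INST_PTR_ALIGN_SHIFT is 12 (4K alignment). */
    static uint64_t ramrl_chan_inst_addr(uint32_t inst_ptr_lo, uint32_t inst_ptr_hi)
    {
        return ((uint64_t)inst_ptr_hi << 32) | ((uint64_t)inst_ptr_lo << 12);
    }

    /* Sketch: reconstruct the byte address of the USERD region.
     * NV_RAMRL_ENTRY_CHAN_USERD_PTR_ALIGN_SHIFT is 8; USERD_PTR_LO holds
     * bits 31:8 of the 512-byte-aligned USERD address. */
    static uint64_t ramrl_chan_userd_addr(uint32_t userd_ptr_lo, uint32_t userd_ptr_hi)
    {
        return ((uint64_t)userd_ptr_hi << 32) | ((uint64_t)userd_ptr_lo << 8);
    }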


Runlist TSG Entry Type:

     The other type of runlist entry is the Timeslice Group (TSG) header entry
(Fig 5.2).  This type of entry is specified by NV_RAMRL_ENTRY_TYPE_TSG.  A TSG
entry describes a collection of channels, all of which share the same context and
are scheduled as a single unit by Host.  All runlists support this type of entry.

     The fields available in a TSG header runlist entry are as follows (Fig 5.2):

  ENTRY_TYPE (T)      : type of this entry: ENTRY_TYPE_TSG
  TSGID               : identifier of the Timeslice group (overlays ENTRY_ID)
  TSG_LENGTH          : number of channels that are part of this timeslice group
  TIMESLICE_SCALE     : scale factor for the TSG's timeslice
  TIMESLICE_TIMEOUT   : timeout amount for the TSG's timeslice

     A timeslice group entry consists of an integer identifier along with a
length which specifies the number of channels in the TSG. After a TSG header
runlist entry, the next TSG_LENGTH runlist entries are considered to be part of
the timeslice group.  Note that the minimum length of a TSG is one entry.
     All channels in a TSG share the same runlist timeslice which specifies how
long a single context runs on an engine or PBDMA before being swapped for a
different context. The timeslice period is set in the TSG header by specifying
TSG_TIMESLICE_TIMEOUT and TSG_TIMESLICE_SCALE. The TSG timeslice period is
calculated as follows:

  timeslice = (TSG_TIMESLICE_TIMEOUT << TSG_TIMESLICE_SCALE) * 1024 nanoseconds

     The timeslice period should normally not be set to zero.  A timeslice of
zero will be treated as a timeslice period of one.  The runlist
timeslice period begins after the context has been loaded on a PBDMA but is
paused while the channel has an outstanding context load to an engine.  Time
spent switching a context into an engine is not part of the runlist timeslice.
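
     As a worked example of the formula above (a sketch only), using the
encodings named in the definitions below, TSG_TIMESLICE_TIMEOUT_128 (0x80) and
TSG_TIMESLICE_SCALE_3 (3), the period is (128 << 3) * 1024 ns = 1,048,576 ns,
roughly one millisecond:

    #include <stdint.h>

    /* Sketch: TSG runlist timeslice in nanoseconds, per the formula above. */
    static uint64_t tsg_timeslice_ns(uint32_t timeout, uint32_t scale)
    {
        return ((uint64_t)timeout << scale) * 1024u;
    }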

     If Host reaches the end of the runlist or receives another entry of type
NV_RAMRL_ENTRY_TYPE_TSG before processing TSG_LENGTH additional runlist entries,
or if it encounters a TSG of length 0, a SCHED_ERROR interrupt will be generated
with ERROR_CODE_BAD_TSG.


Host Scheduling Memory Layout:

Example of graphics runlist entry to GPU context mapping via channel id:


                           .------Inst_ptr -------.
                           |                      |
     Graphics Runlist      |    Channel-Map RAM   |          GPU Instance Block
     .------------ .       |  .----------------.  |        .-------------------.
     | TSG Hdr L=m |--.----'  |Ch0 Inst Blk Ptr|--'------->| Host State        |
     | RL Entry T1 |  |       |Ch1 Inst Blk Ptr|    .------| Memory State      |
     | RL Entry T2 |  |       |       ...      |    |      | Engine0 State Ptr |
     |    ...      |  |-chid->|ChI Inst Blk Ptr|    |      | Engine1 State Ptr |
     | RL Entry Tm |  |       |       ...      |    |      |     ...           |
     | TSG Hdr L=n |  |       |ChN Inst Blk Ptr|    |    .-| EngineN State Ptr |
     | RL Entry T1 |  |       `----------------'    |    | `-------------------'
     | RL Entry T2 |userd_ptr                       |    |
     |    ...      |  |        .--------------.     |    |   .--------------.
     | RL Entry Tn |  |        |    USERD     |     |    |   |  Engine Ctx  |
     |             |  '------->|              |<----'    '-->|    State N   |
     `-------------'           |              |              |              |
                               `--------------'              `--------------'

Runlist Diagram Description:
    Here we have (M+N) channel-type (ENTRY_TYPE_CHAN) runlist entries
grouped together within two TSGs.  The first entry in the runlist is a TSG header
entry (ENTRY_TYPE_TSG) that describes the first TSG.  The TSG header specifies M
as the length of the TSG.  The header also contains the timeslice
information for the TSG (SCALE/TIMEOUT), as well as the TSG id specified in the
TSGID field.
    Because the length here is M, the runlist *must* contain M additional
runlist entries of type ENTRY_TYPE_CHAN that are part of this TSG.
Similarly, the next (N+1) entries, a TSG header entry followed by N regular
channel entries, correspond to the second TSG.

#define NV_RAMRL_ENTRY                                               /* ----G */
#define NV_RAMRL_ENTRY_RANGE                          0xF:0x00000000 /* RW--M */
#define NV_RAMRL_ENTRY_SIZE                                       16 /*       */
// Runlist base must be 4k-aligned.
#define NV_RAMRL_ENTRY_BASE_SHIFT                                 12 /*       */


#define NV_RAMRL_ENTRY_TYPE                        (0+0*32):(0+0*32) /* RWXUF */
#define NV_RAMRL_ENTRY_TYPE_CHAN                          0x00000000 /* RW--V */
#define NV_RAMRL_ENTRY_TYPE_TSG                           0x00000001 /* RW--V */

#define NV_RAMRL_ENTRY_ID                         (11+2*32):(0+2*32) /* RWXUF */
#define NV_RAMRL_ENTRY_ID_HW                      11:0 /* RWXUF */
#define NV_RAMRL_ENTRY_ID_MAX              (4096-1) /* RW--V */





#define NV_RAMRL_ENTRY_CHAN_RUNQUEUE_SELECTOR      (1+0*32):(1+0*32) /* RWXUF */

#define NV_RAMRL_ENTRY_CHAN_INST_TARGET                   (5+0*32):(4+0*32) /* RWXUF */
#define NV_RAMRL_ENTRY_CHAN_INST_TARGET_VID_MEM                  0x00000000 /* RW--V */
#define NV_RAMRL_ENTRY_CHAN_INST_TARGET_SYS_MEM_COHERENT         0x00000002 /* RW--V */
#define NV_RAMRL_ENTRY_CHAN_INST_TARGET_SYS_MEM_NONCOHERENT      0x00000003 /* RW--V */

#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET                  (7+0*32):(6+0*32) /* RWXUF */
#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_VID_MEM                 0x00000000 /* RW--V */
#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_VID_MEM_NVLINK_COHERENT 0x00000001 /* RW--V */
#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_SYS_MEM_COHERENT        0x00000002 /* RW--V */
#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_SYS_MEM_NONCOHERENT     0x00000003 /* RW--V */

#define NV_RAMRL_ENTRY_CHAN_USERD_PTR_LO          (31+0*32):(8+0*32) /* RWXUF */
#define NV_RAMRL_ENTRY_CHAN_USERD_PTR_HI          (31+1*32):(0+1*32) /* RWXUF */

#define NV_RAMRL_ENTRY_CHAN_CHID                  (11+2*32):(0+2*32) /* RWXUF */

#define NV_RAMRL_ENTRY_CHAN_INST_PTR_LO          (31+2*32):(12+2*32) /* RWXUF */
#define NV_RAMRL_ENTRY_CHAN_INST_PTR_HI           (31+3*32):(0+3*32) /* RWXUF */



// Macros for shifting out low bits of INST_PTR and USERD_PTR.
#define NV_RAMRL_ENTRY_CHAN_INST_PTR_ALIGN_SHIFT                  12 /* ----C */
#define NV_RAMRL_ENTRY_CHAN_USERD_PTR_ALIGN_SHIFT                  8 /* ----C */







#define NV_RAMRL_ENTRY_TSG_TIMESLICE_SCALE       (19+0*32):(16+0*32) /* RWXUF */
#define NV_RAMRL_ENTRY_TSG_TIMESLICE_SCALE_3              0x00000003 /* RWI-V */
#define NV_RAMRL_ENTRY_TSG_TIMESLICE_TIMEOUT     (31+0*32):(24+0*32) /* RWXUF */
#define NV_RAMRL_ENTRY_TSG_TIMESLICE_TIMEOUT_128          0x00000080 /* RWI-V */


#define NV_RAMRL_ENTRY_TSG_TIMESLICE_TIMEOUT_1US          0x00000000 /*       */

#define NV_RAMRL_ENTRY_TSG_LENGTH                  (7+1*32):(0+1*32) /* RWXUF */
#define NV_RAMRL_ENTRY_TSG_LENGTH_INIT                    0x00000000 /* RW--V */
#define NV_RAMRL_ENTRY_TSG_LENGTH_MIN                     0x00000001 /* RW--V */
#define NV_RAMRL_ENTRY_TSG_LENGTH_MAX                     0x00000080 /* RW--V */

#define NV_RAMRL_ENTRY_TSG_TSGID                  (11+2*32):(0+2*32) /* RWXUF */
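
     A minimal sketch of packing these entries with the bit positions defined
above (each runlist entry is NV_RAMRL_ENTRY_SIZE = 16 bytes, i.e. four dwords;
'entry' points at the slot to fill, and the choice of VID_MEM apertures is an
illustrative assumption):

    #include <stdint.h>

    /* Sketch: TSG header entry.  Dword 0 carries the type and timeslice,
     * dword 1 the TSG length, dword 2 the TSGID. */
    static void ramrl_write_tsg_header(uint32_t *entry, uint32_t tsgid,
                                       uint32_t length, uint32_t timeout,
                                       uint32_t scale)
    {
        entry[0] = 0x1u                      /* ENTRY_TYPE_TSG (bit 0)        */
                 | ((scale   & 0xF)  << 16)  /* TSG_TIMESLICE_SCALE   (19:16) */
                 | ((timeout & 0xFF) << 24); /* TSG_TIMESLICE_TIMEOUT (31:24) */
        entry[1] = length & 0xFF;            /* TSG_LENGTH (7:0)              */
        entry[2] = tsgid  & 0xFFF;           /* TSG_TSGID  (11:0)             */
        entry[3] = 0;
    }

    /* Sketch: channel entry.  inst_addr is the 4K-aligned instance block
     * physical address, userd_addr the 512-byte-aligned USERD address; both
     * are placed in video memory here. */
    static void ramrl_write_chan_entry(uint32_t *entry, uint32_t chid,
                                       uint64_t inst_addr, uint64_t userd_addr)
    {
        entry[0] = 0x0u                                   /* ENTRY_TYPE_CHAN (bit 0)    */
                 | (0x0u << 4)                            /* INST_TARGET_VID_MEM (5:4)  */
                 | (0x0u << 6)                            /* USERD_TARGET_VID_MEM (7:6) */
                 | (((uint32_t)(userd_addr >> 8)) << 8);  /* USERD_PTR_LO (31:8)        */
        entry[1] = (uint32_t)(userd_addr >> 32);          /* USERD_PTR_HI               */
        entry[2] = (chid & 0xFFF)                         /* CHID (11:0)                */
                 | (((uint32_t)(inst_addr >> 12)) << 12); /* INST_PTR_LO (31:12)        */
        entry[3] = (uint32_t)(inst_addr >> 32);           /* INST_PTR_HI                */
    }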



6  -  Host Pushbuffer Format (FIFO_DMA)
=======================================

     "FIFO" refers to Host.  "FIFO_DMA" means data that Host reads from memory:
the pushbuffer.  Host autonomously reads pushbuffer data from memory and
generates method address/data pairs from the data.

     Pushbuffer terminology:

     - A channel is the logical sequence of instructions associated with a GPU
       context.

     - The pushbuffer is a stream of data in memory containing the
       specifications of the operations that a channel is to perform for a
       particular client.  Pushbuffer data consists of pushbuffer entries.

     - A pushbuffer entry (PB entry) is a 32-bit (doubleword) sized unit of
       pushbuffer data.  This is the smallest granularity at which Host consumes
       pushbuffer data.  A PB entry is either a PB instruction (which is either
       a PB control entry or a PB method header), or a method data entry.

     - A pushbuffer segment (PB segment) is a contiguous block of memory
       containing pushbuffer entries.  The location and size of a pushbuffer
       segment is defined by its respective GP entry in the GPFIFO.

     - A pushbuffer control entry (PB control entry) is a single PB entry of
       type SET_SUBDEVICE_MASK, STORE_SUBDEVICE_MASK, USE_SUBDEVICE_MASK,
       END_PB_SEGMENT, or a universal NOP (NV_FIFO_DMA_NOP).

     - A pushbuffer compressed method sequence is a sequence of pushbuffer
       entries starting with a method header and a variable-length sequence of
       method data entries (the length being defined by the method header).  A
       single PB compressed method sequence expands into one or more methods.
       This may also be known as a "pushbuffer method" (PB method), but that
       terminology is ambiguous and not preferred.

     - A pushbuffer method header (PB method header) is the first PB entry found
       in a PB compressed method sequence.  A PB method header is a PB
       instruction performed on method data entries.

     - A pushbuffer instruction (PB instruction) is a PB entry that is not a PB
       method data entry.  A PB instruction is either a PB control entry or a PB
       method header.

     - A method is an address/data pair representing an operation to perform.

     - A method data entry is the 32-bit operand for its corresponding method.



#define NV_FIFO_PB_ENTRY_SIZE                                     4 /*       */


     Some engines such as Graphics internally support a double-wide method FIFO;
these are known as "data-hi" methods.  It is Host that performs the packing of
two methods into one double-wide entry.  Host will only generate data-hi methods
if the following conditions are satisfied:

     1. The two methods come from the same PB method (in other words they share
        the same method header).

     2. The method header specifies a non-incrementing method, an incrementing
        method, or an increment-once method.

     3. The paired methods either have the same method address, or the first
        method has an even NV_FIFO_DMA_METHOD_ADDRESS field and the second
        (data-hi) method is the increment of the first.  (That is, the
        left-shifted method address as listed in the class files must be
        divisible by 8 for this condition to hold.)

     4. The second method is available at the time of pushing the first one into
        the engine's method FIFO. In other words, Host will not wait to pack
        methods.  Note that if the engine's method fifo is full, the
        back-pressure will in itself create a "wait time".

The first three conditions are under SW's control.  Only the graphics engine
supports data-hi methods.


Types of PB Entries

     PB entries can be classified into three types: PB method headers, PB
control entries, and PB method data.  Different types of PB entries have
different formats.  Because PB compressed method sequences are of variable
length, it is impossible to determine the type of a PB entry without tracking
the pushbuffer from the beginning or from the location of a PB entry that is
known to not be a PB method data entry.

     A PB method data entry is always found in a method data sequence
immediately following a PB method header in the logical stream of PB entries.
The PB method header contains a NV_FIFO_DMA_METHOD_COUNT field, the value of
which is equal to the length of the method data sequence.  Note a PB method
header does not necessarily come with PB method data entries (see details below
about immediate-data method headers and method headers for which COUNT is zero).
Also note the PB method data entries may be located in a PB segment separate
from their corresponding method header.  The format of any given PB method data
entry is defined in the "NV_UDMA" section of dev_pbdma.ref.

     A PB entry that is either a PB method header or PB control entry is known
as a PB instruction.  The type of a PB instruction is specified by the
NV_FIFO_DMA_SEC_OP field and the NV_FIFO_DMA_TERT_OP field.

   secondary  tertiary
    opcode     opcode   entry type
   ---------  --------  --------------------------------
      000        01     SET_SUBDEVICE_MASK
      000        10     STORE_SUBDEVICE_MASK
      000        11     USE_SUBDEVICE_MASK
      001        xx     incrementing method header
      011        xx     non-incrementing method header
      100        xx     immediate-data method header
      101        xx     increment-once method header
      111        xx     END_PB_SEGMENT
   ---------  --------  --------------------------------

     Types of methods:

     - A Host method is a method whose address is defined in the NV_UDMA device
       range.

     - A Host-only method is any Host method excluding SetObject (also known as
       NV_UDMA_OBJECT).

     - An engine method is a method whose address is not defined within the
       NV_UDMA device range.  There are multiple engines designated by a
       subchannel ID.  Software methods are included in this category.

     - A software method (SW method) is a method which causes an interrupt for
       the express purpose of being handled by software.  For details see the
       section on software methods below.

     For more information about types of methods see "HOST METHODS" and
"RESERVED METHOD ADDRESSES" in dev_pbdma.ref.

     The method address in a PB method header (stored in the
NV_FIFO_DMA_METHOD_ADDRESS field) is a dword-address, not a byte-address.  In
other words the least significant two bits of the address are not stored because
the byte-address is dword-aligned (thus the least significant two bits are
always zero).

     The subchannel in a PB method header (stored in the
NV_FIFO_DMA_*_SUBCHANNEL field) determines the engine to which a method will be
sent if the method is SetObject or an engine method (otherwise, the SUBCHANNEL
field is ignored).  SetObject enables SW to request HW to check the expectation
that a given subchannel serves the specified class ID; see the description of
"NV_UDMA_OBJECT" in dev_pbdma.ref.

     The mapping between subchannels and engines is fixed.  A subchannel is
bound to a given class according to the runlist.  Each engine method is applied
to an "object," which itself is an instance of an NV class as defined by the
master MFS class files.  Each object belongs to an engine.  For SetObject and
engine methods, the engine is determined entirely by the SUBCHANNEL field of
the method's header via a fixed mapping that depends on the runlist on which the
method arrives.

     Methods on subchannels 0-4 are handled by the primary engine served by the
runlist, except that subchannel 4 targets GRCOPY0 and GRCOPY1 on the graphics
runlist.  For Graphics/Compute, SetObject associates subchannels 0, 1, 2, and 3
with class identifiers for 3D, compute, I2M, and 2D respectively.  On other
runlists, the subchannel is ignored, and Host does not send the subchannel ID to
the engine.  It is recommended that SW only use subchannel 4 on the dedicated
copy engines for consistency with GRCOPY usage.

     Subchannels 5-7 are for software methods.  Any methods on these subchannels
(including SetObject methods) are kicked back to software for handling via the
SW method dispatch mechanism using the NV_PPBDMA_INTR_*_DEVICE interrupt.  SW
may choose to send a SetObject method to each engine subchannel before sending
any methods on that particular subchannel in order to support multiple software
classes.

     If a method stream subchannel-switches from targeting graphics/compute to a
copy engine or vice-versa, that is, to or from subchannel 4 on GR, Host will:

     1. Wait until the first engine has completed all its methods,

     2. Wait until that engine indicates that it is idle (WFI), and

     3. Send a sysmem barrier flush and wait until it completes.

Only then will Host send methods to the newly targeted engine.

     Note that this WFI will not occur when Host-only methods are sent on the
new subchannel, since Host-only methods ignore the subchannel field.
Additionally,
when switching from CE to graphics/compute, Host forces FE to perform a cache
invalidate.  Other subchannel switch semantics may be provided by the engines
themselves, such as switching between subchannels 0-3 within FE.


#define NV_FIFO_DMA                                                 /* ----G */
#define NV_FIFO_DMA_METHOD_ADDRESS_OLD                         12:2 /* RWXUF */
#define NV_FIFO_DMA_METHOD_ADDRESS                             11:0 /* RWXUF */

#define NV_FIFO_DMA_SUBDEVICE_MASK                             15:4 /* RWXUF */

#define NV_FIFO_DMA_METHOD_SUBCHANNEL                         15:13 /* RWXUF */

#define NV_FIFO_DMA_TERT_OP                                   17:16 /* RWXUF */
#define NV_FIFO_DMA_TERT_OP_GRP0_SET_SUB_DEV_MASK        0x00000001 /* RW--V */
#define NV_FIFO_DMA_TERT_OP_GRP0_STORE_SUB_DEV_MASK      0x00000002 /* RW--V */
#define NV_FIFO_DMA_TERT_OP_GRP0_USE_SUB_DEV_MASK        0x00000003 /* RW--V */

#define NV_FIFO_DMA_METHOD_COUNT_OLD                          28:18 /* RWXUF */
#define NV_FIFO_DMA_METHOD_COUNT                              28:16 /* RWXUF */
#define NV_FIFO_DMA_IMMD_DATA                                 28:16 /* RWXUF */

#define NV_FIFO_DMA_SEC_OP                                    31:29 /* RWXUF */
#define NV_FIFO_DMA_SEC_OP_GRP0_USE_TERT                 0x00000000 /* RW--V */
#define NV_FIFO_DMA_SEC_OP_INC_METHOD                    0x00000001 /* RW--V */
#define NV_FIFO_DMA_SEC_OP_NON_INC_METHOD                0x00000003 /* RW--V */
#define NV_FIFO_DMA_SEC_OP_IMMD_DATA_METHOD              0x00000004 /* RW--V */
#define NV_FIFO_DMA_SEC_OP_ONE_INC                       0x00000005 /* RW--V */
#define NV_FIFO_DMA_SEC_OP_RESERVED6                     0x00000006 /* RW--V */
#define NV_FIFO_DMA_SEC_OP_END_PB_SEGMENT                0x00000007 /* RW--V */
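
     As a rough illustration of the decode described above, the sketch below
classifies a PB instruction dword by extracting the SEC_OP and TERT_OP fields.
The function name is invented for this example, and the strings simply mirror
the table of entry types.

     /* Sketch: name the type of a PB instruction from SEC_OP (31:29) and
      * TERT_OP (17:16), per the opcode table above. */
     static const char *pb_instr_type(unsigned int entry)
     {
         unsigned int sec_op  = (entry >> 29) & 0x7;  /* NV_FIFO_DMA_SEC_OP  */
         unsigned int tert_op = (entry >> 16) & 0x3;  /* NV_FIFO_DMA_TERT_OP */

         if (entry == 0x00000000)
             return "NV_FIFO_DMA_NOP";                /* universal NOP entry */

         switch (sec_op) {
         case 0x0:                                    /* GRP0_USE_TERT       */
             switch (tert_op) {
             case 0x1: return "SET_SUBDEVICE_MASK";
             case 0x2: return "STORE_SUBDEVICE_MASK";
             case 0x3: return "USE_SUBDEVICE_MASK";
             default:  return "reserved";
             }
         case 0x1: return "incrementing method header";
         case 0x3: return "non-incrementing method header";
         case 0x4: return "immediate-data method header";
         case 0x5: return "increment-once method header";
         case 0x7: return "END_PB_SEGMENT";
         default:  return "reserved";
         }
     }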


Incrementing PB Method Header Format

     An incrementing PB method header specifies that Host generate a sequence of
methods.  The length of the sequence is defined by the method header.  The
method data for each method in this sequence is found in a sequence of PB
entries immediately following the method header.

     The dword-address of the first method is specified by the method header,
and the dword-address of each subsequent method is equal to the dword-address of
the previous method plus one.  Or in other words, the byte-address of each
subsequent method is equal to the byte-address of the previous method plus four.

Example sequence of methods generated from an incrementing method header:

     addr    data0
     addr+1  data1
     addr+2  data2
     addr+3  data3
     ...      ...

     The NV_FIFO_DMA_INCR_COUNT field contains the number of methods in the
generated sequence.  This is the same as the number of method data entries that
follow the method header.  If the COUNT field is zero, the other fields are
ignored, and the PB method effectively becomes a no-op with no method data
entries following it.

     The NV_FIFO_DMA_INCR_SUBCHANNEL field contains the subchannel to use for
the methods generated from the method header.  See the documentation above for
NV_FIFO_DMA_*_SUBCHANNEL.

     The NV_FIFO_DMA_INCR_ADDRESS field contains the method address for the
first method in the generated sequence.  The dword-address of the method is
incremented by one each time a method is generated.  A method address specifies
an operation to be performed.  Note that because the ADDRESS is a dword-address
and not a byte-address, the two least significant bits of the method's
byte-address are not stored.

     The NV_FIFO_DMA_INCR_DATA fields contain the method data for the methods in
the generated sequence.  The number of method data entries is defined by the
COUNT field.  A method data entry contains an operand for its respective method.

     Bit 12 is reserved for the future expansion of either the subchannel or the
address fields.


#define NV_FIFO_DMA_INCR                                            /* ----G */
#define NV_FIFO_DMA_INCR_OPCODE                 (0*32+31):(0*32+29) /* RWXUF */
#define NV_FIFO_DMA_INCR_OPCODE_VALUE                    0x00000001 /* ----V */
#define NV_FIFO_DMA_INCR_COUNT                  (0*32+28):(0*32+16) /* RWXUF */
#define NV_FIFO_DMA_INCR_SUBCHANNEL             (0*32+15):(0*32+13) /* RWXUF */
#define NV_FIFO_DMA_INCR_ADDRESS                 (0*32+11):(0*32+0) /* RWXUF */
#define NV_FIFO_DMA_INCR_DATA                    (1*32+31):(1*32+0) /* RWXUF */
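
     As a sketch of how such a sequence might be assembled using the field
positions above, the hypothetical helper below packs an incrementing method
header followed by its method data.  The same bit layout applies to the
non-incrementing and increment-once headers described below; only the OPCODE
value and the way the method address advances differ.

     /* Sketch: build an incrementing method header plus its method data.
      * 'out' must have room for 1 + count dwords; addr is a dword-address
      * (12 bits), subch is 3 bits, and count must fit in 13 bits. */
     static void emit_incr_methods(unsigned int *out, unsigned int subch,
                                   unsigned int addr, const unsigned int *data,
                                   unsigned int count)
     {
         *out++ = (0x1u  << 29) |       /* NV_FIFO_DMA_INCR_OPCODE_VALUE      */
                  (count << 16) |       /* NV_FIFO_DMA_INCR_COUNT, 28:16      */
                  (subch << 13) |       /* NV_FIFO_DMA_INCR_SUBCHANNEL, 15:13 */
                  addr;                 /* NV_FIFO_DMA_INCR_ADDRESS, 11:0     */
         for (unsigned int i = 0; i < count; i++)
             *out++ = data[i];          /* NV_FIFO_DMA_INCR_DATA entries      */
     }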


Non-Incrementing PB Method Header Format

     A non-incrementing PB method header specifies that Host generate a sequence
of methods.  The length of the sequence is defined by the method header.  The
method data for each method in this sequence is contained within the PB entries
immediately following the method header.

     Unlike with the incrementing PB method header, the sequence of methods
generated all have the same method address.  The dword-address of every method
in this sequence is specified by the method header.  Although the methods all
have the same address, the method data entries may be different.

Example sequence of methods generated from a non-incrementing method header:

     addr    data0
     addr    data1
     addr    data2
     addr    data3
     ...      ...

     The NV_FIFO_DMA_NONINCR_COUNT field contains the number of methods
in the generated sequence.  This is the same as the number of method data
entries that follow the method header.  If the COUNT field is zero, the other
fields are ignored, and the PB method effectively becomes a no-op with no method
data entries following it.

     The NV_FIFO_DMA_NONINCR_SUBCHANNEL field contains the subchannel to use for
the methods generated from the method header.  See the documentation above for
NV_FIFO_DMA_*_SUBCHANNEL.

     The NV_FIFO_DMA_NONINCR_ADDRESS field contains the method address for every
method in the generated sequence.  A method address specifies an operation to be
performed.  Note that because the ADDRESS field is a dword-address and not a
byte-address, the two least significant bits of the method's byte-address are
not stored.

     The NV_FIFO_DMA_NONINCR_DATA fields contain the method data for the methods
in the generated sequence.  The number of method data entries is defined by the
COUNT field.  A method data entry contains an operand for its respective method.

     Bit 12 is reserved for the future expansion of either the subchannel or the
address fields.


#define NV_FIFO_DMA_NONINCR                                         /* ----G */
#define NV_FIFO_DMA_NONINCR_OPCODE              (0*32+31):(0*32+29) /* RWXUF */
#define NV_FIFO_DMA_NONINCR_OPCODE_VALUE                 0x00000003 /* ----V */
#define NV_FIFO_DMA_NONINCR_COUNT               (0*32+28):(0*32+16) /* RWXUF */
#define NV_FIFO_DMA_NONINCR_SUBCHANNEL          (0*32+15):(0*32+13) /* RWXUF */
#define NV_FIFO_DMA_NONINCR_ADDRESS              (0*32+11):(0*32+0) /* RWXUF */
#define NV_FIFO_DMA_NONINCR_DATA                 (1*32+31):(1*32+0) /* RWXUF */


Increment-Once PB Method Header Format

     An increment-once PB method header specifies that Host generate a sequence
of methods.  The length of the sequence is defined by the method header.  The
method data for each method in this sequence is found in a sequence of PB
entries immediately following the method header.

     The dword-address of the first method is specified by the method header.
The address of the second and all following methods is equal to the
dword-address of the first method plus one.  In other words, the byte-address of
the second and all following methods is equal to the byte-address of the first
method plus four.

Example sequence of methods generated from an increment-once method header:

     addr     data0
     addr+1   data1
     addr+1   data2
     addr+1   data3
     ...      ...

     The NV_FIFO_DMA_ONEINCR_COUNT field contains the number of methods in the
generated sequence.  This is the same as the number of method data entries that
follow the method header.  If the COUNT field is zero, the other fields are
ignored, and the PB method effectively becomes a no-op method with no method
data entries following it.

     The NV_FIFO_DMA_ONEINCR_SUBCHANNEL field contains the subchannel to use for
the methods generated from the method header.  See the documentation above for
NV_FIFO_DMA_*_SUBCHANNEL.

     The NV_FIFO_DMA_ONEINCR_ADDRESS field contains the method address for the
first method in the generated sequence.  A method address specifies an operation
to be performed.  Note that because the ADDRESS is a dword-address and not a
byte-address, the two least significant bits of the method's byte-address are
not stored.

     The NV_FIFO_DMA_ONEINCR_DATA fields contain the method data for the methods
in the generated sequence.  The number of method data entries is defined by the
COUNT field.  A method data entry contains an operand for its respective method.

     Bit 12 is reserved for the future expansion of either the subchannel or the
address fields.


#define NV_FIFO_DMA_ONEINCR                                         /* ----G */
#define NV_FIFO_DMA_ONEINCR_OPCODE              (0*32+31):(0*32+29) /* RWXUF */
#define NV_FIFO_DMA_ONEINCR_OPCODE_VALUE                 0x00000005 /* ----V */
#define NV_FIFO_DMA_ONEINCR_COUNT               (0*32+28):(0*32+16) /* RWXUF */
#define NV_FIFO_DMA_ONEINCR_SUBCHANNEL          (0*32+15):(0*32+13) /* RWXUF */
#define NV_FIFO_DMA_ONEINCR_ADDRESS              (0*32+11):(0*32+0) /* RWXUF */
#define NV_FIFO_DMA_ONEINCR_DATA                 (1*32+31):(1*32+0) /* RWXUF */


No-Operation PB Instruction Formats

     A no-op PB entry may be expressed in multiple ways (for example, a method
header whose COUNT is zero), but the preferred way is to set the PB instruction
to NV_FIFO_DMA_NOP.  NV_FIFO_DMA_NOP is a universal NOP entry that bypasses any
method header format check and is not considered a method header.


#define NV_FIFO_DMA_NOP                                  0x00000000 /* ----C */


Immediate-Data PB Method Header Format

     If a method's operand fits within 13 bits, a PB method may be specified in
a single PB entry, using the immediate-data PB method header format.  Exactly
one method is generated from this method header.

     The NV_FIFO_DMA_IMMD_SUBCHANNEL field contains the subchannel to use for
the method generated from the method header.  See the documentation above for
NV_FIFO_DMA_*_SUBCHANNEL.

     The NV_FIFO_DMA_IMMD_ADDRESS field contains the method address for the
single generated method.  A method address specifies an operation to be
performed.  Note that because the ADDRESS is a dword-address and not a
byte-address, the two least significant bits of the method's byte-address are
not stored.

     The single NV_FIFO_DMA_IMMD_DATA field contains the method data for the
generated method.  This method data contains an operand for the generated
method.


#define NV_FIFO_DMA_IMMD                                            /* ----G */
#define NV_FIFO_DMA_IMMD_ADDRESS                               11:0 /* RWXUF */
#define NV_FIFO_DMA_IMMD_SUBCHANNEL                           15:13 /* RWXUF */
#define NV_FIFO_DMA_IMMD_DATA                                 28:16 /* RWXUF */
#define NV_FIFO_DMA_IMMD_OPCODE                               31:29 /* RWXUF */
#define NV_FIFO_DMA_IMMD_OPCODE_VALUE                    0x00000004 /* ----V */
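
     A minimal sketch of packing such an entry follows; the helper name is
invented for this example, and the operand must fit in the 13-bit DATA field.

     /* Sketch: build a single immediate-data method header.  addr is a
      * dword-address (12 bits); data must fit in 13 bits (28:16). */
     static unsigned int pack_immd_method(unsigned int subch, unsigned int addr,
                                          unsigned int data)
     {
         return (0x4u  << 29) |         /* NV_FIFO_DMA_IMMD_OPCODE_VALUE      */
                (data  << 16) |         /* NV_FIFO_DMA_IMMD_DATA, 28:16       */
                (subch << 13) |         /* NV_FIFO_DMA_IMMD_SUBCHANNEL, 15:13 */
                addr;                   /* NV_FIFO_DMA_IMMD_ADDRESS, 11:0     */
     }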


Set Sub-Device Mask PB Control Entry Format

     The SET_SUBDEVICE_MASK (SSDM) PB control entry is used when multiple GPU
contexts are using the same pushbuffer (for example, for SLI or for stereo
rendering) and there is data in the pushbuffer that is for only a subset of the
GPU contexts.  This instruction allows the pushbuffer to tell a specific GPU
context to use or ignore methods following the SET_SUBDEVICE_MASK.  While the
logical-AND of NV_FIFO_DMA_SET_SUBDEVICE_MASK_VALUE and the GPU context's
NV_PPBDMA_SUBDEVICE_ID value is zero, methods are ignored.  Pushbuffer control
entries (like SET_SUBDEVICE_MASK) are not ignored.

********************************************************************************
Warning: When using subdevice masking, one must take care to synchronize
properly with any later GP entries marked FETCH_CONDITIONAL.  If GP fetching
gets too far ahead of PB processing, it is possible for a later conditional PB
segment to be discarded prior to reaching an SSDM command that sets
SUBDEVICE_STATUS to ACTIVE.  This would cause Host to execute garbage data.  One
way to avoid this would be to set the SYNC_WAIT flag on any FETCH_CONDITIONAL
segments following a subdevice reenable.
********************************************************************************



#define NV_FIFO_DMA_SET_SUBDEVICE_MASK                              /* ----G */
#define NV_FIFO_DMA_SET_SUBDEVICE_MASK_VALUE                   15:4 /* RWXUF */
#define NV_FIFO_DMA_SET_SUBDEVICE_MASK_OPCODE                 31:16 /* RWXUF */
#define NV_FIFO_DMA_SET_SUBDEVICE_MASK_OPCODE_VALUE      0x00000001 /* ----V */
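
     As a sketch, an SSDM entry can be packed from a 12-bit subdevice mask as
follows; the helper name is invented for this example.

     /* Sketch: build a SET_SUBDEVICE_MASK control entry.  The mask occupies
      * bits 15:4; bits 31:16 hold the opcode value 0x0001 (SEC_OP 000 with
      * TERT_OP 01). */
     static unsigned int pack_set_subdevice_mask(unsigned int mask)
     {
         return (0x0001u << 16) |       /* ..._SET_SUBDEVICE_MASK_OPCODE_VALUE */
                ((mask & 0xfffu) << 4); /* ..._SET_SUBDEVICE_MASK_VALUE, 15:4  */
     }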


Store Sub-Device Mask PB Control Entry Format

     The STORE_SUBDEVICE_MASK PB control entry is used to save a subdevice mask
value to be used later by a USE_SUBDEVICE_MASK PB instruction.


#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK                            /* ----G */
#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK_VALUE                 15:4 /* RWXUF */
#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK_OPCODE               31:16 /* RWXUF */
#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK_OPCODE_VALUE    0x00000002 /* ----V */


Use Sub-Device Mask PB Control Entry Format

     The USE_SUBDEVICE_MASK PB control entry is used to apply the subdevice mask
value saved by a STORE_SUBDEVICE_MASK PB instruction.  The effect of the mask is
the same as for a SET_SUBDEVICE_MASK PB instruction.


#define NV_FIFO_DMA_USE_SUBDEVICE_MASK                              /* ----G */
#define NV_FIFO_DMA_USE_SUBDEVICE_MASK_OPCODE                 31:16 /* RWXUF */
#define NV_FIFO_DMA_USE_SUBDEVICE_MASK_OPCODE_VALUE      0x00000003 /* ----V */


End-PB-Segment PB Control Entry Format

     Engines may write PB segments themselves, but they cannot write GP entries.
Because they cannot write GP entries, they cannot alter the size of a PB
segment.  If an engine writing a PB segment does not need to fill the entire
PB segment it was allocated, then instead of padding the remainder with no-op
PB instructions, it may write a single End-PB-Segment control entry to indicate
that the PB segment contains no further valid data.  No further PB entries from
that PB segment will be decoded or processed.  Host may have already issued
requests to fetch the remainder of the PB segment before an End-PB-Segment PB
instruction is processed, and may or may not fetch the remainder of the PB
segment.  Also note that the result of a PB CRC check on this segment via
NV_PPBDMA_GP_ENTRY1_OPCODE_PB_CRC is indeterminate.


#define NV_FIFO_DMA_ENDSEG_OPCODE                             31:29 /* RWXUF */
#define NV_FIFO_DMA_ENDSEG_OPCODE_VALUE                  0x00000007 /* ----V */
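
     As a sketch, an End-PB-Segment entry only needs the opcode field set; the
convenience name below is invented for this example, and the remaining bits are
simply left zero here.

     /* Sketch: End-PB-Segment PB instruction with SEC_OP (31:29) = 0x7. */
     #define PB_END_SEGMENT_ENTRY  (0x7u << 29)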