New ref manuals directory, delete old locations

As decided in a recent OpenSource-Approval meeting, we want the directory structure for reference manuals here to be fairly close to the way they are organized internal to NVIDIA. This CL therefore does the following:

Rename from:
    Host-Fifo/volta/gv100/*
    Display-Ref-Manuals/gv100/*
to:
    manuals/volta/gv100/*

Regenerate index.html files to match (important for the "github pages" site, at https://nvidia.github.io/open-gpu-doc/).

Reviewed by: Maneet Singh
author: John Hubbard <jhubbard@nvidia.com> 2019-06-12 14:41:51 -0700
committer: John Hubbard <jhubbard@nvidia.com> 2019-06-13 19:23:50 -0700
commit: f9e4e0e07fd5a6a7757db977f69c8e91a0ae283f (patch)
tree: 1f9488efca18d52ccfc016c7531df4ceac94989c /Host-Fifo/volta/gv100/dev_ram.ref.txt
parent: 187a308aea3f133dfb27ebf6bafe75ffa15fc353 (diff)
download: open-gpu-doc-f9e4e0e07fd5a6a7757db977f69c8e91a0ae283f.tar.xz
1 files changed, 0 insertions, 1269 deletions
diff --git a/Host-Fifo/volta/gv100/dev_ram.ref.txt b/Host-Fifo/volta/gv100/dev_ram.ref.txt
deleted file mode 100644
index e80d9c0..0000000
--- a/Host-Fifo/volta/gv100/dev_ram.ref.txt
+++ /dev/null
@@ -1,1269 +0,0 @@
-Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
-
-Permission is hereby granted, free of charge, to any person obtaining a
-copy of this software and associated documentation files (the "Software"),
-to deal in the Software without restriction, including without limitation
-the rights to use, copy, modify, merge, publish, distribute, sublicense,
-and/or sell copies of the Software, and to permit persons to whom the
-Software is furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
-THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
-DEALINGS IN THE SOFTWARE.
---------------------------------------------------------------------------------
-
-2  -  GPU INSTANCE RAM (RAMIN)
-==============================
-
-     A GPU contains a block called "XVE" that manages the interface with PCI, a
-block called "Host" that fetches graphics instructions, blocks called "engines"
-that execute graphics instructions, and blocks that manage the interface with
-memory.
-
-               .-----.                    .------.
-               |     |<------------------>|      |
-               |     |                    |      |
-               |     |     .---------.    |      |
-               |     |<--->| Engine1 |<---|      |
-               |     |     `---------'    |      |
-.---------.    |     |                    |      |
-|   GPU   |    |     |     .---------.    | Host |
-|  Local  |<-->|  FB |<--->| Engine2 |<---|      |
-| Memory  |    | MMU |     `---------'    |      |
-`---------'    | Hub |         ...        |      |   .--------.
-               |     |     .---------.    |      |   | System |
-               |     |<--->| EngineN |<---|      |   | Memory |
-               |     |     `---------'    `------'   `--------'
-               |     |                       ^           ^
-               |     |                       |           |
-.---------.    |     |                    .--V--. PCI .--V--.     .-----.
-| Display |<-->|     |<------------------>| XVE |<--->| NB  |<--->| CPU |
-`---------'    `-----'                    `-----'     `-----'     `-----'
-
-     A GPU context is a virtualization of the GPU for a particular software
-application.  A GPU instance block is a block of memory that contains the state
-for a GPU context.  A GPU context's instance block consists of Host state,
-pointers to each engine's state, and memory management state.  A GPU instance
-block also contains a pointer to a block of memory that contains that part of a
-GPU context's state that a user-level driver may access.  A GPU instance block
-fits within a single 4K-byte page of memory.
-
-       Run List             Channel-Map RAM
-     .-----------. Ch Id   .----------------.
-     | RL Entry0 |----.    |Ch0 Inst Blk Ptr|
-     | RL Entry1 |    |    |Ch1 Inst Blk Ptr|
-     | RL Entry2 |    |    |       ...      |
-     |    ...    |    `--->|ChI Inst Blk Ptr|----.
-     | RL EntryN |         |       ...      |    |
-     `-----------'         |ChN Inst Blk Ptr|    |
-                           `----------------'    |
-                                                 |
- .-----------------------------------------------'
- |
- |    GPU Instance Block                                 GPFIFO
- `-->.-----------------.                        GP_GET .--------.     PB Seg
-     |                 |------------------------------>|GP Entry|    .--------.
-     |   Host State    |                               |GP Entry|--->|PB Entry|
-     |     (RAMFC)     |          User-Driver State    |        |    |PB Entry|
-     |                 |              .-------.        |GP Entry|    |   ...  |
-     |                 |------------->|(USERD)| GP_PUT |GP Entry|    |PB Entry|
-     |                 |              |       |------->`--------'    `--------'
-     |                 |              |       |
-     +-----------------+              |       |
-     |     Memory      |              `-------'
-     |   Management    |----------.  Page Directory    Page Table
-     |     State       |          |   .-------.        .-------.
-     +-----------------+          `-->|  PDE  |        |  PTE  |
-     |   Pointer to    |              |  PDE  |------->|  PTE  |
-     |     Engine0     |--------.     |  ...  |        |  ...  |
-     |      State      |        |     |  PDE  |        |  PTE  |
-     +-----------------+        |     `-------'        `-------'
-     |   Pointer to    |        |
-     |     Engine1     |-----.  |   Engine0 State
-     |      State      |     |  |     .-------.
-     +-----------------+     |  `---->|       |
-            ...              |        `-------'
-     +-----------------+     |
-     |   Pointer to    |     |      Engine1 State
-     |     EngineN     |--.  |        .-------.
-     |      State      |  |  `------->|       |
-     `-----------------'  |           `-------'
-                          |               ...
-                          |
-                          |         EngineN State
-                          |           .-------.
-                          `---------->|       |
-                                      `-------'
-
-     The GPU context's Host state occupies the first 128 double words of an
-instance block.  A GPU context's Host state is called "RAMFC". Please see
-the NV_RAMFC section below for a description of Host state.
-
-     The GPU context's memory-management state defines the virtual address space
-that the GPU context uses.  Memory management state consists of page and
-directory tables (that specify the mapping between virtual addresses and
-physical addresses, and the attributes of memory pages), and the limit of the
-virtual address space.  The NV_RAMIN_PAGE_DIR_BASE entry contains the address of
-the base of the GPU context's page directory table (PDB).  NV_RAMIN_PAGE_DIR_BASE
-is 4K-byte aligned.
-
-     The NV_RAMIN_ENG*_WFI_PTR entry contains the address of a block of memory
-for storing an engine's context state.  Blocks of memory that contain engine state
-are 4K-byte aligned.  Only one engine context is supported per instance block.
-
-     The NV_RAMIN_ENG*_CS field is deprecated; it was used to indicate whether
-GPU state should be restored from the FGCS pointer or from the WFI CS pointer.
-Engines only need/support one CTXSW pointer and all state is stored there
-whether a WFI CS or other form of preemption was performed.  This field must
-always be set to WFI for legacy reasons, and will eventually be deleted.
-
-
-#define NV_RAMIN                                                    /* ----G */
-
-// The instance block must be 4k-aligned.
-#define NV_RAMIN_BASE_SHIFT                                      12 /*       */
-
-// The instance block size fits within a single 4k block.
-#define NV_RAMIN_ALLOC_SIZE                                    4096 /*       */
-
-// Host State
-#define NV_RAMIN_RAMFC                         (127*32+31):(0*32+0) /* RWXUF */
-
-// Memory-Management State
-
-    The following fields are used for non-VEID engines.  The NV_RAMIN_SC_* fields
-    described later are used for VEID engines.
-
-    NV_RAMIN_PAGE_DIR_BASE_TARGET determines if the top level of the page tables
-    is in video memory or system memory (peer is not allowed), and the CPU cache
-    coherency for system memory.
-    Using INVALID unbinds the selected engine.
-
-#define NV_RAMIN_PAGE_DIR_BASE_TARGET               (128*32+1):(128*32+0) /* RWXUF */
-#define NV_RAMIN_PAGE_DIR_BASE_TARGET_VID_MEM                  0x00000000 /* RW--V */
-#define NV_RAMIN_PAGE_DIR_BASE_TARGET_INVALID                  0x00000001 /* RW--V */
-#define NV_RAMIN_PAGE_DIR_BASE_TARGET_SYS_MEM_COHERENT         0x00000002 /* RW--V */
-#define NV_RAMIN_PAGE_DIR_BASE_TARGET_SYS_MEM_NONCOHERENT      0x00000003 /* RW--V */
-
-    NV_RAMIN_PAGE_DIR_BASE_VOL identifies the volatile behavior
-    of the top level of the page table (whether local L2 can cache it or not).
-
-#define NV_RAMIN_PAGE_DIR_BASE_VOL                  (128*32+2):(128*32+2) /* RWXUF */
-#define NV_RAMIN_PAGE_DIR_BASE_VOL_TRUE                        0x00000001 /* RW--V */
-#define NV_RAMIN_PAGE_DIR_BASE_VOL_FALSE                       0x00000000 /* RW--V */
-
-
-    These bits specify whether the MMU will treat faults as replayable or not.
-    The engine will send these bits to the MMU as part of the instance bind.
-
-#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_TEX     (128*32+4):(128*32+4) /* RWXUF */
-#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_TEX_DISABLED       0x00000000 /* RW--V */
-#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_TEX_ENABLED        0x00000001 /* RW--V */
-#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_GCC     (128*32+5):(128*32+5) /* RWXUF */
-#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_GCC_DISABLED       0x00000000 /* RW--V */
-#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_GCC_ENABLED        0x00000001 /* RW--V */
-
-    NV_RAMIN_USE_VER2_PT_FORMAT determines which page table format to use.
-    When NV_RAMIN_USE_VER2_PT_FORMAT is false, the page table uses the old format.
-    When NV_RAMIN_USE_VER2_PT_FORMAT is true, the page table uses the new format.
-
-    Volta only supports the new format.  Selecting the old format results in an UNBOUND_INSTANCE fault.
-
-
-#define NV_RAMIN_USE_VER2_PT_FORMAT             (128*32+10):(128*32+10) /*       */
-#define NV_RAMIN_USE_VER2_PT_FORMAT_FALSE 0x00000000 /*       */
-#define NV_RAMIN_USE_VER2_PT_FORMAT_TRUE   0x00000001 /*       */
-
-    When NV_PFB_PRI_MMU_CTRL_USE_PDB_BIG_PAGE_SIZE is TRUE, this bit (NV_RAMIN_BIG_PAGE_SIZE) selects the big page size.
-    When NV_PFB_PRI_MMU_CTRL_USE_PDB_BIG_PAGE_SIZE is FALSE, NV_PFB_PRI_MMU_CTRL_VM_PG_SIZE selects the big page size.
-
-    Volta only supports 64KB for big pages.  Selecting 128KB for big pages results in an UNBOUND_INSTANCE fault.
-
-#define NV_RAMIN_BIG_PAGE_SIZE                    (128*32+11):(128*32+11) /* RWXUF */
-#define NV_RAMIN_BIG_PAGE_SIZE_128KB                           0x00000000 /* RW--V */
-#define NV_RAMIN_BIG_PAGE_SIZE_64KB                            0x00000001 /* RW--V */
-
-    NV_RAMIN_PAGE_DIR_BASE_LO and NV_RAMIN_PAGE_DIR_BASE_HI
-    identify the page directory base (start of the page table)
-    location for this context.
-
-#define NV_RAMIN_PAGE_DIR_BASE_LO                 (128*32+31):(128*32+12) /* RWXUF */
-#define NV_RAMIN_PAGE_DIR_BASE_HI                  (129*32+31):(129*32+0) /* RWXUF */
-
-// Single engine pointer channels cannot support multiple
-// engines with CTXSW pointers
-#define NV_RAMIN_ENGINE_CS                          (132*32+3):(132*32+3) /*       */
-#define NV_RAMIN_ENGINE_CS_WFI                                 0x00000000 /*       */
-#define NV_RAMIN_ENGINE_CS_FG                                  0x00000001 /*       */
-#define NV_RAMIN_ENGINE_WFI_TARGET                  (132*32+1):(132*32+0) /*       */
-#define NV_RAMIN_ENGINE_WFI_TARGET_LOCAL_MEM                   0x00000000 /*       */
-#define NV_RAMIN_ENGINE_WFI_TARGET_SYS_MEM_COHERENT            0x00000002 /*       */
-#define NV_RAMIN_ENGINE_WFI_TARGET_SYS_MEM_NONCOHERENT         0x00000003 /*       */
-#define NV_RAMIN_ENGINE_WFI_MODE                    (132*32+2):(132*32+2) /*       */
-#define NV_RAMIN_ENGINE_WFI_MODE_PHYSICAL                      0x00000000 /*       */
-#define NV_RAMIN_ENGINE_WFI_MODE_VIRTUAL                       0x00000001 /*       */
-#define NV_RAMIN_ENGINE_WFI_PTR_LO                (132*32+31):(132*32+12) /*       */
-#define NV_RAMIN_ENGINE_WFI_PTR_HI                  (133*32+7):(133*32+0) /*       */
-
-#define NV_RAMIN_ENGINE_WFI_VEID             (134*32+(6-1)):(134*32+0) /*       */
-#define NV_RAMIN_ENABLE_ATS                        (135*32+31):(135*32+31) /* RWXUF */
-#define NV_RAMIN_ENABLE_ATS_TRUE                                0x00000001 /* RW--V */
-#define NV_RAMIN_ENABLE_ATS_FALSE                               0x00000000 /* RW--V */
-#define NV_RAMIN_PASID                 (135*32+(20-1)):(135*32+0) /* RWXUF */
-
-
-     Pointer to a method buffer in BAR2 memory where a faulted engine can save
-out methods. BAR2 accesses are assumed to be virtual, so the address saved here
-is a virtual address.
-
-#define NV_RAMIN_ENG_METHOD_BUFFER_ADDR_LO                   (136*32+31):(136*32+0)  /* RWXUF */
-#define NV_RAMIN_ENG_METHOD_BUFFER_ADDR_HI                   (137*32+(((49-1)-32))):(137*32+0)  /* RWXUF */
-
-
-
-    These entries inform FECS which of the PDBs in the array below are
-    valid/filled in and need to subsequently be bound.
-
-    This needs to reserve at least NV_LITTER_NUM_SUBCTX entries.  Currently
-    there is enough space reserved for 64 subcontexts.
-#define NV_RAMIN_SC_PDB_VALID(i)             (166*32+i):(166*32+i) /* RWXUF */
-#define NV_RAMIN_SC_PDB_VALID__SIZE_1         64 /*       */
-#define NV_RAMIN_SC_PDB_VALID_FALSE                     0x00000000 /* RW--V */
-#define NV_RAMIN_SC_PDB_VALID_TRUE                      0x00000001 /* RW--V */
-
-// Memory-Management VEID array
-
-    The NV_RAMIN_SC_PAGE_DIR_BASE_* entries are an array of page table settings
-    for each subcontext. When a context supports subcontexts, the page table
-    information for a given VEID/Subcontext needs to be filled in or else page
-    faults will result on access.
-
-    These properties for the page table must be filled in for all channels
-    sharing the same context, as any channel's NV_RAMIN may be used to load the
-    context.
-
-    The non-subcontext page table information such as NV_RAMIN_PAGE_DIR_BASE*
-    is used by non-subcontext engines and clients such as Host, CE, or the
-    video engines.
-
-    NV_RAMIN_SC_PAGE_DIR_BASE_TARGET(i) determines if the top level of the page tables
-    is in video memory or system memory (peer is not allowed), and the CPU cache
-    coherency for system memory.
-    Using INVALID unbinds the selected subcontext.
-
-#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET(i)             ((168+(i)*4)*32+1):((168+(i)*4)*32+0) /* RWXUF */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET__SIZE_1                         64 /*       */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_VID_MEM                  0x00000000 /* RW--V */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_INVALID                  0x00000001 /* RW--V */ // Note: INVALID should match PEER
-#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_SYS_MEM_COHERENT         0x00000002 /* RW--V */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_SYS_MEM_NONCOHERENT      0x00000003 /* RW--V */
-
-    NV_RAMIN_SC_PAGE_DIR_BASE_VOL(i) identifies the volatile behavior
-    of the top level of the page table (whether local L2 can cache it or not).
-
-#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL(i)                  ((168+(i)*4)*32+2):((168+(i)*4)*32+2) /* RWXUF */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL__SIZE_1                         64 /*       */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL_TRUE                        0x00000001 /* RW--V */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL_FALSE                       0x00000000 /* RW--V */
-
-    NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX(i) and
-    NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC(i) bits specify whether
-    the MMU will treat faults from TEX and GCC as replayable or
-    not. Based on that, fault packets are written into the replayable
-    fault buffer (or not) and faulting requests are put into the replay
-    request buffer (or not).
-    The last bind that does not unbind a sub-context determines the REPLAY_TEX and REPLAY_GCC for all sub-contexts.
-
-#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX(i)     ((168+(i)*4)*32+4):((168+(i)*4)*32+4) /* RWXUF */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX__SIZE_1                         64 /*       */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX_DISABLED       0x00000000 /* RW--V */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX_ENABLED        0x00000001 /* RW--V */
-
-#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC(i)     ((168+(i)*4)*32+5):((168+(i)*4)*32+5) /* RWXUF */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC__SIZE_1                         64 /*       */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC_DISABLED       0x00000000 /* RW--V */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC_ENABLED        0x00000001 /* RW--V */
-
-    NV_RAMIN_SC_USE_VER2_PT_FORMAT determines which page table format to use.
-    When NV_RAMIN_SC_USE_VER2_PT_FORMAT is false, the page table uses
-    the old format (2-level page table). When
-    NV_RAMIN_SC_USE_VER2_PT_FORMAT is true, the page table uses the
-    new format (5-level 49-bit VA format).
-    The last bind that does not unbind a sub-context determines the page table format for all sub-contexts.
-    Volta only supports the new format.  Selecting the old format results in an UNBOUND_INSTANCE fault.
-
-#define NV_RAMIN_SC_USE_VER2_PT_FORMAT(i)          ((168+(i)*4)*32+10):((168+(i)*4)*32+10) /* RWXUF */
-#define NV_RAMIN_SC_USE_VER2_PT_FORMAT__SIZE_1                   64 /*       */
-#define NV_RAMIN_SC_USE_VER2_PT_FORMAT_FALSE                       0x00000000 /* RW--V */
-#define NV_RAMIN_SC_USE_VER2_PT_FORMAT_TRUE                        0x00000001 /* RW--V */
-
-    The last bind that does not unbind a sub-context determines the big page size for all sub-contexts.
-    Volta only supports 64KB for big pages.
-
-#define NV_RAMIN_SC_BIG_PAGE_SIZE(i)                    ((168+(i)*4)*32+11):((168+(i)*4)*32+11) /* RWXUF */
-#define NV_RAMIN_SC_BIG_PAGE_SIZE__SIZE_1                   64 /*       */
-#define NV_RAMIN_SC_BIG_PAGE_SIZE_64KB                            0x00000001 /* RW--V */
-
-    NV_RAMIN_SC_PAGE_DIR_BASE_LO(i) and NV_RAMIN_SC_PAGE_DIR_BASE_HI(i)
-    identify the page directory base (start of the page table)
-    location for subcontext i.
-
-#define NV_RAMIN_SC_PAGE_DIR_BASE_LO(i)                ((168+(i)*4)*32+31):((168+(i)*4)*32+12) /* RWXUF */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_LO__SIZE_1                   64 /*       */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_HI(i)                 ((169+(i)*4)*32+31):((169+(i)*4)*32+0) /* RWXUF */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_HI__SIZE_1                   64 /*       */
-
-
-
-
-
-    NV_RAMIN_SC_ENABLE_ATS(i) tells whether subcontext i is ATS
-    enabled or not. When set to TRUE, the GMMU looks for VA->PA
-    translations in both the GMMU and ATS page tables.
-    ATS can be enabled or disabled per subcontext.
-
-#define NV_RAMIN_SC_ENABLE_ATS(i)                       ((170+(i)*4)*32+31):((170+(i)*4)*32+31) /* RWXUF */
-
-    NV_RAMIN_SC_PASID(i) identifies the PASID (process address space
-    ID) in the CPU for subcontext i. PASID is used to get the ATS
-    translation when an ATS page table lookup is needed. During an ATS TLB
-    shootdown, PASID is also matched against the one that comes with
-    the shootdown request.
-
-#define NV_RAMIN_SC_PASID(i)                       ((170+(i)*4)*32+(20-1)):((170+(i)*4)*32+0) /* RWXUF */
-
-
-
-
-3  -  FIFO CONTEXT RAM (RAMFC)
-==============================
-
-
-     The NV_RAMFC part of a GPU-instance block contains Host's part of a virtual
-GPU's state.  Host is referred to as "FIFO". "FC" stands for FIFO Context.
-When Host switches from serving one GPU context to serving a second, Host saves
-state for the first GPU context to the first GPU context's RAMFC area, and loads
-state for the second GPU context from the second GPU context's RAMFC area.
-
-     RAMFC is located at NV_RAMIN_RAMFC within the GPU instance block.  In
-Kepler, this is at the start of the block.  RAMFC is 4KB aligned.
-
-     Every Host word entry in RAMFC directly corresponds to a PRI-accessible
-register.  For a description of the contents of a RAMFC entry, please see the
-description of the corresponding register in "manuals/dev_pbdma.ref".  The
-offsets of the fields within each entry in RAMFC match those of the
-corresponding register in the associated PBDMA unit's PRI space.
-
-
-    RAMFC Entry                     PBDMA Register
-    ------------------------------- ----------------------------------
-    NV_RAMFC_SIGNATURE               NV_PPBDMA_SIGNATURE(i)
-    NV_RAMFC_GP_BASE                 NV_PPBDMA_GP_BASE(i)
-    NV_RAMFC_GP_BASE_HI              NV_PPBDMA_GP_BASE_HI(i)
-    NV_RAMFC_GP_FETCH                NV_PPBDMA_GP_FETCH(i)
-    NV_RAMFC_GP_GET                  NV_PPBDMA_GP_GET(i)
-    NV_RAMFC_GP_PUT                  NV_PPBDMA_GP_PUT(i)
-    NV_RAMFC_PB_FETCH                NV_PPBDMA_PB_FETCH(i)
-    NV_RAMFC_PB_FETCH_HI             NV_PPBDMA_PB_FETCH_HI(i)
-    NV_RAMFC_PB_GET                  NV_PPBDMA_GET(i)
-    NV_RAMFC_PB_GET_HI               NV_PPBDMA_GET_HI(i)
-    NV_RAMFC_PB_PUT                  NV_PPBDMA_PUT(i)
-    NV_RAMFC_PB_PUT_HI               NV_PPBDMA_PUT_HI(i)
-    NV_RAMFC_PB_TOP_LEVEL_GET        NV_PPBDMA_TOP_LEVEL_GET(i)
-    NV_RAMFC_PB_TOP_LEVEL_GET_HI     NV_PPBDMA_TOP_LEVEL_GET_HI(i)
-    NV_RAMFC_GP_CRC                  NV_PPBDMA_GP_CRC(i)
-    NV_RAMFC_PB_HEADER               NV_PPBDMA_PB_HEADER(i)
-    NV_RAMFC_PB_COUNT                NV_PPBDMA_PB_COUNT(i)
-    NV_RAMFC_PB_CRC                  NV_PPBDMA_PB_CRC(i)
-    NV_RAMFC_SUBDEVICE               NV_PPBDMA_SUBDEVICE(i)
-    NV_RAMFC_METHOD0                 NV_PPBDMA_METHOD0(i)
-    NV_RAMFC_METHOD1                 NV_PPBDMA_METHOD1(i)
-    NV_RAMFC_METHOD2                 NV_PPBDMA_METHOD2(i)
-    NV_RAMFC_METHOD3                 NV_PPBDMA_METHOD3(i)
-    NV_RAMFC_DATA0                   NV_PPBDMA_DATA0(i)
-    NV_RAMFC_DATA1                   NV_PPBDMA_DATA1(i)
-    NV_RAMFC_DATA2                   NV_PPBDMA_DATA2(i)
-    NV_RAMFC_DATA3                   NV_PPBDMA_DATA3(i)
-    NV_RAMFC_TARGET                  NV_PPBDMA_TARGET(i)
-    NV_RAMFC_METHOD_CRC              NV_PPBDMA_METHOD_CRC(i)
-    NV_RAMFC_REF                     NV_PPBDMA_REF(i)
-    NV_RAMFC_RUNTIME                 NV_PPBDMA_RUNTIME(i)
-    NV_RAMFC_SEM_ADDR_LO             NV_PPBDMA_SEM_ADDR_LO(i)
-    NV_RAMFC_SEM_ADDR_HI             NV_PPBDMA_SEM_ADDR_HI(i)
-    NV_RAMFC_SEM_PAYLOAD_LO          NV_PPBDMA_SEM_PAYLOAD_LO(i)
-    NV_RAMFC_SEM_PAYLOAD_HI          NV_PPBDMA_SEM_PAYLOAD_HI(i)
-    NV_RAMFC_SEM_EXECUTE             NV_PPBDMA_SEM_EXECUTE(i)
-    NV_RAMFC_ACQUIRE_DEADLINE        NV_PPBDMA_ACQUIRE_DEADLINE(i)
-    NV_RAMFC_ACQUIRE                 NV_PPBDMA_ACQUIRE(i)
-    NV_RAMFC_MEM_OP_A                NV_PPBDMA_MEM_OP_A(i)
-    NV_RAMFC_MEM_OP_B                NV_PPBDMA_MEM_OP_B(i)
-    NV_RAMFC_MEM_OP_C                NV_PPBDMA_MEM_OP_C(i)
-    NV_RAMFC_USERD                   NV_PPBDMA_USERD(i)
-    NV_RAMFC_USERD_HI                NV_PPBDMA_USERD_HI(i)
-    NV_RAMFC_HCE_CTRL                NV_PPBDMA_HCE_CTRL(i)
-    NV_RAMFC_CONFIG                  NV_PPBDMA_CONFIG(i)
-    NV_RAMFC_SET_CHANNEL_INFO        NV_PPBDMA_SET_CHANNEL_INFO(i)
-    ------------------------------- ----------------------------------
-
-#define NV_RAMFC                                                    /* ----G */
-#define NV_RAMFC_GP_PUT                          (0*32+31):(0*32+0) /* RWXUF */
-#define NV_RAMFC_MEM_OP_A                        (1*32+31):(1*32+0) /* RWXUF */
-#define NV_RAMFC_USERD                           (2*32+31):(2*32+0) /* RWXUF */
-#define NV_RAMFC_USERD_HI                        (3*32+31):(3*32+0) /* RWXUF */
-#define NV_RAMFC_SIGNATURE                       (4*32+31):(4*32+0) /* RWXUF */
-#define NV_RAMFC_GP_GET                          (5*32+31):(5*32+0) /* RWXUF */
-#define NV_RAMFC_PB_GET                          (6*32+31):(6*32+0) /* RWXUF */
-#define NV_RAMFC_PB_GET_HI                       (7*32+31):(7*32+0) /* RWXUF */
-#define NV_RAMFC_PB_TOP_LEVEL_GET                (8*32+31):(8*32+0) /* RWXUF */
-#define NV_RAMFC_PB_TOP_LEVEL_GET_HI             (9*32+31):(9*32+0) /* RWXUF */
-#define NV_RAMFC_REF                           (10*32+31):(10*32+0) /* RWXUF */
-#define NV_RAMFC_RUNTIME                       (11*32+31):(11*32+0) /* RWXUF */
-#define NV_RAMFC_ACQUIRE                       (12*32+31):(12*32+0) /* RWXUF */
-#define NV_RAMFC_ACQUIRE_DEADLINE              (13*32+31):(13*32+0) /* RWXUF */
-#define NV_RAMFC_SEM_ADDR_HI                   (14*32+31):(14*32+0) /* RWXUF */
-#define NV_RAMFC_SEM_ADDR_LO                   (15*32+31):(15*32+0) /* RWXUF */
-#define NV_RAMFC_SEM_PAYLOAD_LO                (16*32+31):(16*32+0) /* RWXUF */
-#define NV_RAMFC_SEM_EXECUTE                   (17*32+31):(17*32+0) /* RWXUF */
-#define NV_RAMFC_GP_BASE                       (18*32+31):(18*32+0) /* RWXUF */
-#define NV_RAMFC_GP_BASE_HI                    (19*32+31):(19*32+0) /* RWXUF */
-#define NV_RAMFC_GP_FETCH                      (20*32+31):(20*32+0) /* RWXUF */
-#define NV_RAMFC_PB_FETCH                      (21*32+31):(21*32+0) /* RWXUF */
-#define NV_RAMFC_PB_FETCH_HI                   (22*32+31):(22*32+0) /* RWXUF */
-#define NV_RAMFC_PB_PUT                        (23*32+31):(23*32+0) /* RWXUF */
-#define NV_RAMFC_PB_PUT_HI                     (24*32+31):(24*32+0) /* RWXUF */
-#define NV_RAMFC_MEM_OP_B                      (25*32+31):(25*32+0) /* RWXUF */
-#define NV_RAMFC_RESERVED26                    (26*32+31):(26*32+0) /* RWXUF */
-#define NV_RAMFC_RESERVED27                    (27*32+31):(27*32+0) /* RWXUF */
-#define NV_RAMFC_RESERVED28                    (28*32+31):(28*32+0) /* RWXUF */
-#define NV_RAMFC_GP_CRC                        (29*32+31):(29*32+0) /* RWXUF */
-#define NV_RAMFC_PB_HEADER                     (33*32+31):(33*32+0) /* RWXUF */
-#define NV_RAMFC_PB_COUNT                      (34*32+31):(34*32+0) /* RWXUF */
-#define NV_RAMFC_SUBDEVICE                     (37*32+31):(37*32+0) /* RWXUF */
-#define NV_RAMFC_PB_CRC                        (38*32+31):(38*32+0) /* RWXUF */
-#define NV_RAMFC_SEM_PAYLOAD_HI                (39*32+31):(39*32+0) /* RWXUF */
-#define NV_RAMFC_MEM_OP_C                      (40*32+31):(40*32+0) /* RWXUF */
-#define NV_RAMFC_RESERVED20                    (41*32+31):(41*32+0) /* RWXUF */
-#define NV_RAMFC_RESERVED21                    (42*32+31):(42*32+0) /* RWXUF */
-#define NV_RAMFC_TARGET                        (43*32+31):(43*32+0) /* RWXUF */
-#define NV_RAMFC_METHOD_CRC                    (44*32+31):(44*32+0) /* RWXUF */
-#define NV_RAMFC_METHOD0                       (48*32+31):(48*32+0) /* RWXUF */
-#define NV_RAMFC_DATA0                         (49*32+31):(49*32+0) /* RWXUF */
-#define NV_RAMFC_METHOD1                       (50*32+31):(50*32+0) /* RWXUF */
-#define NV_RAMFC_DATA1                         (51*32+31):(51*32+0) /* RWXUF */
-#define NV_RAMFC_METHOD2                       (52*32+31):(52*32+0) /* RWXUF */
-#define NV_RAMFC_DATA2                         (53*32+31):(53*32+0) /* RWXUF */
-#define NV_RAMFC_METHOD3                       (54*32+31):(54*32+0) /* RWXUF */
-#define NV_RAMFC_DATA3                         (55*32+31):(55*32+0) /* RWXUF */
-#define NV_RAMFC_HCE_CTRL                      (57*32+31):(57*32+0) /* RWXUF */
-#define NV_RAMFC_CONFIG                        (61*32+31):(61*32+0) /* RWXUF */
-#define NV_RAMFC_SET_CHANNEL_INFO              (63*32+31):(63*32+0) /* RWXUF */
-
-#define NV_RAMFC_BASE_SHIFT                                      12 /*       */
-
-    Size of the full range of RAMFC in bytes.
-#define NV_RAMFC_SIZE_VAL                                0x00000200 /* ----C */
-
-4 - USER-DRIVER ACCESSIBLE RAM (RAMUSERD)
-=========================================
-
-     A user-level driver is allowed to access only a small portion of a GPU
-context's state.  The portion of a GPU context's state that a user-level driver
-can access is stored in a block of memory called NV_RAMUSERD.  NV_RAMUSERD is a
-user-level driver's window into NV_RAMFC.  The NV_RAMUSERD state for each GPU
-context is stored in an aligned NV_RAMUSERD_CHAN_SIZE-byte block of memory.
-
-     To submit more methods, a user driver writes a PB segment to
-memory, writes a GP entry that points to the PB segment, updates GP_PUT in
-RAMUSERD, and writes the channel's handle to the
-NV_USERMODE_NOTIFY_CHANNEL_PENDING register (see dev_usermode.ref).
-
-     The RAMUSERD data structure is updated at regular intervals as controlled
-by the NV_PFIFO_USERD_WRITEBACK setting (see dev_fifo.ref). For a particular
-channel, RAMUSERD writeback can be disabled and it is recommended that SW track
-pushbuffer and channel progress via Host WFI_DIS semaphores rather than reading
-the RAMUSERD data structure.
-
-     When write-back is enabled, a user driver can check the GPU progress in
-executing a channel's PB segments. The driver can use:
-    * GP_GET to monitor the index of the next GP entry the GPU will process
-    * PB_GET to monitor the address of the next PB entry the GPU will process
-    * TOP_LEVEL_GET (see NV_PPBDMA_TOP_LEVEL_GET) to monitor the address of the
-      next "top-level" (non-SUBROUTINE) PB entry the GPU will process
-    * REF to monitor the current "reference count" value (see NV_PPBDMA_REF).
-
-     Each entry in RAMUSERD corresponds to a PRI-accessible PBDMA register in Host.
-For a description of the behavior and contents of a RAMUSERD entry, please see
-the description of the corresponding register in "manuals/dev_pbdma.ref".
-
-    RAMUSERD Entry                   PBDMA Register                 Access
-    -------------------------------  -----------------------------  ----------
-    NV_RAMUSERD_GP_PUT               NV_PPBDMA_GP_PUT(i)            Read/Write
-    NV_RAMUSERD_GP_GET               NV_PPBDMA_GP_GET(i)            Read-only
-    NV_RAMUSERD_GET                  NV_PPBDMA_GET(i)               Read-only
-    NV_RAMUSERD_GET_HI               NV_PPBDMA_GET_HI(i)            Read-only
-    NV_RAMUSERD_PUT                  NV_PPBDMA_PUT(i)               Read-only
-    NV_RAMUSERD_PUT_HI               NV_PPBDMA_PUT_HI(i)            Read-only
-    NV_RAMUSERD_TOP_LEVEL_GET        NV_PPBDMA_TOP_LEVEL_GET(i)     Read-only
-    NV_RAMUSERD_TOP_LEVEL_GET_HI     NV_PPBDMA_TOP_LEVEL_GET_HI(i)  Read-only
-    NV_RAMUSERD_REF                  NV_PPBDMA_REF(i)               Read-only
-    -------------------------------  -----------------------------  ----------
-
-     A user driver may write to NV_RAMUSERD_GP_PUT to kick off more work in a
-channel.  Although writes to the other, read-only, entries can alter memory,
-writes to those entries will not affect the operation of the GPU, and can be
-overwritten by the GPU.
-
-     When Host loads its part of a GPU context's state from RAMFC memory, it
-may not immediately read RAMUSERD_GP_PUT.  Host can use the GP_PUT value
-directly from RAMFC while waiting for RAMUSERD_GP_PUT to synchronize.
-Because reads of RAMUSERD_GP_PUT can be delayed, the value in NV_PPBDMA_GP_PUT
-can be older than the value in NV_RAMUSERD_GP_PUT.
-
-     When Host saves a GPU context's state to NV_RAMFC, it also writes to
-NV_RAMUSERD the values of the entries other than GP_PUT.
-Because Host does not continuously write the read-only RAMUSERD entries, the
-read-only values in USERD memory can be older than the values in the Host PBDMA
-unit.
-
-#define NV_RAMUSERD                                                 /* ----G */
-#define NV_RAMUSERD_PUT                        (16*32+31):(16*32+0) /* RWXUF */
-#define NV_RAMUSERD_GET                        (17*32+31):(17*32+0) /* RWXUF */
-#define NV_RAMUSERD_REF                        (18*32+31):(18*32+0) /* RWXUF */
-#define NV_RAMUSERD_PUT_HI                     (19*32+31):(19*32+0) /* RWXUF */
-#define NV_RAMUSERD_TOP_LEVEL_GET              (22*32+31):(22*32+0) /* RWXUF */
-#define NV_RAMUSERD_TOP_LEVEL_GET_HI           (23*32+31):(23*32+0) /* RWXUF */
-#define NV_RAMUSERD_GET_HI                     (24*32+31):(24*32+0) /* RWXUF */
-#define NV_RAMUSERD_GP_GET                     (34*32+31):(34*32+0) /* RWXUF */
-#define NV_RAMUSERD_GP_PUT                     (35*32+31):(35*32+0) /* RWXUF */
-#define NV_RAMUSERD_BASE_SHIFT             9 /*       */
-#define NV_RAMUSERD_CHAN_SIZE               512 /*       */
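
The RAMUSERD layout above can be sketched in C as follows.  This is a minimal
illustration, not driver code: the helper names are hypothetical, the USERD
region is assumed to be mapped as an array of 32-bit words, and any memory
barriers or cache maintenance a real driver needs are omitted.  The dword
indices come from the NV_RAMUSERD_* bit ranges, where a field defined as
(n*32+31):(n*32+0) occupies dword n.

```c
#include <stdint.h>

/* Dword indices derived from the NV_RAMUSERD_* defines above. */
enum {
    RAMUSERD_GP_GET_DWORD = 34,      /* NV_RAMUSERD_GP_GET */
    RAMUSERD_GP_PUT_DWORD = 35,      /* NV_RAMUSERD_GP_PUT */
    RAMUSERD_CHAN_DWORDS  = 512 / 4  /* NV_RAMUSERD_CHAN_SIZE in dwords */
};

/* Hypothetical helper: read the GP_GET value last written back by Host.
 * The value may be stale, since Host does not write it continuously. */
static uint32_t userd_read_gp_get(const volatile uint32_t *userd)
{
    return userd[RAMUSERD_GP_GET_DWORD];
}

/* Hypothetical helper: publish a new GP_PUT to kick off more work. */
static void userd_write_gp_put(volatile uint32_t *userd, uint32_t gp_put)
{
    userd[RAMUSERD_GP_PUT_DWORD] = gp_put;
}
```

GP_PUT is the only entry a user driver is expected to write; the other
entries are treated as read-only in this sketch.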
-
-
-
-
-5 - RUN-LIST RAM (RAMRL)
-========================
-
-     Software specifies the GPU contexts that hardware should "run" by writing a
-list of entries (known as a "runlist") to a 4k-aligned area of memory (beginning
-at NV_PFIFO_RUNLIST_BASE), and by notifying Host that a new list is available
-(by writing to NV_PFIFO_RUNLIST).
-     Submission of a new runlist causes Host to expire the timeslice of all work
-scheduled by the previous runlist, allowing it to schedule the channels present
-in the new runlist once they are fetched. SW can check the status of the runlist
-by polling NV_PFIFO_ENG_RUNLIST_PENDING (see NV_PFIFO_RUNLIST in dev_fifo.ref
-for a full description of the runlist submit mechanism).
-     Runlists can be stored in system memory or video memory (as specified by
-NV_PFIFO_RUNLIST_BASE_TARGET). If a runlist is stored in video memory, software
-must execute a flush or read back the last entry written before submitting the
-runlist to Host to guarantee coherency.
-     The size of a runlist entry data structure is 16 bytes. Each entry
-specifies either a channel entry or a TSG header; the type is determined by the
-NV_RAMRL_ENTRY_TYPE.
-
-
-Runlist Channel Entry Type:
-
-     A runlist entry of type NV_RAMRL_ENTRY_TYPE_CHAN specifies a channel to
-run.  All such entries must occur within the span of some TSG as specified by
-the NV_RAMRL_ENTRY_TYPE_TSG described below.  If a channel entry is encountered
-outside a TSG, Host will raise the NV_PFIFO_INTR_SCHED_ERROR_CODE_BAD_TSG
-interrupt.
-
-     The fields available in a channel runlist entry are as follows (Fig 5.1):
-
-  ENTRY_TYPE (T)        : type of this entry: ENTRY_TYPE_CHAN
-  CHID (ID)             : identifier of the channel to run (overlays ENTRY_ID)
-  RUNQUEUE_SELECTOR (Q) : selects which PBDMA should run this channel if
-                          more than one PBDMA is supported by the runlist
-
-  INST_PTR_LO           : lower 20 bits of the 4k-aligned instance block pointer
-  INST_PTR_HI           : upper 32 bits of instance block pointer
-  INST_TARGET (TGI)     : aperture of the instance block
-
-  USERD_PTR_LO          : upper 24 bits of the low 32 bits of the 512-byte-aligned USERD pointer
-  USERD_PTR_HI          : upper 32 bits of USERD pointer
-  USERD_TARGET (TGU)    : aperture of the USERD data structure
-
-     CHID is a channel identifier that uniquely specifies the channel described
-by this runlist entry to the scheduling hardware and is reported in various
-status registers.
-     RUNQUEUE_SELECTOR determines to which runqueue the channel belongs, and
-thereby which PBDMA will run the channel.  Increasing values select increasingly
-numbered PBDMA IDs serving the runlist.  If the selector value exceeds the
-number of PBDMAs on the runlist, the hardware will silently reassign the channel
-to run on the first PBDMA as though RUNQUEUE_SELECTOR had been set to 0.  (In
-current hardware, this is used by SCG on the graphics runlist only to determine
-which FE pipe should service a given channel.  A value of 0 targets the first FE
-pipe, which can process all FE driven engines: Graphics, Compute, Inline2Memory,
-and TwoD.  A value of 1 targets the second FE pipe, which can only process
-Compute work.  Note that GRCE work is allowed on either runqueue.)
-     The INST fields specify the physical address of the channel's instance
-block, the in-memory data structure that stores the context state.
-The target aperture of the instance block is given by INST_TARGET, and the byte
-offset within that aperture is calculated as
-
- (INST_PTR_HI << 32) | (INST_PTR_LO  << NV_RAMRL_ENTRY_CHAN_INST_PTR_ALIGN_SHIFT)
-
-This address should match the one specified in the channel RAM's
-NV_PCCSR_CHANNEL_INST register; see NV_RAMIN and NV_RAMFC for the format of the
-instance block.  The hardware ignores the RAMRL INST fields, but in future
-chips the instance pointer may be removed from the channel RAM and the RAMRL
-INST fields used instead, resulting in smaller hardware.
-     The USERD fields specify the physical address of the USERD memory region
-used by software to submit additional work to the channel.  The target aperture
-of the USERD region is given by USERD_TARGET, and the byte offset within that
-aperture is calculated as
-
- (USERD_PTR_HI << 32) | (USERD_PTR_LO  << NV_RAMRL_ENTRY_CHAN_USERD_PTR_ALIGN_SHIFT)
-
-
-SW uses the NV_RAMUSERD_CHAN_SIZE define to allocate and align a channel's
-RAMUSERD data structure.  See the documentation for NV_RAMUSERD for a
-description of the use of USERD and its format.  This address and its
-alignment must match the one specified in the RAMFC's NV_RAMFC_USERD and
-NV_RAMFC_USERD_HI fields which are backed by NV_PPBDMA_USERD in dev_pbdma.ref.
-The hardware ignores the RAMRL USERD fields, but in future chips the USERD
-pointer may be read from these fields in the runlist entry instead of the RAMFC
-to avoid the extra level of indirection in fetching the USERD data that
-currently results in a dependent read.
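
As an illustration of the channel entry layout described above, the following
C sketch packs and unpacks a 16-byte channel runlist entry.  The bit positions
are taken from the NV_RAMRL_ENTRY_CHAN_* defines in this section; the struct
and function names are hypothetical, and the entry is assumed to be held as
four 32-bit words.

```c
#include <stdint.h>

#define CHAN_INST_PTR_ALIGN_SHIFT  12u /* NV_RAMRL_ENTRY_CHAN_INST_PTR_ALIGN_SHIFT */
#define CHAN_USERD_PTR_ALIGN_SHIFT  8u /* NV_RAMRL_ENTRY_CHAN_USERD_PTR_ALIGN_SHIFT */

/* Hypothetical container: one 16-byte runlist entry as four dwords. */
typedef struct { uint32_t dw[4]; } rl_entry;

/* Pack a channel entry.  inst_addr is assumed 4k-aligned and userd_addr
 * 512-byte-aligned, so the shifted-out low bits are zero. */
static rl_entry rl_pack_chan(uint32_t chid, uint32_t runqueue,
                             uint64_t inst_addr, uint32_t inst_target,
                             uint64_t userd_addr, uint32_t userd_target)
{
    rl_entry e;
    uint32_t userd_lo = (uint32_t)userd_addr >> CHAN_USERD_PTR_ALIGN_SHIFT;
    uint32_t inst_lo  = (uint32_t)inst_addr  >> CHAN_INST_PTR_ALIGN_SHIFT;

    e.dw[0] = 0u                              /* ENTRY_TYPE_CHAN, bit 0      */
            | ((runqueue     & 0x1u) << 1)    /* RUNQUEUE_SELECTOR, bit 1    */
            | ((inst_target  & 0x3u) << 4)    /* INST_TARGET, bits 5:4       */
            | ((userd_target & 0x3u) << 6)    /* USERD_TARGET, bits 7:6      */
            | (userd_lo << 8);                /* USERD_PTR_LO, bits 31:8     */
    e.dw[1] = (uint32_t)(userd_addr >> 32);   /* USERD_PTR_HI                */
    e.dw[2] = (chid & 0xFFFu)                 /* CHID, bits 11:0             */
            | (inst_lo << 12);                /* INST_PTR_LO, bits 31:12     */
    e.dw[3] = (uint32_t)(inst_addr >> 32);    /* INST_PTR_HI                 */
    return e;
}

/* (INST_PTR_HI << 32) | (INST_PTR_LO << INST_PTR_ALIGN_SHIFT) */
static uint64_t rl_chan_inst_addr(const rl_entry *e)
{
    uint64_t lo = e->dw[2] >> 12;
    return ((uint64_t)e->dw[3] << 32) | (lo << CHAN_INST_PTR_ALIGN_SHIFT);
}

/* (USERD_PTR_HI << 32) | (USERD_PTR_LO << USERD_PTR_ALIGN_SHIFT) */
static uint64_t rl_chan_userd_addr(const rl_entry *e)
{
    uint64_t lo = e->dw[0] >> 8;
    return ((uint64_t)e->dw[1] << 32) | (lo << CHAN_USERD_PTR_ALIGN_SHIFT);
}
```

Unpacking a packed entry returns the original addresses as long as they meet
the stated alignment.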
-
-
-Runlist TSG Entry Type:
-
-     The other type of runlist entry is the Timeslice Group (TSG) header entry
-(Fig 5.2). This type of entry is specified by NV_RAMRL_ENTRY_TYPE_TSG. A TSG
-entry describes a collection of channels all of which share the same context and
-are scheduled as a single unit by Host. All runlists support this type of entry.
-
-     The fields available in a TSG header runlist entry are as follows (Fig 5.2):
-
-  ENTRY_TYPE (T)      : type of this entry: ENTRY_TYPE_TSG
-  TSGID               : identifier of the Timeslice group (overlays ENTRY_ID)
-  TSG_LENGTH          : number of channels that are part of this timeslice group
-  TIMESLICE_SCALE     : scale factor for the TSG's timeslice
-  TIMESLICE_TIMEOUT   : timeout amount for the TSG's timeslice
-
-     A timeslice group entry consists of an integer identifier along with a
-length which specifies the number of channels in the TSG. After a TSG header
-runlist entry, the next TSG_LENGTH runlist entries are considered to be part of
-the timeslice group.  Note that the minimum length of a TSG is one entry.
-     All channels in a TSG share the same runlist timeslice which specifies how
-long a single context runs on an engine or PBDMA before being swapped for a
-different context. The timeslice period is set in the TSG header by specifying
-TSG_TIMESLICE_TIMEOUT and TSG_TIMESLICE_SCALE. The TSG timeslice period is
-calculated as follows:
-
-  timeslice = (TSG_TIMESLICE_TIMEOUT << TSG_TIMESLICE_SCALE) * 1024 nanoseconds
-
-     The timeslice period should normally not be set to zero.  A timeslice of
-zero will be treated as a timeslice period of one.  The runlist
-timeslice period begins after the context has been loaded on a PBDMA but is
-paused while the channel has an outstanding context load to an engine.  Time
-spent switching a context into an engine is not part of the runlist timeslice.
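
The timeslice formula above can be evaluated with a small C helper.  This is
a sketch only: the function name is hypothetical, the field widths (8 bits for
TIMEOUT, 4 bits for SCALE) are taken from the NV_RAMRL_ENTRY_TSG_TIMESLICE_*
bit ranges, and the special handling of a zero timeslice is not modeled.

```c
#include <stdint.h>

/* timeslice = (TSG_TIMESLICE_TIMEOUT << TSG_TIMESLICE_SCALE) * 1024 ns */
static uint64_t tsg_timeslice_ns(uint32_t timeout, uint32_t scale)
{
    timeout &= 0xFFu;  /* TIMESLICE_TIMEOUT is an 8-bit field */
    scale   &= 0xFu;   /* TIMESLICE_SCALE is a 4-bit field    */
    return ((uint64_t)timeout << scale) * 1024u;
}
```

With TIMEOUT = 128 and SCALE = 3 this gives (128 << 3) * 1024 = 1048576 ns,
or roughly one millisecond.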
-
-     If Host reaches the end of the runlist or receives another entry of type
-NV_RAMRL_ENTRY_TYPE_TSG before processing TSG_LENGTH additional runlist entries,
-or if it encounters a TSG of length 0, a SCHED_ERROR interrupt will be generated
-with ERROR_CODE_BAD_TSG.
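
The structural rules above can be checked in software before a runlist is
submitted.  The following C sketch walks a runlist and reports the conditions
the text associates with ERROR_CODE_BAD_TSG.  It is an assumption-laden
illustration: the function and enum names are hypothetical, the runlist is
assumed to be a flat array of dwords (four per entry), and it does not claim
to reproduce Host's exact behavior.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { RL_OK = 0, RL_BAD_TSG = 1 } rl_status;

/* dwords points at n_entries runlist entries of four dwords each. */
static rl_status rl_validate(const uint32_t *dwords, size_t n_entries)
{
    size_t remaining = 0;  /* channel entries still owed to the current TSG */

    for (size_t i = 0; i < n_entries; i++) {
        const uint32_t *e = &dwords[i * 4];
        uint32_t type = e[0] & 0x1u;              /* NV_RAMRL_ENTRY_TYPE */

        if (type == 1u) {                         /* ENTRY_TYPE_TSG */
            if (remaining != 0)
                return RL_BAD_TSG;                /* header before TSG_LENGTH entries seen */
            remaining = e[1] & 0xFFu;             /* NV_RAMRL_ENTRY_TSG_LENGTH */
            if (remaining == 0)
                return RL_BAD_TSG;                /* TSG of length 0 */
        } else {                                  /* ENTRY_TYPE_CHAN */
            if (remaining == 0)
                return RL_BAD_TSG;                /* channel outside any TSG */
            remaining--;
        }
    }
    /* End of runlist reached with channel entries still owed. */
    return remaining == 0 ? RL_OK : RL_BAD_TSG;
}
```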
-
-
-Host Scheduling Memory Layout:
-
-Example of graphics runlist entry to GPU context mapping via channel id:
-
-
-                           .------Inst_ptr -------.
-                           |                      |
-     Graphics Runlist      |    Channel-Map RAM   |          GPU Instance Block
-     .------------ .       |  .----------------.  |        .-------------------.
-     | TSG Hdr L=m |--.----'  |Ch0 Inst Blk Ptr|--'------->| Host State        |
-     | RL Entry T1 |  |       |Ch1 Inst Blk Ptr|    .------| Memory State      |
-     | RL Entry T2 |  |       |       ...      |    |      | Engine0 State Ptr |
-     |    ...      |  |-chid->|ChI Inst Blk Ptr|    |      | Engine1 State Ptr |
-     | RL Entry Tm |  |       |       ...      |    |      |     ...           |
-     | TSG Hdr L=n |  |       |ChN Inst Blk Ptr|    |    .-| EngineN State Ptr |
-     | RL Entry T1 |  |       `----------------'    |    | `-------------------'
-     | RL Entry T2 |userd_ptr                       |    |
-     |    ...      |  |        .--------------.     |    |   .--------------.
-     | RL Entry Tn |  |        |    USERD     |     |    |   |  Engine Ctx  |
-     |             |  '------->|              |<----'    '-->|    State N   |
-     `-------------'           |              |              |              |
-                               `--------------'              `--------------'
-
-Runlist Diagram Description:
-    Here we have (M+N) channel-type (ENTRY_TYPE_CHAN) runlist entries grouped
-together within two TSGs. The first entry in the runlist is a TSG header
-entry (ENTRY_TYPE_TSG) that describes the first TSG. The TSG header specifies M
-as the length of the TSG. The header also contains the timeslice
-information for the TSG (SCALE/TIMEOUT), as well as the TSG id specified in the
-TSGID field.
-    Because the length here is M, the runlist *must* contain M additional
-runlist entries of type ENTRY_TYPE_CHAN that will be part of this TSG.
-Similarly, the next (N+1) entries, a TSG header entry followed by N
-regular channel entries, correspond to the second TSG.
-
-#define NV_RAMRL_ENTRY                                               /* ----G */
-#define NV_RAMRL_ENTRY_RANGE                          0xF:0x00000000 /* RW--M */
-#define NV_RAMRL_ENTRY_SIZE                                       16 /*       */
-// Runlist base must be 4k-aligned.
-#define NV_RAMRL_ENTRY_BASE_SHIFT                                 12 /*       */
-
-
-#define NV_RAMRL_ENTRY_TYPE                        (0+0*32):(0+0*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_TYPE_CHAN                          0x00000000 /* RW--V */
-#define NV_RAMRL_ENTRY_TYPE_TSG                           0x00000001 /* RW--V */
-
-#define NV_RAMRL_ENTRY_ID                         (11+2*32):(0+2*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_ID_HW                      11:0 /* RWXUF */
-#define NV_RAMRL_ENTRY_ID_MAX              (4096-1) /* RW--V */
-
-
-
-
-
-#define NV_RAMRL_ENTRY_CHAN_RUNQUEUE_SELECTOR      (1+0*32):(1+0*32) /* RWXUF */
-
-#define NV_RAMRL_ENTRY_CHAN_INST_TARGET                   (5+0*32):(4+0*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_CHAN_INST_TARGET_VID_MEM                  0x00000000 /* RW--V */
-#define NV_RAMRL_ENTRY_CHAN_INST_TARGET_SYS_MEM_COHERENT         0x00000002 /* RW--V */
-#define NV_RAMRL_ENTRY_CHAN_INST_TARGET_SYS_MEM_NONCOHERENT      0x00000003 /* RW--V */
-
-#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET                  (7+0*32):(6+0*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_VID_MEM                 0x00000000 /* RW--V */
-#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_VID_MEM_NVLINK_COHERENT 0x00000001 /* RW--V */
-#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_SYS_MEM_COHERENT        0x00000002 /* RW--V */
-#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_SYS_MEM_NONCOHERENT     0x00000003 /* RW--V */
-
-#define NV_RAMRL_ENTRY_CHAN_USERD_PTR_LO          (31+0*32):(8+0*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_CHAN_USERD_PTR_HI          (31+1*32):(0+1*32) /* RWXUF */
-
-#define NV_RAMRL_ENTRY_CHAN_CHID                  (11+2*32):(0+2*32) /* RWXUF */
-
-#define NV_RAMRL_ENTRY_CHAN_INST_PTR_LO          (31+2*32):(12+2*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_CHAN_INST_PTR_HI           (31+3*32):(0+3*32) /* RWXUF */
-
-
-
-// Macros for shifting out low bits of INST_PTR and USERD_PTR.
-#define NV_RAMRL_ENTRY_CHAN_INST_PTR_ALIGN_SHIFT                  12 /* ----C */
-#define NV_RAMRL_ENTRY_CHAN_USERD_PTR_ALIGN_SHIFT                  8 /* ----C */
-
-
-
-
-
-
-
-#define NV_RAMRL_ENTRY_TSG_TIMESLICE_SCALE       (19+0*32):(16+0*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_TSG_TIMESLICE_SCALE_3              0x00000003 /* RWI-V */
-#define NV_RAMRL_ENTRY_TSG_TIMESLICE_TIMEOUT     (31+0*32):(24+0*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_TSG_TIMESLICE_TIMEOUT_128          0x00000080 /* RWI-V */
-
-
-#define NV_RAMRL_ENTRY_TSG_TIMESLICE_TIMEOUT_1US          0x00000000 /*       */
-
-#define NV_RAMRL_ENTRY_TSG_LENGTH                  (7+1*32):(0+1*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_TSG_LENGTH_INIT                    0x00000000 /* RW--V */
-#define NV_RAMRL_ENTRY_TSG_LENGTH_MIN                     0x00000001 /* RW--V */
-#define NV_RAMRL_ENTRY_TSG_LENGTH_MAX                     0x00000080 /* RW--V */
-
-#define NV_RAMRL_ENTRY_TSG_TSGID                  (11+2*32):(0+2*32) /* RWXUF */
-
-
-
-6  -  Host Pushbuffer Format (FIFO_DMA)
-=======================================
-
-     "FIFO" refers to Host.  "FIFO_DMA" means data that Host reads from memory:
-the pushbuffer.  Host autonomously reads pushbuffer data from memory and
-generates method address/data pairs from the data.
-
-     Pushbuffer terminology:
-
-     - A channel is the logical sequence of instructions associated with a GPU
-       context.
-
-     - The pushbuffer is a stream of data in memory containing the
-       specifications of the operations that a channel is to perform for a
-       particular client.  Pushbuffer data consists of pushbuffer entries.
-
-     - A pushbuffer entry (PB entry) is a 32-bit (doubleword) sized unit of
-       pushbuffer data.  This is the smallest granularity at which Host consumes
-       pushbuffer data.  A PB entry is either a PB instruction (which is either
-       a PB control entry or a PB method header), or a method data entry.
-
-     - A pushbuffer segment (PB segment) is a contiguous block of memory
-       containing pushbuffer entries.  The location and size of a pushbuffer
-       segment is defined by its respective GP entry in the GPFIFO.
-
-     - A pushbuffer control entry (PB control entry) is a single PB entry of
-       type SET_SUBDEVICE_MASK, STORE_SUBDEVICE_MASK, USE_SUBDEVICE_MASK,
-       END_PB_SEGMENT, or a universal NOP (NV_FIFO_DMA_NOP).
-
-     - A pushbuffer compressed method sequence is a sequence of pushbuffer
-       entries starting with a method header and a variable-length sequence of
-       method data entries (the length being defined by the method header).  A
-       single PB compressed method sequence expands into one or more methods.
-       This may also be known as a "pushbuffer method" (PB method), but that
-       terminology is ambiguous and not preferred.
-
-     - A pushbuffer method header (PB method header) is the first PB entry found
-       in a PB compressed method sequence.  A PB method header is a PB
-       instruction performed on method data entries.
-
-     - A pushbuffer instruction (PB instruction) is a PB entry that is not a PB
-       method data entry.  A PB instruction is either a PB control entry or a PB
-       method header.
-
-     - A method is an address/data pair representing an operation to perform.
-
-     - A method data entry is the 32-bit operand for its corresponding method.
-
-
-
-#define NV_FIFO_PB_ENTRY_SIZE                                     4 /*       */
-
-
-     Some engines such as Graphics internally support a double-wide method FIFO;
-these are known as "data-hi" methods.  It is Host that performs the packing of
-two methods into one double-wide entry.  Host will only generate data-hi methods
-if the following conditions are satisfied:
-
-     1. The two methods come from the same PB method (in other words they share
-        the same method header).
-
-     2. The method header specifies a non-incrementing method, an incrementing
-        method, or an increment-once method.
-
-     3. The paired methods either have the same method address, or the first
-        method has an even NV_FIFO_DMA_METHOD_ADDRESS field and the second
-        (data-hi) method is the increment of the first.  (That is, the
-        left-shifted method address as listed in the class files must be
-        divisible by 8 for this condition to hold.)
-
-     4. The second method is available at the time of pushing the first one into
-        the engine's method FIFO. In other words, Host will not wait to pack
-        methods.  Note that if the engine's method FIFO is full, the
-        back-pressure will in itself create a "wait time".
-
-The first three conditions are under SW's control.  Only the graphics engine
-supports data-hi methods.
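
Condition 3 above is the only one that depends purely on the two method
addresses, so it can be expressed as a small predicate.  The sketch below
covers that condition alone; the function name is hypothetical, both
arguments are dword-addresses as stored in NV_FIFO_DMA_METHOD_ADDRESS, and
conditions 1, 2, and 4 are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if two consecutive methods satisfy the address rule for data-hi
 * pairing: same address, or an even first address followed by first + 1. */
static bool data_hi_addresses_pairable(uint32_t first, uint32_t second)
{
    if (first == second)
        return true;
    return (first % 2u == 0u) && (second == first + 1u);
}
```

An even dword-address corresponds to a byte-address divisible by 8, which is
the form the class files list.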
-
-
-Types of PB Entries
-
-     PB entries can be classified into three types: PB method headers, PB
-control entries, and PB method data.  Different types of PB entries have
-different formats.  Because PB compressed method sequences are of variable
-length, it is impossible to determine the type of a PB entry without tracking
-the pushbuffer from the beginning or from the location of a PB entry that is
-known to not be a PB method data entry.
-
-     A PB method data entry is always found in a method data sequence
-immediately following a PB method header in the logical stream of PB entries.
-The PB method header contains a NV_FIFO_DMA_METHOD_COUNT field, the value of
-which is equal to the length of the method data sequence.  Note a PB method
-header does not necessarily come with PB method data entries (see details below
-about immediate-data method headers and method headers for which COUNT is zero).
-Also note the PB method data entries may be located in a PB segment separate
-from their corresponding method header.  The format of any given PB method data
-entry is defined in the "NV_UDMA" section of dev_pbdma.ref.
-
-     A PB entry that is either a PB method header or PB control entry is known
-as a PB instruction.  The type of a PB instruction is specified by the
-NV_FIFO_DMA_SEC_OP field and the NV_FIFO_DMA_TERT_OP field.
-
-   secondary  tertiary
-    opcode     opcode   entry type
-   ---------  --------  --------------------------------
-      000        01     SET_SUBDEVICE_MASK
-      000        10     STORE_SUBDEVICE_MASK
-      000        11     USE_SUBDEVICE_MASK
-      001        xx     incrementing method header
-      011        xx     non-incrementing method header
-      100        xx     immediate-data method header
-      101        xx     increment-once method header
-      111        xx     END_PB_SEGMENT
-   ---------  --------  --------------------------------
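
The opcode table above can be read as a decoder.  The C sketch below
classifies a 32-bit word that is already known to be a PB instruction (it
cannot tell an instruction from a method data entry, for the reason given
earlier).  The enum and function names are hypothetical; the field positions
are those of NV_FIFO_DMA_SEC_OP (31:29) and NV_FIFO_DMA_TERT_OP (17:16), and
an all-zero word is treated as NV_FIFO_DMA_NOP.  Encodings the table does not
list are reported as unknown, which is an assumption of this sketch.

```c
#include <stdint.h>

typedef enum {
    PB_NOP, PB_SET_SUBDEVICE_MASK, PB_STORE_SUBDEVICE_MASK,
    PB_USE_SUBDEVICE_MASK, PB_INCR, PB_NONINCR, PB_IMMD, PB_ONEINCR,
    PB_END_PB_SEGMENT, PB_UNKNOWN
} pb_kind;

static pb_kind pb_classify(uint32_t word)
{
    uint32_t sec  = (word >> 29) & 0x7u;  /* NV_FIFO_DMA_SEC_OP  */
    uint32_t tert = (word >> 16) & 0x3u;  /* NV_FIFO_DMA_TERT_OP */

    if (word == 0u)
        return PB_NOP;                    /* NV_FIFO_DMA_NOP */

    switch (sec) {
    case 0:                               /* GRP0: tertiary opcode selects */
        switch (tert) {
        case 1:  return PB_SET_SUBDEVICE_MASK;
        case 2:  return PB_STORE_SUBDEVICE_MASK;
        case 3:  return PB_USE_SUBDEVICE_MASK;
        default: return PB_UNKNOWN;
        }
    case 1:  return PB_INCR;
    case 3:  return PB_NONINCR;
    case 4:  return PB_IMMD;
    case 5:  return PB_ONEINCR;
    case 7:  return PB_END_PB_SEGMENT;
    default: return PB_UNKNOWN;           /* not listed in the table */
    }
}
```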
-
-     Types of methods:
-
-     - A Host method is a method whose address is defined in the NV_UDMA device
-       range.
-
-     - A Host-only method is any Host method excluding SetObject (also known as
-       NV_UDMA_OBJECT).
-
-     - An engine method is a method whose address is not defined within the
-       NV_UDMA device range.  There are multiple engines designated by a
-       subchannel ID.  Software methods are included in this category.
-
-     - A software method (SW method) is a method which causes an interrupt for
-       the express purpose of being handled by software.  For details see the
-       section on software methods below.
-
-     For more information about types of methods see "HOST METHODS" and
-"RESERVED METHOD ADDRESSES" in dev_pbdma.ref.
-
-     The method address in a PB method header (stored in the
-NV_FIFO_DMA_METHOD_ADDRESS field) is a dword-address, not a byte-address.  In
-other words the least significant two bits of the address are not stored because
-the byte-address is dword-aligned (thus the least significant two bits are
-always zero).
-
-     The subchannel in a PB method header (stored in the
-NV_FIFO_DMA_*_SUBCHANNEL field) determines the engine to which a method will be
-sent if the method is SetObject or an engine method (otherwise, the SUBCHANNEL
-field is ignored).  SetObject enables SW to request HW to check the expectation
-that a given subchannel serves the specified class ID; see the description of
-"NV_UDMA_OBJECT" in dev_pbdma.ref.
-
-     The mapping between subchannels and engines is fixed.  A subchannel is
-bound to a given class according to the runlist.  Each engine method is applied
-to an "object," which itself is an instance of an NV class as defined by the
-master MFS class files.  Each object belongs to an engine.  For SetObject and
-engine methods, the engine is determined entirely by the SUBCHANNEL field of
-the method's header via a fixed mapping that depends on the runlist on which the
-method arrives.
-
-     Methods on subchannels 0-4 are handled by the primary engine served by the
-runlist, except that subchannel 4 targets GRCOPY0 and GRCOPY1 on the graphics
-runlist.  For Graphics/Compute, SetObject associates subchannels 0, 1, 2, and 3
-with class identifiers for 3D, compute, I2M, and 2D respectively.  On other
-runlists, the subchannel is ignored, and Host does not send the subchannel ID to
-the engine.  It is recommended that SW only use subchannel 4 on the dedicated
-copy engines for consistency with GRCOPY usage.
-
-     Subchannels 5-7 are for software methods.  Any methods on these subchannels
-(including SetObject methods) are kicked back to software for handling via the
-SW method dispatch mechanism using the NV_PPBDMA_INTR_*_DEVICE interrupt.  SW
-may choose to send a SetObject method to each engine subchannel before sending
-any methods on that particular subchannel in order to support multiple software
-classes.
-
-     If a method stream subchannel-switches from targeting graphics/compute to a
-copy engine or vice-versa, that is, to or from subchannel 4 on GR, Host will:
-
-     1. Wait until the first engine has completed all its methods,
-
-     2. Wait until that engine indicates that it is idle (WFI), and
-
-     3. Send a sysmem barrier flush and wait until it completes.
-
-Only then will Host send methods to the newly targeted engine.
-
-     Note that this WFI will not occur for sending Host-only methods on the new
-subchannel, since Host-only methods ignore the subchannel field.  Additionally,
-when switching from CE to graphics/compute, Host forces FE to perform a cache
-invalidate.  Other subchannel switch semantics may be provided by the engines
-themselves, such as switching between subchannels 0-3 within FE.
-
-
-#define NV_FIFO_DMA                                                 /* ----G */
-#define NV_FIFO_DMA_METHOD_ADDRESS_OLD                         12:2 /* RWXUF */
-#define NV_FIFO_DMA_METHOD_ADDRESS                             11:0 /* RWXUF */
-
-#define NV_FIFO_DMA_SUBDEVICE_MASK                             15:4 /* RWXUF */
-
-#define NV_FIFO_DMA_METHOD_SUBCHANNEL                         15:13 /* RWXUF */
-
-#define NV_FIFO_DMA_TERT_OP                                   17:16 /* RWXUF */
-#define NV_FIFO_DMA_TERT_OP_GRP0_SET_SUB_DEV_MASK        0x00000001 /* RW--V */
-#define NV_FIFO_DMA_TERT_OP_GRP0_STORE_SUB_DEV_MASK      0x00000002 /* RW--V */
-#define NV_FIFO_DMA_TERT_OP_GRP0_USE_SUB_DEV_MASK        0x00000003 /* RW--V */
-
-#define NV_FIFO_DMA_METHOD_COUNT_OLD                          28:18 /* RWXUF */
-#define NV_FIFO_DMA_METHOD_COUNT                              28:16 /* RWXUF */
-#define NV_FIFO_DMA_IMMD_DATA                                 28:16 /* RWXUF */
-
-#define NV_FIFO_DMA_SEC_OP                                    31:29 /* RWXUF */
-#define NV_FIFO_DMA_SEC_OP_GRP0_USE_TERT                 0x00000000 /* RW--V */
-#define NV_FIFO_DMA_SEC_OP_INC_METHOD                    0x00000001 /* RW--V */
-#define NV_FIFO_DMA_SEC_OP_NON_INC_METHOD                0x00000003 /* RW--V */
-#define NV_FIFO_DMA_SEC_OP_IMMD_DATA_METHOD              0x00000004 /* RW--V */
-#define NV_FIFO_DMA_SEC_OP_ONE_INC                       0x00000005 /* RW--V */
-#define NV_FIFO_DMA_SEC_OP_RESERVED6                     0x00000006 /* RW--V */
-#define NV_FIFO_DMA_SEC_OP_END_PB_SEGMENT                0x00000007 /* RW--V */
-
-
-Incrementing PB Method Header Format
-
-     An incrementing PB method header specifies that Host generate a sequence of
-methods.  The length of the sequence is defined by the method header.  The
-method data for each method in this sequence is found in a sequence of PB
-entries immediately following the method header.
-
-     The dword-address of the first method is specified by the method header,
-and the dword-address of each subsequent method is equal to the dword-address of
-the previous method plus one.  In other words, the byte-address of each
-subsequent method is equal to the byte-address of the previous method plus four.
-
-Example sequence of methods generated from an incrementing method header:
-
-     addr    data0
-     addr+1  data1
-     addr+2  data2
-     addr+3  data3
-     ...      ...
-
-     The NV_FIFO_DMA_INCR_COUNT field contains the number of methods in the
-generated sequence.  This is the same as the number of method data entries that
-follow the method header.  If the COUNT field is zero, the other fields are
-ignored, and the PB method effectively becomes a no-op with no method data
-entries following it.
-
-     The NV_FIFO_DMA_INCR_SUBCHANNEL field contains the subchannel to use for
-the methods generated from the method header.  See the documentation above for
-NV_FIFO_DMA_*_SUBCHANNEL.
-
-     The NV_FIFO_DMA_INCR_ADDRESS field contains the method address for the
-first method in the generated sequence.  The dword-address of the method is
-incremented by one each time a method is generated.  A method address specifies
-an operation to be performed.  Note that because the ADDRESS is a dword-address
-and not a byte-address, the two least significant bits of the method's
-byte-address are not stored.
-
-     The NV_FIFO_DMA_INCR_DATA fields contain the method data for the methods in
-the generated sequence.  The number of method data entries is defined by the
-COUNT field.  A method data entry contains an operand for its respective method.
-
-     Bit 12 is reserved for the future expansion of either the subchannel or the
-address fields.
-
-
-#define NV_FIFO_DMA_INCR                                            /* ----G */
-#define NV_FIFO_DMA_INCR_OPCODE                 (0*32+31):(0*32+29) /* RWXUF */
-#define NV_FIFO_DMA_INCR_OPCODE_VALUE                    0x00000001 /* ----V */
-#define NV_FIFO_DMA_INCR_COUNT                  (0*32+28):(0*32+16) /* RWXUF */
-#define NV_FIFO_DMA_INCR_SUBCHANNEL             (0*32+15):(0*32+13) /* RWXUF */
-#define NV_FIFO_DMA_INCR_ADDRESS                 (0*32+11):(0*32+0) /* RWXUF */
-#define NV_FIFO_DMA_INCR_DATA                    (1*32+31):(1*32+0) /* RWXUF */
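
For illustration, an incrementing method header can be assembled from the
NV_FIFO_DMA_INCR_* fields as in the C sketch below.  The function name is
hypothetical.  The method address is passed as a byte-address, as listed in
the class files, and converted to the dword-address stored in the header;
bit 12 is left as zero because it is reserved.

```c
#include <stdint.h>

/* Build an incrementing PB method header. */
static uint32_t pb_incr_header(uint32_t method_byte_addr,
                               uint32_t subchannel, uint32_t count)
{
    uint32_t dword_addr = method_byte_addr >> 2;  /* drop the two zero LSBs */

    return (1u << 29)                     /* INCR_OPCODE_VALUE, bits 31:29 */
         | ((count      & 0x1FFFu) << 16) /* INCR_COUNT,        bits 28:16 */
         | ((subchannel & 0x7u)    << 13) /* INCR_SUBCHANNEL,   bits 15:13 */
         | (dword_addr  & 0xFFFu);        /* INCR_ADDRESS,      bits 11:0  */
}
```

For example, byte-address 0x200 on subchannel 1 with two data entries
encodes as 0x20022080, and the header must be followed by two method data
entries.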
-
-
-Non-Incrementing PB Method Header Format
-
-     A non-incrementing PB method header specifies that Host generate a sequence
-of methods.  The length of the sequence is defined by the method header.  The
-method data for each method in this sequence is contained within the PB entries
-immediately following the method header.
-
-     Unlike with the incrementing PB method header, the sequence of methods
-generated all have the same method address.  The dword-address of every method
-in this sequence is specified by the method header.  Although the methods all
-have the same address, the method data entries may be different.
-
-Example sequence of methods generated from a non-incrementing method header:
-
-     addr    data0
-     addr    data1
-     addr    data2
-     addr    data3
-     ...      ...
-
-     The NV_FIFO_DMA_NONINCR_COUNT field contains the number of methods
-in the generated sequence.  This is the same as the number of method data
-entries that follow the method header.  If the COUNT field is zero, the other
-fields are ignored, and the PB method effectively becomes a no-op with no method
-data entries following it.
-
-     The NV_FIFO_DMA_NONINCR_SUBCHANNEL field contains the subchannel to use for
-the methods generated from the method header.  See the documentation above for
-NV_FIFO_DMA_*_SUBCHANNEL.
-
-     The NV_FIFO_DMA_NONINCR_ADDRESS field contains the method address for every
-method in the generated sequence.  A method address specifies an operation to be
-performed.  Note that because the ADDRESS field is a dword-address and not a
-byte-address, the two least significant bits of the method's byte-address are
-not stored.
-
-     The NV_FIFO_DMA_NONINCR_DATA fields contain the method data for the methods
-in the generated sequence.  The number of method data entries is defined by the
-COUNT field.  A method data entry contains an operand for its respective method.
-
-     Bit 12 is reserved for the future expansion of either the subchannel or the
-address fields.
-
-
-#define NV_FIFO_DMA_NONINCR                                         /* ----G */
-#define NV_FIFO_DMA_NONINCR_OPCODE              (0*32+31):(0*32+29) /* RWXUF */
-#define NV_FIFO_DMA_NONINCR_OPCODE_VALUE                 0x00000003 /* ----V */
-#define NV_FIFO_DMA_NONINCR_COUNT               (0*32+28):(0*32+16) /* RWXUF */
-#define NV_FIFO_DMA_NONINCR_SUBCHANNEL          (0*32+15):(0*32+13) /* RWXUF */
-#define NV_FIFO_DMA_NONINCR_ADDRESS              (0*32+11):(0*32+0) /* RWXUF */
-#define NV_FIFO_DMA_NONINCR_DATA                 (1*32+31):(1*32+0) /* RWXUF */
-
-
-Increment-Once PB Method Header Format
-
-     An increment-once PB method header specifies that Host generate a sequence
-of methods.  The length of the sequence is defined by the method header.  The
-method data for each method in this sequence is found in a sequence of PB
-entries immediately following the method header.
-
-     The dword-address of the first method is specified by the method header.
-The address of the second and all following methods is equal to the
-dword-address of the first method plus one.  In other words, the byte-address of
-the second and all following methods is equal to the byte-address of the first
-method plus four.
-
-Example sequence of methods generated from an increment-once method header:
-
-     addr     data0
-     addr+1   data1
-     addr+1   data2
-     addr+1   data3
-     ...      ...
-
-     The NV_FIFO_DMA_ONEINCR_COUNT field contains the number of methods in the
-generated sequence.  This is the same as the number of method data entries that
-follow the method header.  If the COUNT field is zero, the other fields are
-ignored, and the PB method effectively becomes a no-op method with no method
-data entries following it.
-
-     The NV_FIFO_DMA_ONEINCR_SUBCHANNEL field contains the subchannel to use for
-the methods generated from the method header.  See the documentation above for
-NV_FIFO_DMA_*_SUBCHANNEL.
-
-     The NV_FIFO_DMA_ONEINCR_ADDRESS field contains the method address for the
-first method in the generated sequence.  A method address specifies an operation
-to be performed.  Note that because the ADDRESS is a dword-address and not a
-byte-address, the two least significant bits of the method's byte-address are
-not stored.
-
-     The NV_FIFO_DMA_ONEINCR_DATA fields contain the method data for the methods
-in the generated sequence.  The number of method data entries is defined by the
-COUNT field.  A method data entry contains an operand for its respective method.
-
-     Bit 12 is reserved for the future expansion of either the subchannel or the
-address fields.
-
-
-#define NV_FIFO_DMA_ONEINCR                                         /* ----G */
-#define NV_FIFO_DMA_ONEINCR_OPCODE              (0*32+31):(0*32+29) /* RWXUF */
-#define NV_FIFO_DMA_ONEINCR_OPCODE_VALUE                 0x00000005 /* ----V */
-#define NV_FIFO_DMA_ONEINCR_COUNT               (0*32+28):(0*32+16) /* RWXUF */
-#define NV_FIFO_DMA_ONEINCR_SUBCHANNEL          (0*32+15):(0*32+13) /* RWXUF */
-#define NV_FIFO_DMA_ONEINCR_ADDRESS              (0*32+11):(0*32+0) /* RWXUF */
-#define NV_FIFO_DMA_ONEINCR_DATA                 (1*32+31):(1*32+0) /* RWXUF */
-
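The increment-once layout above can be sketched in C as follows. This is a minimal illustration, not NVIDIA code: the ONEINCR_* shift and mask constants and both helper functions are hypothetical names derived by hand from the NV_FIFO_DMA_ONEINCR_* bit ranges, and the byte-address parameter is assumed to be dword-aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants derived from the NV_FIFO_DMA_ONEINCR_* ranges. */
#define ONEINCR_OPCODE_SHIFT      29        /* bits 31:29 */
#define ONEINCR_OPCODE_VALUE      0x5u
#define ONEINCR_COUNT_SHIFT       16        /* bits 28:16 */
#define ONEINCR_COUNT_MASK        0x1FFFu
#define ONEINCR_SUBCHANNEL_SHIFT  13        /* bits 15:13 */
#define ONEINCR_SUBCHANNEL_MASK   0x7u
#define ONEINCR_ADDRESS_MASK      0xFFFu    /* bits 11:0, dword-address */

/* Build an increment-once method header.  Bit 12 (reserved) is left zero. */
static uint32_t oneincr_header(uint32_t count, uint32_t subch,
                               uint32_t byte_addr)
{
    assert(count <= ONEINCR_COUNT_MASK);
    assert(subch <= ONEINCR_SUBCHANNEL_MASK);
    /* ADDRESS stores a dword-address, so the low two bits are dropped. */
    assert((byte_addr & 3u) == 0);
    assert((byte_addr >> 2) <= ONEINCR_ADDRESS_MASK);

    return (ONEINCR_OPCODE_VALUE << ONEINCR_OPCODE_SHIFT) |
           (count << ONEINCR_COUNT_SHIFT) |
           (subch << ONEINCR_SUBCHANNEL_SHIFT) |
           (byte_addr >> 2);
}

/*
 * Dword-address of the i-th generated method (i counts from 0): the first
 * method uses ADDRESS, every later method uses ADDRESS + 1.  What happens
 * when ADDRESS is 0xFFF is not stated above, so the sum is not masked here.
 */
static uint32_t oneincr_method_addr(uint32_t header, uint32_t i)
{
    uint32_t first = header & ONEINCR_ADDRESS_MASK;

    return (i == 0) ? first : first + 1;
}
```

With these helpers, a header for three methods starting at byte-address 0x100 on subchannel 1 would be followed by three data entries, delivered to dword-addresses 0x40, 0x41 and 0x41.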
-
-No-Operation PB Instruction Formats
-
-     The method header for a no-op PB method may be specified in multiple ways,
-but the preferred way is to set the PB instruction to NV_FIFO_DMA_NOP.
-In any case, NV_FIFO_DMA_NOP is a universal NOP entry that bypasses any
-method header format check, and is not considered a method header.
-
-
-#define NV_FIFO_DMA_NOP                                  0x00000000 /* ----C */
-
-
-Immediate-Data PB Method Header Format
-
-     If a method's operand fits within 13 bits, a PB method may be specified in
-a single PB entry, using the immediate-data PB method header format.  Exactly
-one method is generated from this method header.
-
-     The NV_FIFO_DMA_IMMD_SUBCHANNEL field contains the subchannel to use for
-the method generated from the method header.  See the documentation above for
-NV_FIFO_DMA_*_SUBCHANNEL.
-
-     The NV_FIFO_DMA_IMMD_ADDRESS field contains the method address for the
-single generated method.  A method address specifies an operation to be
-performed.  Note that because the ADDRESS is a dword-address and not a
-byte-address, the two least significant bits of the method's byte-address are
-not stored.
-
-     The single NV_FIFO_DMA_IMMD_DATA field contains the method data for the
-generated method.  This method data contains an operand for the generated
-method.
-
-
-#define NV_FIFO_DMA_IMMD                                            /* ----G */
-#define NV_FIFO_DMA_IMMD_ADDRESS                               11:0 /* RWXUF */
-#define NV_FIFO_DMA_IMMD_SUBCHANNEL                           15:13 /* RWXUF */
-#define NV_FIFO_DMA_IMMD_DATA                                 28:16 /* RWXUF */
-#define NV_FIFO_DMA_IMMD_OPCODE                               31:29 /* RWXUF */
-#define NV_FIFO_DMA_IMMD_OPCODE_VALUE                    0x00000004 /* ----V */
-
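A possible way to pack an immediate-data header, shown only as a sketch: the IMMD_* constants and the immd_fits and immd_header helpers are hypothetical names derived from the NV_FIFO_DMA_IMMD_* bit ranges above, and bit 12, which the field list does not cover, is assumed to stay zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants derived from the NV_FIFO_DMA_IMMD_* ranges. */
#define IMMD_OPCODE_VALUE     0x4u      /* bits 31:29 */
#define IMMD_DATA_MASK        0x1FFFu   /* bits 28:16, 13 bits */
#define IMMD_SUBCHANNEL_MASK  0x7u      /* bits 15:13 */
#define IMMD_ADDRESS_MASK     0xFFFu    /* bits 11:0, dword-address */

/* True if the operand is small enough for the 13-bit DATA field. */
static bool immd_fits(uint32_t operand)
{
    return operand <= IMMD_DATA_MASK;
}

/* Pack one method and its operand into a single PB entry. */
static uint32_t immd_header(uint32_t subch, uint32_t byte_addr,
                            uint32_t operand)
{
    assert(immd_fits(operand));
    assert(subch <= IMMD_SUBCHANNEL_MASK);
    /* ADDRESS stores a dword-address, so the low two bits are dropped. */
    assert((byte_addr & 3u) == 0);
    assert((byte_addr >> 2) <= IMMD_ADDRESS_MASK);

    return (IMMD_OPCODE_VALUE << 29) | (operand << 16) |
           (subch << 13) | (byte_addr >> 2);
}
```

A caller would check immd_fits() first and fall back to one of the multi-entry header formats when the operand needs more than 13 bits.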
-
-Set Sub-Device Mask PB Control Entry Format
-
-     The SET_SUBDEVICE_MASK (SSDM) PB control entry is used when multiple GPU
-contexts are using the same pushbuffer (for example, for SLI or for stereo
-rendering) and there is data in the pushbuffer that is for only a subset of the
-GPU contexts.  This instruction allows the pushbuffer to tell a specific GPU
-context to use or ignore methods following the SET_SUBDEVICE_MASK.  While the
-logical-AND of NV_FIFO_DMA_SET_SUBDEVICE_MASK_VALUE and the GPU context's
-NV_PPBDMA_SUBDEVICE_ID value is zero, methods are ignored.  Pushbuffer control
-entries (like SET_SUBDEVICE_MASK) are not ignored.
-
-********************************************************************************
-Warning: When using subdevice masking, one must take care to synchronize
-properly with any later GP entries marked FETCH_CONDITIONAL.  If GP fetching
-gets too far ahead of PB processing, it is possible for a later conditional PB
-segment to be discarded prior to reaching an SSDM command that sets
-SUBDEVICE_STATUS to ACTIVE.  This would cause Host to execute garbage data.  One
-way to avoid this would be to set the SYNC_WAIT flag on any FETCH_CONDITIONAL
-segments following a subdevice reenable.
-********************************************************************************
-
-
-
-#define NV_FIFO_DMA_SET_SUBDEVICE_MASK                              /* ----G */
-#define NV_FIFO_DMA_SET_SUBDEVICE_MASK_VALUE                   15:4 /* RWXUF */
-#define NV_FIFO_DMA_SET_SUBDEVICE_MASK_OPCODE                 31:16 /* RWXUF */
-#define NV_FIFO_DMA_SET_SUBDEVICE_MASK_OPCODE_VALUE      0x00000001 /* ----V */
-
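The masking rule can be sketched as below. Two assumptions are made loudly here: the "logical-AND" in the text is read as a bitwise AND of the two values, and bits 3:0 of the entry, which the field list does not cover, are written as zero. The SSDM_* constants and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants derived from NV_FIFO_DMA_SET_SUBDEVICE_MASK_*. */
#define SSDM_OPCODE_SHIFT  16        /* bits 31:16 */
#define SSDM_OPCODE_VALUE  0x0001u
#define SSDM_VALUE_SHIFT   4         /* bits 15:4 */
#define SSDM_VALUE_MASK    0xFFFu

/* Build a SET_SUBDEVICE_MASK control entry (bits 3:0 assumed zero). */
static uint32_t ssdm_entry(uint32_t mask)
{
    assert(mask <= SSDM_VALUE_MASK);

    return ((uint32_t)SSDM_OPCODE_VALUE << SSDM_OPCODE_SHIFT) |
           (mask << SSDM_VALUE_SHIFT);
}

/*
 * True if a GPU context with the given subdevice ID ignores the methods
 * that follow the entry.  The AND is taken bitwise, which is an assumption.
 */
static bool ssdm_methods_ignored(uint32_t entry, uint32_t subdevice_id)
{
    uint32_t mask = (entry >> SSDM_VALUE_SHIFT) & SSDM_VALUE_MASK;

    return (mask & subdevice_id) == 0;
}
```

For example, an entry built with mask 0x003 leaves a context whose subdevice ID is 0x1 active, while a context whose subdevice ID is 0x4 ignores the following methods.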
-
-Store Sub-Device Mask PB Control Entry Format
-
-     The STORE_SUBDEVICE_MASK PB control entry is used to save a subdevice mask
-value to be used later by a USE_SUBDEVICE_MASK PB instruction.
-
-
-#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK                            /* ----G */
-#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK_VALUE                 15:4 /* RWXUF */
-#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK_OPCODE               31:16 /* RWXUF */
-#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK_OPCODE_VALUE    0x00000002 /* ----V */
-
-
-Use Sub-Device Mask PB Control Entry Format
-
-     The USE_SUBDEVICE_MASK PB control entry is used to apply the subdevice mask
-value saved by a STORE_SUBDEVICE_MASK PB instruction.  The effect of the mask is
-the same as for a SET_SUBDEVICE_MASK PB instruction.
-
-
-#define NV_FIFO_DMA_USE_SUBDEVICE_MASK                              /* ----G */
-#define NV_FIFO_DMA_USE_SUBDEVICE_MASK_OPCODE                 31:16 /* RWXUF */
-#define NV_FIFO_DMA_USE_SUBDEVICE_MASK_OPCODE_VALUE      0x00000003 /* ----V */
-
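The relationship between the SET, STORE and USE control entries can be modelled with two pieces of state. This is only a software model under stated assumptions: struct sdm_state, the SDM_OP_* names and sdm_apply are hypothetical, the entry passed in is assumed to have already been identified as a PB control entry, and the initial values of both masks are left to the caller because they are not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values (bits 31:16) of the three subdevice-mask control entries. */
enum {
    SDM_OP_SET   = 0x0001,
    SDM_OP_STORE = 0x0002,
    SDM_OP_USE   = 0x0003
};

/* Hypothetical model state: the mask in effect and the saved mask. */
struct sdm_state {
    uint32_t active_mask;
    uint32_t stored_mask;
};

/* Apply one subdevice-mask control entry to the model state. */
static void sdm_apply(struct sdm_state *s, uint32_t entry)
{
    uint32_t opcode = entry >> 16;
    uint32_t value  = (entry >> 4) & 0xFFFu;   /* VALUE, bits 15:4 */

    assert(s != NULL);

    switch (opcode) {
    case SDM_OP_SET:    /* takes effect immediately */
        s->active_mask = value;
        break;
    case SDM_OP_STORE:  /* saved for later, no immediate effect */
        s->stored_mask = value;
        break;
    case SDM_OP_USE:    /* same effect as SET, using the saved value */
        s->active_mask = s->stored_mask;
        break;
    default:            /* not a subdevice-mask entry; ignored by the model */
        break;
    }
}
```

In this model a STORE followed by any number of SET entries and then a USE restores the stored mask as the active one.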
-
-End-PB-Segment PB Control Entry Format
-
-     Engines may write PB segments themselves, but they cannot write GP entries.
-Because they cannot write GP entries, they cannot alter the size of a PB
-segment.  If an engine is writing a PB segment, and if it does not need to fill
-the entire PB segment it was allocated, instead of filling the remainder of the
-PB segment with no-op PB instructions, it may write a single End-PB-Segment
-control entry to indicate that the rest of the PB segment contains no valid
-data.  No further PB entries from that PB segment will be decoded or processed.
-Host may have already issued requests to fetch the remainder of the PB segment
-before an End-PB-Segment PB instruction is processed.  Host may or may not fetch
-the remainder of the PB segment.  Also note that the result of a PB CRC check
-on this segment via NV_PPBDMA_GP_ENTRY1_OPCODE_PB_CRC is indeterminate.
-
-
-#define NV_FIFO_DMA_ENDSEG_OPCODE                             31:29 /* RWXUF */
-#define NV_FIFO_DMA_ENDSEG_OPCODE_VALUE                  0x00000007 /* ----V */
-
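From the writer's side, terminating a partly filled PB segment might look like the sketch below. The function pb_segment_finish and the ENDSEG_ENTRY constant are hypothetical, sizes are counted in 32-bit PB entries, and bits 28:0 of the End-PB-Segment entry, which the field list does not define, are assumed to be zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode 7 in bits 31:29; the remaining bits are assumed to be zero. */
#define ENDSEG_ENTRY  (0x7u << 29)

/*
 * Terminate a PB segment of `capacity` entries after `used` entries have
 * been written.  If there is room left, one End-PB-Segment entry replaces
 * what would otherwise be a run of no-op entries.  Returns the number of
 * entries that now hold valid contents.
 */
static size_t pb_segment_finish(uint32_t *seg, size_t capacity, size_t used)
{
    assert(seg != NULL);
    assert(used <= capacity);

    if (used < capacity) {
        seg[used] = ENDSEG_ENTRY;
        return used + 1;
    }
    return used;   /* segment is exactly full, nothing to add */
}
```

A single terminating entry is enough because, per the text above, no further PB entries from the segment are decoded once the End-PB-Segment entry is processed.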
-