authorJohn Hubbard <jhubbard@nvidia.com>2019-06-12 14:41:51 -0700
committerJohn Hubbard <jhubbard@nvidia.com>2019-06-13 19:23:50 -0700
commitf9e4e0e07fd5a6a7757db977f69c8e91a0ae283f (patch)
tree1f9488efca18d52ccfc016c7531df4ceac94989c /manuals/volta/gv100/dev_ram.ref.txt
parent187a308aea3f133dfb27ebf6bafe75ffa15fc353 (diff)
New ref manuals directory, delete old locations
As decided in a recent OpenSource-Approval meeting, we want the directory
structure for reference manuals here to be fairly close to the way they are
organized internal to NVIDIA. This CL therefore does the following:

Rename from:
    Host-Fifo/volta/gv100/*
    Display-Ref-Manuals/gv100/*
to:
    manuals/volta/gv100/*

Regenerate index.html files to match (important for the "github pages" site,
at https://nvidia.github.io/open-gpu-doc/ ).

Reviewed by: Maneet Singh
Diffstat (limited to 'manuals/volta/gv100/dev_ram.ref.txt')
-rw-r--r-- manuals/volta/gv100/dev_ram.ref.txt | 1269
1 files changed, 1269 insertions, 0 deletions
diff --git a/manuals/volta/gv100/dev_ram.ref.txt b/manuals/volta/gv100/dev_ram.ref.txt
new file mode 100644
index 0000000..e80d9c0
--- /dev/null
+++ b/manuals/volta/gv100/dev_ram.ref.txt
@@ -0,0 +1,1269 @@
+Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+
+Permission is hereby granted, free of charge, to any person obtaining a
+copy of this software and associated documentation files (the "Software"),
+to deal in the Software without restriction, including without limitation
+the rights to use, copy, modify, merge, publish, distribute, sublicense,
+and/or sell copies of the Software, and to permit persons to whom the
+Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+DEALINGS IN THE SOFTWARE.
+--------------------------------------------------------------------------------
+
+2 - GPU INSTANCE RAM (RAMIN)
+==============================
+
+ A GPU contains a block called "XVE" that manages the interface with PCI, a
+block called "Host" that fetches graphics instructions, blocks called "engines"
+that execute graphics instructions, and blocks that manage the interface with
+memory.
+
+ .-----. .------.
+ | |<------------------>| |
+ | | | |
+ | | .---------. | |
+ | |<--->| Engine1 |<---| |
+ | | `---------' | |
+.---------. | | | |
+| GPU | | | .---------. | Host |
+| Local |<-->| FB |<--->| Engine2 |<---| |
+| Memory | | MMU | `---------' | |
+`---------' | Hub | ... | | .--------.
+ | | .---------. | | | System |
+ | |<--->| EngineN |<---| | | Memory |
+ | | `---------' `------' `--------'
+ | | ^ ^
+ | | | |
+.---------. | | .--V--. PCI .--V--. .-----.
+| Display |<-->| |<------------------>| XVE |<--->| NB |<--->| CPU |
+`---------' `-----' `-----' `-----' `-----'
+
+ A GPU context is a virtualization of the GPU for a particular software
+application. A GPU instance block is a block of memory that contains the state
+for a GPU context. A GPU context's instance block consists of Host state,
+pointers to each engine's state, and memory management state. A GPU instance
+block also contains a pointer to a block of memory that contains that part of a
+GPU context's state that a user-level driver may access. A GPU instance block
+fits within a single 4K-byte page of memory.
+
+ Run List Channel-Map RAM
+ .----------. Ch Id .----------------.
+ | RL Entry0 |----. |Ch0 Inst Blk Ptr|
+ | RL Entry1 | | |Ch1 Inst Blk Ptr|
+ | RL Entry2 | | | ... |
+ | ... | `--->|ChI Inst Blk Ptr|----.
+ | RL EntryN | | ... | |
+ `-----------' |ChN Inst Blk Ptr| |
+ `----------------' |
+ |
+ .-----------------------------------------------'
+ |
+ | GPU Instance Block GPFIFO
+ `-->.-----------------. GP_GET .--------. PB Seg
+ | |------------------------------>|GP Entry| .--------.
+ | Host State | |GP Entry|--->|PB Entry|
+ | (RAMFC) | User-Driver State | | |PB Entry|
+ | | .-------. |GP Entry| | ... |
+ | |------------->|(USERD)| GP_PUT |GP Entry| |PB Entry|
+ | | | |------->`--------' `--------'
+ | | | |
+ +-----------------+ | |
+ | Memory | `-------'
+ | Management |----------. Page Directory Page Table
+ | State | | .-------. .-------.
+ +-----------------+ `-->| PDE | | PTE |
+ | Pointer to | | PDE |------->| PTE |
+ | Engine0 |--------. | ... | | ... |
+ | State | | | PDE | | PTE |
+ +-----------------+ | `-------' `-------'
+ | Pointer to | |
+ | Engine1 |-----. | Engine0 State
+ | State | | | .-------.
+ +-----------------+ | `---->| |
+ ... | `-------'
+ +-----------------+ |
+ | Pointer to | | Engine1 State
+ | EngineN |--. | .-------.
+ | State | | `------->| |
+ `-----------------' | `-------'
+ | ...
+ |
+ | EngineN State
+ | .-------.
+ `---------->| |
+ `-------'
+
+ The GPU context's Host state occupies the first 128 double words of an
+instance block. A GPU context's Host state is called "RAMFC". Please see
+the NV_RAMFC section below for a description of Host state.
+
+ The GPU context's memory-management state defines the virtual address space
+that the GPU context uses. Memory management state consists of page and
+directory tables (that specify the mapping between virtual addresses and
+physical addresses, and the attributes of memory pages), and the limit of the
+virtual address space. The NV_RAMIN_PAGE_DIR_BASE entry contains the address of
+base of the GPU context's page directory table (PDB). NV_RAMIN_PAGE_DIR_BASE is
+4K-byte aligned.
+
+ The NV_RAMIN_ENG*_WFI_PTR entry contains the address of a block of memory
+for storing an engine's context state. Blocks of memory that contain engine state
+are 4K-byte aligned. Only one engine context is supported per instance block.
+
+    The NV_RAMIN_ENG*_CS field is deprecated; it was used to indicate whether
+GPU state should be restored from the FGCS pointer or from the WFI CS pointer.
+Engines only need and support one CTXSW pointer, and all state is stored there
+whether a WFI CS or another form of preemption was performed. This field must
+always be set to WFI for legacy reasons, and will eventually be deleted.
+
+
+#define NV_RAMIN /* ----G */
+
+// The instance block must be 4k-aligned.
+#define NV_RAMIN_BASE_SHIFT 12 /* */
+
+// The instance block size fits within a single 4k block.
+#define NV_RAMIN_ALLOC_SIZE 4096 /* */
+
+// Host State
+#define NV_RAMIN_RAMFC (127*32+31):(0*32+0) /* RWXUF */
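The field definitions in this file use the convention "(W*32+H):(W*32+L)" to
mean bits L..H of 32-bit word W within the instance block. The following is a
minimal sketch of helpers that recover a byte offset and a field mask from that
convention; the helper names are illustrative, not part of the manual.

```c
#include <stdint.h>

/* Byte offset of 32-bit word W inside the 4 KB instance block. */
static inline uint32_t ramin_word_offset(uint32_t word)
{
    return word * 4u;
}

/* Mask covering bits lo..hi (inclusive) of a 32-bit word. */
static inline uint32_t ramin_field_mask(uint32_t hi, uint32_t lo)
{
    return ((hi - lo) == 31u) ? 0xFFFFFFFFu
                              : (((1u << (hi - lo + 1u)) - 1u) << lo);
}
```

For example, NV_RAMIN_PAGE_DIR_BASE_TARGET at (128*32+1):(128*32+0) occupies
bits 1:0 of the word at byte offset 512, and NV_RAMIN_RAMFC spans words 0..127,
i.e. the first 512 bytes of the block.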
+
+// Memory-Management State
+
+   The following fields are used for non-VEID engines. The NV_RAMIN_SC_* fields
+   described later are used for VEID engines.
+
+ NV_RAMIN_PAGE_DIR_BASE_TARGET determines if the top level of the page tables
+ is in video memory or system memory (peer is not allowed), and the CPU cache
+ coherency for system memory.
+   Using INVALID unbinds the selected engine.
+
+#define NV_RAMIN_PAGE_DIR_BASE_TARGET (128*32+1):(128*32+0) /* RWXUF */
+#define NV_RAMIN_PAGE_DIR_BASE_TARGET_VID_MEM 0x00000000 /* RW--V */
+#define NV_RAMIN_PAGE_DIR_BASE_TARGET_INVALID 0x00000001 /* RW--V */
+#define NV_RAMIN_PAGE_DIR_BASE_TARGET_SYS_MEM_COHERENT 0x00000002 /* RW--V */
+#define NV_RAMIN_PAGE_DIR_BASE_TARGET_SYS_MEM_NONCOHERENT 0x00000003 /* RW--V */
+
+ NV_RAMIN_PAGE_DIR_BASE_VOL identifies the volatile behavior
+   of the top level of the page table (whether local L2 can cache it or not).
+
+#define NV_RAMIN_PAGE_DIR_BASE_VOL (128*32+2):(128*32+2) /* RWXUF */
+#define NV_RAMIN_PAGE_DIR_BASE_VOL_TRUE 0x00000001 /* RW--V */
+#define NV_RAMIN_PAGE_DIR_BASE_VOL_FALSE 0x00000000 /* RW--V */
+
+
+   These bits specify whether the MMU will treat faults as replayable or not.
+ The engine will send these bits to the MMU as part of the instance bind.
+
+#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_TEX (128*32+4):(128*32+4) /* RWXUF */
+#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_TEX_DISABLED 0x00000000 /* RW--V */
+#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_TEX_ENABLED 0x00000001 /* RW--V */
+#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_GCC (128*32+5):(128*32+5) /* RWXUF */
+#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_GCC_DISABLED 0x00000000 /* RW--V */
+#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_GCC_ENABLED 0x00000001 /* RW--V */
+
+   NV_RAMIN_USE_VER2_PT_FORMAT determines which page table format to use.
+   When NV_RAMIN_USE_VER2_PT_FORMAT is FALSE, the page table uses the old format.
+   When NV_RAMIN_USE_VER2_PT_FORMAT is TRUE, the page table uses the new format.
+
+ Volta only supports the new format. Selecting the old format results in an UNBOUND_INSTANCE fault.
+
+
+#define NV_RAMIN_USE_VER2_PT_FORMAT (128*32+10):(128*32+10) /* */
+#define NV_RAMIN_USE_VER2_PT_FORMAT_FALSE 0x00000000 /* */
+#define NV_RAMIN_USE_VER2_PT_FORMAT_TRUE 0x00000001 /* */
+
+   When NV_PFB_PRI_MMU_CTRL_USE_PDB_BIG_PAGE_SIZE is TRUE, this bit selects the big page size.
+   When NV_PFB_PRI_MMU_CTRL_USE_PDB_BIG_PAGE_SIZE is FALSE, NV_PFB_PRI_MMU_CTRL_VM_PG_SIZE selects the big page size.
+
+ Volta only supports 64KB for big pages. Selecting 128KB for big pages results in an UNBOUND_INSTANCE fault.
+
+#define NV_RAMIN_BIG_PAGE_SIZE (128*32+11):(128*32+11) /* RWXUF */
+#define NV_RAMIN_BIG_PAGE_SIZE_128KB 0x00000000 /* RW--V */
+#define NV_RAMIN_BIG_PAGE_SIZE_64KB 0x00000001 /* RW--V */
+
+ NV_RAMIN_PAGE_DIR_BASE_LO and NV_RAMIN_PAGE_DIR_BASE_HI
+ identify the page directory base (start of the page table)
+ location for this context.
+
+#define NV_RAMIN_PAGE_DIR_BASE_LO (128*32+31):(128*32+12) /* RWXUF */
+#define NV_RAMIN_PAGE_DIR_BASE_HI (129*32+31):(129*32+0) /* RWXUF */
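The PAGE_DIR_BASE fields above all live in words 128-129 of the instance
block. A hedged sketch of programming them, assuming a little-endian view of
the block as an array of 32-bit words; the function name is illustrative, and
the replay, PT-format, and big-page bits in word 128 are left zero for brevity.

```c
#include <stdint.h>

/* Program a 4 KB-aligned page directory base into words 128-129 of the
 * instance block, per NV_RAMIN_PAGE_DIR_BASE_LO/_HI above.  Word 128
 * also carries TARGET (bits 1:0) and VOL (bit 2). */
static void ramin_set_page_dir_base(uint32_t *inst,   /* instance block  */
                                    uint64_t pdb,     /* 4 KB aligned    */
                                    uint32_t target,  /* _TARGET_* value */
                                    uint32_t vol)     /* _VOL_* value    */
{
    /* LO field is bits 31:12 of word 128 and holds PDB bits 31:12. */
    inst[128] = (uint32_t)(pdb & 0xFFFFF000u) | ((vol & 1u) << 2) | (target & 3u);
    /* HI field is all of word 129 and holds PDB bits 63:32. */
    inst[129] = (uint32_t)(pdb >> 32);
}
```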
+
+// Single engine pointer channels cannot support multiple
+// engines with CTXSW pointers
+#define NV_RAMIN_ENGINE_CS (132*32+3):(132*32+3) /* */
+#define NV_RAMIN_ENGINE_CS_WFI 0x00000000 /* */
+#define NV_RAMIN_ENGINE_CS_FG 0x00000001 /* */
+#define NV_RAMIN_ENGINE_WFI_TARGET (132*32+1):(132*32+0) /* */
+#define NV_RAMIN_ENGINE_WFI_TARGET_LOCAL_MEM 0x00000000 /* */
+#define NV_RAMIN_ENGINE_WFI_TARGET_SYS_MEM_COHERENT 0x00000002 /* */
+#define NV_RAMIN_ENGINE_WFI_TARGET_SYS_MEM_NONCOHERENT 0x00000003 /* */
+#define NV_RAMIN_ENGINE_WFI_MODE (132*32+2):(132*32+2) /* */
+#define NV_RAMIN_ENGINE_WFI_MODE_PHYSICAL 0x00000000 /* */
+#define NV_RAMIN_ENGINE_WFI_MODE_VIRTUAL 0x00000001 /* */
+#define NV_RAMIN_ENGINE_WFI_PTR_LO (132*32+31):(132*32+12) /* */
+#define NV_RAMIN_ENGINE_WFI_PTR_HI (133*32+7):(133*32+0) /* */
+
+#define NV_RAMIN_ENGINE_WFI_VEID (134*32+(6-1)):(134*32+0) /* */
+#define NV_RAMIN_ENABLE_ATS (135*32+31):(135*32+31) /* RWXUF */
+#define NV_RAMIN_ENABLE_ATS_TRUE 0x00000001 /* RW--V */
+#define NV_RAMIN_ENABLE_ATS_FALSE 0x00000000 /* RW--V */
+#define NV_RAMIN_PASID (135*32+(20-1)):(135*32+0) /* RWXUF */
+
+
+ Pointer to a method buffer in BAR2 memory where a faulted engine can save
+out methods. BAR2 accesses are assumed to be virtual, so the address saved here
+is a virtual address.
+
+#define NV_RAMIN_ENG_METHOD_BUFFER_ADDR_LO (136*32+31):(136*32+0) /* RWXUF */
+#define NV_RAMIN_ENG_METHOD_BUFFER_ADDR_HI (137*32+(((49-1)-32))):(137*32+0) /* RWXUF */
+
+
+
+   These entries inform FECS which entries of the PDB array below are
+   valid/filled in and subsequently need to be bound.
+
+ This needs to reserve at least NV_LITTER_NUM_SUBCTX entries. Currently
+ there is enough space reserved for 64 subcontexts.
+#define NV_RAMIN_SC_PDB_VALID(i) (166*32+i):(166*32+i) /* RWXUF */
+#define NV_RAMIN_SC_PDB_VALID__SIZE_1 64 /* */
+#define NV_RAMIN_SC_PDB_VALID_FALSE 0x00000000 /* RW--V */
+#define NV_RAMIN_SC_PDB_VALID_TRUE 0x00000001 /* RW--V */
+
+// Memory-Management VEID array
+
+ The NV_RAMIN_SC_PAGE_DIR_BASE_* entries are an array of page table settings
+ for each subcontext. When a context supports subcontexts, the page table
+ information for a given VEID/Subcontext needs to be filled in or else page
+ faults will result on access.
+
+ These properties for the page table must be filled in for all channels
+ sharing the same context as any channel's NV_RAMIN may be used to load the
+ context.
+
+ The non-subcontext page table information such as NV_RAMIN_PAGE_DIR_BASE*
+ are used by non-subcontext engines and clients such as Host, CE, or the
+ video engines.
+
+ NV_RAMIN_SC_PAGE_DIR_BASE_TARGET(i) determines if the top level of the page tables
+ is in video memory or system memory (peer is not allowed), and the CPU cache
+ coherency for system memory.
+   Using INVALID unbinds the selected subcontext.
+
+#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET(i) ((168+(i)*4)*32+1):((168+(i)*4)*32+0) /* RWXUF */
+#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET__SIZE_1 64 /* */
+#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_VID_MEM 0x00000000 /* RW--V */
+#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_INVALID 0x00000001 /* RW--V */ // Note: INVALID should match PEER
+#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_SYS_MEM_COHERENT 0x00000002 /* RW--V */
+#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_SYS_MEM_NONCOHERENT 0x00000003 /* RW--V */
+
+ NV_RAMIN_SC_PAGE_DIR_BASE_VOL(i) identifies the volatile behavior
+ of the top level of the page table (whether local L2 can cache it or not).
+
+#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL(i) ((168+(i)*4)*32+2):((168+(i)*4)*32+2) /* RWXUF */
+#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL__SIZE_1 64 /* */
+#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL_TRUE 0x00000001 /* RW--V */
+#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL_FALSE 0x00000000 /* RW--V */
+
+   NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX(i) and
+   NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC(i) specify whether the
+   MMU will treat faults from TEX and GCC as replayable or not. Based
+   on that, fault packets are (or are not) written into the replayable
+   fault buffer, and faulting requests are (or are not) put into the
+   replay request buffer.
+   The last bind that does not unbind a sub-context determines the
+   REPLAY_TEX and REPLAY_GCC settings for all sub-contexts.
+
+#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX(i) ((168+(i)*4)*32+4):((168+(i)*4)*32+4) /* RWXUF */
+#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX__SIZE_1 64 /* */
+#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX_DISABLED 0x00000000 /* RW--V */
+#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX_ENABLED 0x00000001 /* RW--V */
+
+#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC(i) ((168+(i)*4)*32+5):((168+(i)*4)*32+5) /* RWXUF */
+#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC__SIZE_1 64 /* */
+#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC_DISABLED 0x00000000 /* RW--V */
+#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC_ENABLED 0x00000001 /* RW--V */
+
+   NV_RAMIN_SC_USE_VER2_PT_FORMAT determines which page table format to
+   use. When NV_RAMIN_SC_USE_VER2_PT_FORMAT is FALSE, the page table
+   uses the old format (2-level page table). When
+   NV_RAMIN_SC_USE_VER2_PT_FORMAT is TRUE, the page table uses the new
+   format (5-level 49-bit VA format).
+   The last bind that does not unbind a sub-context determines the page
+   table format for all sub-contexts.
+   Volta only supports the new format. Selecting the old format results
+   in an UNBOUND_INSTANCE fault.
+
+#define NV_RAMIN_SC_USE_VER2_PT_FORMAT(i) ((168+(i)*4)*32+10):((168+(i)*4)*32+10) /* RWXUF */
+#define NV_RAMIN_SC_USE_VER2_PT_FORMAT__SIZE_1 64 /* */
+#define NV_RAMIN_SC_USE_VER2_PT_FORMAT_FALSE 0x00000000 /* RW--V */
+#define NV_RAMIN_SC_USE_VER2_PT_FORMAT_TRUE 0x00000001 /* RW--V */
+
+ The last bind that does not unbind a sub-context determines the big page size for all sub-contexts.
+ Volta only supports 64KB for big pages.
+
+#define NV_RAMIN_SC_BIG_PAGE_SIZE(i) ((168+(i)*4)*32+11):((168+(i)*4)*32+11) /* RWXUF */
+#define NV_RAMIN_SC_BIG_PAGE_SIZE__SIZE_1 64 /* */
+#define NV_RAMIN_SC_BIG_PAGE_SIZE_64KB 0x00000001 /* RW--V */
+
+ NV_RAMIN_SC_PAGE_DIR_BASE_LO(i) and NV_RAMIN_SC_PAGE_DIR_BASE_HI(i)
+ identify the page directory base (start of the page table)
+ location for subcontext i.
+
+#define NV_RAMIN_SC_PAGE_DIR_BASE_LO(i) ((168+(i)*4)*32+31):((168+(i)*4)*32+12) /* RWXUF */
+#define NV_RAMIN_SC_PAGE_DIR_BASE_LO__SIZE_1 64 /* */
+#define NV_RAMIN_SC_PAGE_DIR_BASE_HI(i) ((169+(i)*4)*32+31):((169+(i)*4)*32+0) /* RWXUF */
+#define NV_RAMIN_SC_PAGE_DIR_BASE_HI__SIZE_1 64 /* */
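The SC_* fields above are strided by 4 words per subcontext starting at word
168, with the SC_PDB_VALID bits packed into the 64-bit range starting at word
166. A hedged sketch (illustrative names, little-endian word ordering of the
bit index assumed) of filling in subcontext i's PDB:

```c
#include <stdint.h>

/* Per-subcontext PDB fields live at words 168 + 4*i (LO plus control
 * bits) and 169 + 4*i (HI), per the NV_RAMIN_SC_* definitions above.
 * VOL, replay, and format bits in the LO word are left 0 for brevity. */
static void ramin_set_sc_page_dir_base(uint32_t *inst, unsigned veid,
                                       uint64_t pdb, uint32_t target)
{
    uint32_t w = 168u + veid * 4u;   /* first word for subcontext veid */
    inst[w]     = (uint32_t)(pdb & 0xFFFFF000u) | (target & 3u);
    inst[w + 1u] = (uint32_t)(pdb >> 32);
    /* Mark this subcontext's PDB valid: NV_RAMIN_SC_PDB_VALID(i) is
     * bit (166*32 + i), i.e. bit i%32 of word 166 + i/32. */
    inst[166u + veid / 32u] |= 1u << (veid % 32u);
}
```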
+
+
+
+
+
+   NV_RAMIN_SC_ENABLE_ATS(i) indicates whether ATS is enabled for
+   subcontext i. When set to TRUE, the GMMU will look up VA->PA
+   translations in both the GMMU and ATS page tables.
+   ATS can be enabled or disabled per subcontext.
+
+#define NV_RAMIN_SC_ENABLE_ATS(i) ((170+(i)*4)*32+31):((170+(i)*4)*32+31) /* RWXUF */
+
+   NV_RAMIN_SC_PASID(i) identifies the PASID (process address space
+   ID) on the CPU for subcontext i. The PASID is used to obtain an ATS
+   translation when an ATS page-table lookup is needed. During an ATS
+   TLB shootdown, the PASID is also matched against the one that comes
+   with the shootdown request.
+
+#define NV_RAMIN_SC_PASID(i) ((170+(i)*4)*32+(20-1)):((170+(i)*4)*32+0) /* RWXUF */
+
+
+
+
+3 - FIFO CONTEXT RAM (RAMFC)
+==============================
+
+
+ The NV_RAMFC part of a GPU-instance block contains Host's part of a virtual
+GPU's state. Host is referred to as "FIFO". "FC" stands for FIFO Context.
+When Host switches from serving one GPU context to serving a second, Host saves
+state for the first GPU context to the first GPU context's RAMFC area, and loads
+state for the second GPU context from the second GPU context's RAMFC area.
+
+ RAMFC is located at NV_RAMIN_RAMFC within the GPU instance block. In
+Kepler, this is at the start of the block. RAMFC is 4KB aligned.
+
+ Every Host word entry in RAMFC directly corresponds to a PRI-accessible
+register. For a description of the contents of a RAMFC entry, please see the
+description of the corresponding register in "manuals/dev_pbdma.ref". The
+offsets of the fields within each entry in RAMFC match those of the
+corresponding register in the associated PBDMA unit's PRI space.
+
+
+ RAMFC Entry PBDMA Register
+ ------------------------------- ----------------------------------
+ NV_RAMFC_SIGNATURE NV_PPBDMA_SIGNATURE(i)
+ NV_RAMFC_GP_BASE NV_PPBDMA_GP_BASE(i)
+ NV_RAMFC_GP_BASE_HI NV_PPBDMA_GP_BASE_HI(i)
+ NV_RAMFC_GP_FETCH NV_PPBDMA_GP_FETCH(i)
+ NV_RAMFC_GP_GET NV_PPBDMA_GP_GET(i)
+ NV_RAMFC_GP_PUT NV_PPBDMA_GP_PUT(i)
+ NV_RAMFC_PB_FETCH NV_PPBDMA_PB_FETCH(i)
+ NV_RAMFC_PB_FETCH_HI NV_PPBDMA_PB_FETCH_HI(i)
+ NV_RAMFC_PB_GET NV_PPBDMA_GET(i)
+ NV_RAMFC_PB_GET_HI NV_PPBDMA_GET_HI(i)
+ NV_RAMFC_PB_PUT NV_PPBDMA_PUT(i)
+ NV_RAMFC_PB_PUT_HI NV_PPBDMA_PUT_HI(i)
+ NV_RAMFC_PB_TOP_LEVEL_GET NV_PPBDMA_TOP_LEVEL_GET(i)
+ NV_RAMFC_PB_TOP_LEVEL_GET_HI NV_PPBDMA_TOP_LEVEL_GET_HI(i)
+ NV_RAMFC_GP_CRC NV_PPBDMA_GP_CRC(i)
+ NV_RAMFC_PB_HEADER NV_PPBDMA_PB_HEADER(i)
+ NV_RAMFC_PB_COUNT NV_PPBDMA_PB_COUNT(i)
+ NV_RAMFC_PB_CRC NV_PPBDMA_PB_CRC(i)
+ NV_RAMFC_SUBDEVICE NV_PPBDMA_SUBDEVICE(i)
+ NV_RAMFC_METHOD0 NV_PPBDMA_METHOD0(i)
+ NV_RAMFC_METHOD1 NV_PPBDMA_METHOD1(i)
+ NV_RAMFC_METHOD2 NV_PPBDMA_METHOD2(i)
+ NV_RAMFC_METHOD3 NV_PPBDMA_METHOD3(i)
+ NV_RAMFC_DATA0 NV_PPBDMA_DATA0(i)
+ NV_RAMFC_DATA1 NV_PPBDMA_DATA1(i)
+ NV_RAMFC_DATA2 NV_PPBDMA_DATA2(i)
+ NV_RAMFC_DATA3 NV_PPBDMA_DATA3(i)
+ NV_RAMFC_TARGET NV_PPBDMA_TARGET(i)
+ NV_RAMFC_METHOD_CRC NV_PPBDMA_METHOD_CRC(i)
+ NV_RAMFC_REF NV_PPBDMA_REF(i)
+ NV_RAMFC_RUNTIME NV_PPBDMA_RUNTIME(i)
+ NV_RAMFC_SEM_ADDR_LO NV_PPBDMA_SEM_ADDR_LO(i)
+ NV_RAMFC_SEM_ADDR_HI NV_PPBDMA_SEM_ADDR_HI(i)
+ NV_RAMFC_SEM_PAYLOAD_LO NV_PPBDMA_SEM_PAYLOAD_LO(i)
+ NV_RAMFC_SEM_PAYLOAD_HI NV_PPBDMA_SEM_PAYLOAD_HI(i)
+ NV_RAMFC_SEM_EXECUTE NV_PPBDMA_SEM_EXECUTE(i)
+ NV_RAMFC_ACQUIRE_DEADLINE NV_PPBDMA_ACQUIRE_DEADLINE(i)
+ NV_RAMFC_ACQUIRE NV_PPBDMA_ACQUIRE(i)
+ NV_RAMFC_MEM_OP_A NV_PPBDMA_MEM_OP_A(i)
+ NV_RAMFC_MEM_OP_B NV_PPBDMA_MEM_OP_B(i)
+ NV_RAMFC_MEM_OP_C NV_PPBDMA_MEM_OP_C(i)
+ NV_RAMFC_USERD NV_PPBDMA_USERD(i)
+ NV_RAMFC_USERD_HI NV_PPBDMA_USERD_HI(i)
+ NV_RAMFC_HCE_CTRL NV_PPBDMA_HCE_CTRL(i)
+ NV_RAMFC_CONFIG NV_PPBDMA_CONFIG(i)
+ NV_RAMFC_SET_CHANNEL_INFO NV_PPBDMA_SET_CHANNEL_INFO(i)
+ ------------------------------- ----------------------------------
+
+#define NV_RAMFC /* ----G */
+#define NV_RAMFC_GP_PUT (0*32+31):(0*32+0) /* RWXUF */
+#define NV_RAMFC_MEM_OP_A (1*32+31):(1*32+0) /* RWXUF */
+#define NV_RAMFC_USERD (2*32+31):(2*32+0) /* RWXUF */
+#define NV_RAMFC_USERD_HI (3*32+31):(3*32+0) /* RWXUF */
+#define NV_RAMFC_SIGNATURE (4*32+31):(4*32+0) /* RWXUF */
+#define NV_RAMFC_GP_GET (5*32+31):(5*32+0) /* RWXUF */
+#define NV_RAMFC_PB_GET (6*32+31):(6*32+0) /* RWXUF */
+#define NV_RAMFC_PB_GET_HI (7*32+31):(7*32+0) /* RWXUF */
+#define NV_RAMFC_PB_TOP_LEVEL_GET (8*32+31):(8*32+0) /* RWXUF */
+#define NV_RAMFC_PB_TOP_LEVEL_GET_HI (9*32+31):(9*32+0) /* RWXUF */
+#define NV_RAMFC_REF (10*32+31):(10*32+0) /* RWXUF */
+#define NV_RAMFC_RUNTIME (11*32+31):(11*32+0) /* RWXUF */
+#define NV_RAMFC_ACQUIRE (12*32+31):(12*32+0) /* RWXUF */
+#define NV_RAMFC_ACQUIRE_DEADLINE (13*32+31):(13*32+0) /* RWXUF */
+#define NV_RAMFC_SEM_ADDR_HI (14*32+31):(14*32+0) /* RWXUF */
+#define NV_RAMFC_SEM_ADDR_LO (15*32+31):(15*32+0) /* RWXUF */
+#define NV_RAMFC_SEM_PAYLOAD_LO (16*32+31):(16*32+0) /* RWXUF */
+#define NV_RAMFC_SEM_EXECUTE (17*32+31):(17*32+0) /* RWXUF */
+#define NV_RAMFC_GP_BASE (18*32+31):(18*32+0) /* RWXUF */
+#define NV_RAMFC_GP_BASE_HI (19*32+31):(19*32+0) /* RWXUF */
+#define NV_RAMFC_GP_FETCH (20*32+31):(20*32+0) /* RWXUF */
+#define NV_RAMFC_PB_FETCH (21*32+31):(21*32+0) /* RWXUF */
+#define NV_RAMFC_PB_FETCH_HI (22*32+31):(22*32+0) /* RWXUF */
+#define NV_RAMFC_PB_PUT (23*32+31):(23*32+0) /* RWXUF */
+#define NV_RAMFC_PB_PUT_HI (24*32+31):(24*32+0) /* RWXUF */
+#define NV_RAMFC_MEM_OP_B (25*32+31):(25*32+0) /* RWXUF */
+#define NV_RAMFC_RESERVED26 (26*32+31):(26*32+0) /* RWXUF */
+#define NV_RAMFC_RESERVED27 (27*32+31):(27*32+0) /* RWXUF */
+#define NV_RAMFC_RESERVED28 (28*32+31):(28*32+0) /* RWXUF */
+#define NV_RAMFC_GP_CRC (29*32+31):(29*32+0) /* RWXUF */
+#define NV_RAMFC_PB_HEADER (33*32+31):(33*32+0) /* RWXUF */
+#define NV_RAMFC_PB_COUNT (34*32+31):(34*32+0) /* RWXUF */
+#define NV_RAMFC_SUBDEVICE (37*32+31):(37*32+0) /* RWXUF */
+#define NV_RAMFC_PB_CRC (38*32+31):(38*32+0) /* RWXUF */
+#define NV_RAMFC_SEM_PAYLOAD_HI (39*32+31):(39*32+0) /* RWXUF */
+#define NV_RAMFC_MEM_OP_C (40*32+31):(40*32+0) /* RWXUF */
+#define NV_RAMFC_RESERVED20 (41*32+31):(41*32+0) /* RWXUF */
+#define NV_RAMFC_RESERVED21 (42*32+31):(42*32+0) /* RWXUF */
+#define NV_RAMFC_TARGET (43*32+31):(43*32+0) /* RWXUF */
+#define NV_RAMFC_METHOD_CRC (44*32+31):(44*32+0) /* RWXUF */
+#define NV_RAMFC_METHOD0 (48*32+31):(48*32+0) /* RWXUF */
+#define NV_RAMFC_DATA0 (49*32+31):(49*32+0) /* RWXUF */
+#define NV_RAMFC_METHOD1 (50*32+31):(50*32+0) /* RWXUF */
+#define NV_RAMFC_DATA1 (51*32+31):(51*32+0) /* RWXUF */
+#define NV_RAMFC_METHOD2 (52*32+31):(52*32+0) /* RWXUF */
+#define NV_RAMFC_DATA2 (53*32+31):(53*32+0) /* RWXUF */
+#define NV_RAMFC_METHOD3 (54*32+31):(54*32+0) /* RWXUF */
+#define NV_RAMFC_DATA3 (55*32+31):(55*32+0) /* RWXUF */
+#define NV_RAMFC_HCE_CTRL (57*32+31):(57*32+0) /* RWXUF */
+#define NV_RAMFC_CONFIG (61*32+31):(61*32+0) /* RWXUF */
+#define NV_RAMFC_SET_CHANNEL_INFO (63*32+31):(63*32+0) /* RWXUF */
+
+#define NV_RAMFC_BASE_SHIFT 12 /* */
+
+ Size of the full range of RAMFC in bytes.
+#define NV_RAMFC_SIZE_VAL 0x00000200 /* ----C */
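Each RAMFC entry above is a single 32-bit word, so a range of
"(W*32+31):(W*32+0)" maps to byte offset W*4 from the start of the instance
block (RAMFC begins at offset 0). A small sketch of that mapping; the word
indices come from the defines above, while the names of the enum and helper
are illustrative only.

```c
#include <stdint.h>

/* Word indices of a few RAMFC entries, from the defines above. */
enum {
    RAMFC_WORD_GP_PUT = 0,   /* NV_RAMFC_GP_PUT */
    RAMFC_WORD_GP_GET = 5,   /* NV_RAMFC_GP_GET */
    RAMFC_WORD_GP_CRC = 29   /* NV_RAMFC_GP_CRC */
};

/* Byte offset of a RAMFC word within the 0x200-byte RAMFC region. */
static uint32_t ramfc_byte_offset(uint32_t word)
{
    return word * 4u;
}
```

Since the last entry, NV_RAMFC_SET_CHANNEL_INFO, sits at word 63, the whole
region fits inside NV_RAMFC_SIZE_VAL (0x200) bytes.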
+
+4 - USER-DRIVER ACCESSIBLE RAM (RAMUSERD)
+=========================================
+
+ A user-level driver is allowed to access only a small portion of a GPU
+context's state. The portion of a GPU context's state that a user-level driver
+can access is stored in a block of memory called NV_RAMUSERD. NV_RAMUSERD is a
+user-level driver's window into NV_RAMFC. The NV_RAMUSERD state for each GPU
+context is stored in an aligned NV_RAMUSERD_CHAN_SIZE-byte block of memory.
+
+ To submit more methods, a user driver writes a PB segment to
+memory, writes a GP entry that points to the PB segment, updates GP_PUT in
+RAMUSERD, and writes the channel's handle to the
+NV_USERMODE_NOTIFY_CHANNEL_PENDING register (see dev_usermode.ref).
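The submission steps above can be sketched as follows. The GP entry encoding
and the doorbell register live in dev_pbdma.ref and dev_usermode.ref and are
not reproduced here; both are stood in for by placeholders, and all names in
this sketch are illustrative, not from the manual.

```c
#include <stdint.h>

struct channel {
    uint64_t          *gpfifo;      /* GP entry ring                     */
    uint32_t           gp_entries;  /* number of entries in the ring     */
    volatile uint32_t *gp_put;      /* NV_RAMUSERD_GP_PUT word           */
    volatile uint32_t *doorbell;    /* stand-in for the NOTIFY register  */
    uint32_t           handle;      /* channel handle for the doorbell   */
};

static void submit(struct channel *ch, uint64_t gp_entry)
{
    uint32_t put = *ch->gp_put;
    /* 1. (The PB segment is assumed already written to memory.)     */
    /* 2. Write a GP entry pointing at the PB segment.               */
    ch->gpfifo[put] = gp_entry;
    /* 3. Advance GP_PUT in RAMUSERD (the ring wraps).               */
    *ch->gp_put = (put + 1u) % ch->gp_entries;
    /* A real driver needs a memory barrier here so steps 2-3 are    */
    /* visible before the doorbell write.                            */
    /* 4. Ring the doorbell with the channel's handle.               */
    *ch->doorbell = ch->handle;
}
```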
+
+ The RAMUSERD data structure is updated at regular intervals as controlled
+by the NV_PFIFO_USERD_WRITEBACK setting (see dev_fifo.ref). For a particular
+channel, RAMUSERD writeback can be disabled; it is recommended that SW track
+pushbuffer and channel progress via Host WFI_DIS semaphores rather than reading
+the RAMUSERD data structure.
+
+ When write-back is enabled a user driver can check the GPU progress in
+executing a channel's PB segments. The driver can use:
+ * GP_GET to monitor the index of the next GP entry the GPU will process
+ * PB_GET to monitor the address of the next PB entry the GPU will process
+ * TOP_LEVEL_GET (see NV_PPBDMA_TOP_LEVEL_GET) to monitor the address of the
+ next "top-level" (non-SUBROUTINE) PB entry the GPU will process
+  * REF to monitor the current "reference count" value (see NV_PPBDMA_REF).
+
+ Each entry in RAMUSERD corresponds to a PRI-accessible PBDMA register in Host.
+For a description of the behavior and contents of a RAMUSERD entry, please see
+the description of the corresponding register in "manuals/dev_pbdma.ref".
+
+ RAMUSERD Entry PBDMA Register Access
+ ------------------------------- ----------------------------- ----------
+ NV_RAMUSERD_GP_PUT NV_PPBDMA_GP_PUT(i) Read/Write
+ NV_RAMUSERD_GP_GET NV_PPBDMA_GP_GET(i) Read-only
+ NV_RAMUSERD_GET NV_PPBDMA_GET(i) Read-only
+ NV_RAMUSERD_GET_HI NV_PPBDMA_GET_HI(i) Read-only
+ NV_RAMUSERD_PUT NV_PPBDMA_PUT(i) Read-only
+ NV_RAMUSERD_PUT_HI NV_PPBDMA_PUT_HI(i) Read-only
+ NV_RAMUSERD_TOP_LEVEL_GET NV_PPBDMA_TOP_LEVEL_GET(i) Read-only
+ NV_RAMUSERD_TOP_LEVEL_GET_HI NV_PPBDMA_TOP_LEVEL_GET_HI(i) Read-only
+ NV_RAMUSERD_REF NV_PPBDMA_REF(i) Read-only
+ ------------------------------- ----------------------------- ----------
+
+ A user driver may write to NV_RAMUSERD_GP_PUT to kick off more work in a
+channel.  Although writes to the other, read-only entries can alter memory,
+such writes will not affect the operation of the GPU, and the values can be
+overwritten by the GPU.
+
+ When Host loads its part of a GPU context's state from RAMFC memory, it
+may not immediately read RAMUSERD_GP_PUT.  Host can use the GP_PUT value
+directly from RAMFC while waiting for RAMUSERD_GP_PUT to synchronize.
+Because reads of RAMUSERD_GP_PUT can be delayed, the value in NV_PPBDMA_GP_PUT
+can be older than the value in NV_RAMUSERD_GP_PUT.
+
+ When Host saves a GPU context's state to NV_RAMFC, it also writes to
+NV_RAMUSERD the values of the entries other than GP_PUT.
+Because Host does not continuously write the read-only RAMFC entries, the
+read-only values in USERD memory can be older than the values in the Host PBDMA
+unit.
+
+#define NV_RAMUSERD /* ----G */
+#define NV_RAMUSERD_PUT (16*32+31):(16*32+0) /* RWXUF */
+#define NV_RAMUSERD_GET (17*32+31):(17*32+0) /* RWXUF */
+#define NV_RAMUSERD_REF (18*32+31):(18*32+0) /* RWXUF */
+#define NV_RAMUSERD_PUT_HI (19*32+31):(19*32+0) /* RWXUF */
+#define NV_RAMUSERD_TOP_LEVEL_GET (22*32+31):(22*32+0) /* RWXUF */
+#define NV_RAMUSERD_TOP_LEVEL_GET_HI (23*32+31):(23*32+0) /* RWXUF */
+#define NV_RAMUSERD_GET_HI (24*32+31):(24*32+0) /* RWXUF */
+#define NV_RAMUSERD_GP_GET (34*32+31):(34*32+0) /* RWXUF */
+#define NV_RAMUSERD_GP_PUT (35*32+31):(35*32+0) /* RWXUF */
+#define NV_RAMUSERD_BASE_SHIFT 9 /* */
+#define NV_RAMUSERD_CHAN_SIZE 512 /* */
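Per the defines above, each channel's USERD region is NV_RAMUSERD_CHAN_SIZE
(512) bytes and must be 512-byte aligned (a BASE_SHIFT of 9 gives an alignment
of 1 << 9). A small sketch, assuming one USERD per channel packed into a
single aligned allocation; the helper names are illustrative.

```c
#include <stdint.h>

#define NV_RAMUSERD_CHAN_SIZE 512u

/* A USERD pointer must be NV_RAMUSERD_CHAN_SIZE-aligned. */
static int userd_is_aligned(uint64_t addr)
{
    return (addr & (NV_RAMUSERD_CHAN_SIZE - 1u)) == 0u;
}

/* USERD address for channel chid, with one region per channel packed
 * contiguously from an aligned base. */
static uint64_t userd_for_chan(uint64_t base, uint32_t chid)
{
    return base + (uint64_t)chid * NV_RAMUSERD_CHAN_SIZE;
}
```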
+
+
+
+
+5 - RUN-LIST RAM (RAMRL)
+========================
+
+ Software specifies the GPU contexts that hardware should "run" by writing a
+list of entries (known as a "runlist") to a 4k-aligned area of memory (beginning
+at NV_PFIFO_RUNLIST_BASE), and by notifying Host that a new list is available
+(by writing to NV_PFIFO_RUNLIST).
+ Submission of a new runlist causes Host to expire the timeslice of all work
+scheduled by the previous runlist, allowing it to schedule the channels present
+in the new runlist once they are fetched. SW can check the status of the runlist
+by polling NV_PFIFO_ENG_RUNLIST_PENDING. (see dev_fifo.ref NV_PFIFO_RUNLIST for
+a full description of the runlist submit mechanism).
+ Runlists can be stored in system memory or video memory (as specified by
+NV_PFIFO_RUNLIST_BASE_TARGET). If a runlist is stored in video memory, software
+must execute a flush or read back the last entry written before submitting the
+runlist to Host in order to guarantee coherency.
+ The size of a runlist entry data structure is 16 bytes. Each entry
+specifies either a channel entry or a TSG header; the type is determined by the
+NV_RAMRL_ENTRY_TYPE.
+
+
+Runlist Channel Entry Type:
+
+ A runlist entry of type NV_RAMRL_ENTRY_TYPE_CHAN specifies a channel to
+run. All such entries must occur within the span of some TSG as specified by
+the NV_RAMRL_ENTRY_TYPE_TSG described below. If a channel entry is encountered
+outside a TSG, Host will raise the NV_PFIFO_INTR_SCHED_ERROR_CODE_BAD_TSG
+interrupt.
+
+ The fields available in a channel runlist entry are as follows (Fig 5.1):
+
+ ENTRY_TYPE (T) : type of this entry: ENTRY_TYPE_CHAN
+ CHID (ID) : identifier of the channel to run (overlays ENTRY_ID)
+ RUNQUEUE_SELECTOR (Q) : selects which PBDMA should run this channel if
+ more than one PBDMA is supported by the runlist
+
+ INST_PTR_LO : lower 20 bits of the 4k-aligned instance block pointer
+  INST_PTR_HI          : upper 32 bits of the instance block pointer
+ INST_TARGET (TGI) : aperture of the instance block
+
+  USERD_PTR_LO        : upper 24 bits of the low 32 bits of the 512-byte-aligned USERD pointer
+ USERD_PTR_HI : upper 32 bits of USERD pointer
+ USERD_TARGET (TGU) : aperture of the USERD data structure
+
+ CHID is a channel identifier that uniquely specifies the channel described
+by this runlist entry to the scheduling hardware and is reported in various
+status registers.
+ RUNQUEUE_SELECTOR determines to which runqueue the channel belongs, and
+thereby which PBDMA will run the channel. Increasing values select increasingly
+numbered PBDMA IDs serving the runlist. If the selector value exceeds the
+number of PBDMAs on the runlist, the hardware will silently reassign the channel
+to run on the first PBDMA as though RUNQUEUE_SELECTOR had been set to 0. (In
+current hardware, this is used by SCG on the graphics runlist only to determine
+which FE pipe should service a given channel. A value of 0 targets the first FE
+pipe, which can process all FE driven engines: Graphics, Compute, Inline2Memory,
+and TwoD. A value of 1 targets the second FE pipe, which can only process
+Compute work. Note that GRCE work is allowed on either runqueue.)
+ The INST fields specify the physical address of the channel's instance
+block, the in-memory data structure that stores the context state.
+The target aperture of the instance block is given by INST_TARGET, and the byte
+offset within that aperture is calculated as
+
+ (INST_PTR_HI << 32) | (INST_PTR_LO << NV_RAMRL_ENTRY_CHAN_INST_PTR_ALIGN_SHIFT)
+
+This address should match the one specified in the channel RAM's
+NV_PCCSR_CHANNEL_INST register; see NV_RAMIN and NV_RAMFC for the format of the
+instance block. The hardware ignores the RAMRL INST fields, but in future
+chips the instance pointer may be removed from the channel RAM and the RAMRL
+INST fields used instead, resulting in smaller hardware.
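The address calculation above can be written out directly. The value 12 for
NV_RAMRL_ENTRY_CHAN_INST_PTR_ALIGN_SHIFT is an assumption here, inferred from
instance blocks being 4 KB aligned; the function name is illustrative.

```c
#include <stdint.h>

/* Assumed: instance blocks are 4 KB aligned, so the LO pointer field
 * holds address bits starting at bit 12. */
#define NV_RAMRL_ENTRY_CHAN_INST_PTR_ALIGN_SHIFT 12

/* Byte offset of the instance block within its aperture, rebuilt from
 * the runlist-entry INST_PTR_HI/INST_PTR_LO fields. */
static uint64_t ramrl_inst_addr(uint64_t inst_ptr_hi, uint64_t inst_ptr_lo)
{
    return (inst_ptr_hi << 32) |
           (inst_ptr_lo << NV_RAMRL_ENTRY_CHAN_INST_PTR_ALIGN_SHIFT);
}
```

The USERD address in the same entry is rebuilt the same way, with
NV_RAMRL_ENTRY_CHAN_USERD_PTR_ALIGN_SHIFT in place of the instance shift.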
+ The USERD fields specify the physical address of the USERD memory region
+used by software to submit additional work to the channel. The target aperture
+of the USERD region is given by USERD_TARGET, and the byte offset within that
+aperture is calculated as
+
+ (USERD_PTR_HI << 32) | (USERD_PTR_LO << NV_RAMRL_ENTRY_CHAN_USERD_PTR_ALIGN_SHIFT)
+
+
+SW uses the NV_RAMUSERD_CHAN_SIZE define to allocate and align a channel's
+RAMUSERD data structure. See the documentation for NV_RAMUSERD for a
+description of the use of USERD and its format. This address and its
+alignment must match the one specified in the RAMFC's NV_RAMFC_USERD and
+NV_RAMFC_USERD_HI fields, which are backed by NV_PPBDMA_USERD in dev_pbdma.ref.
+The hardware ignores the RAMRL USERD fields, but in future chips the USERD
+pointer may be read from these fields in the runlist entry instead of the RAMFC
+to avoid the extra level of indirection in fetching the USERD data that
+currently results in a dependent read.
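As a sketch, the two pointer assemblies above can be written as follows. The
helper names are illustrative, not part of the manual; the shift amounts mirror
NV_RAMRL_ENTRY_CHAN_INST_PTR_ALIGN_SHIFT (12) and
NV_RAMRL_ENTRY_CHAN_USERD_PTR_ALIGN_SHIFT (8) defined below.

```c
#include <stdint.h>

/* Illustrative helpers only: assemble the byte offsets described above from
 * the runlist-entry pointer fields.  Shift values mirror the ALIGN_SHIFT
 * defines below (12 for INST, 8 for USERD). */
static uint64_t ramrl_inst_offset(uint32_t inst_ptr_hi, uint32_t inst_ptr_lo)
{
    return ((uint64_t)inst_ptr_hi << 32) | ((uint64_t)inst_ptr_lo << 12);
}

static uint64_t ramrl_userd_offset(uint32_t userd_ptr_hi, uint32_t userd_ptr_lo)
{
    return ((uint64_t)userd_ptr_hi << 32) | ((uint64_t)userd_ptr_lo << 8);
}
```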
+
+
+Runlist TSG Entry Type:
+
+    The other type of runlist entry is the Timeslice Group (TSG) header entry
+(Fig 5.2). This type of entry is specified by NV_RAMRL_ENTRY_TYPE_TSG. A TSG
+entry describes a collection of channels, all of which share the same context
+and are scheduled as a single unit by Host. All runlists support this type of
+entry.
+
+ The fields available in a TSG header runlist entry are as follows (Fig 5.2):
+
+ ENTRY_TYPE (T) : type of this entry: ENTRY_TYPE_TSG
+ TSGID : identifier of the Timeslice group (overlays ENTRY_ID)
+ TSG_LENGTH : number of channels that are part of this timeslice group
+ TIMESLICE_SCALE : scale factor for the TSG's timeslice
+ TIMESLICE_TIMEOUT : timeout amount for the TSG's timeslice
+
+ A timeslice group entry consists of an integer identifier along with a
+length which specifies the number of channels in the TSG. After a TSG header
+runlist entry, the next TSG_LENGTH runlist entries are considered to be part of
+the timeslice group. Note that the minimum length of a TSG is one entry.
+ All channels in a TSG share the same runlist timeslice which specifies how
+long a single context runs on an engine or PBDMA before being swapped for a
+different context. The timeslice period is set in the TSG header by specifying
+TSG_TIMESLICE_TIMEOUT and TSG_TIMESLICE_SCALE. The TSG timeslice period is
+calculated as follows:
+
+ timeslice = (TSG_TIMESLICE_TIMEOUT << TSG_TIMESLICE_SCALE) * 1024 nanoseconds
+
+ The timeslice period should normally not be set to zero. A timeslice of
+zero will be treated as a timeslice period of one. The runlist
+timeslice period begins after the context has been loaded on a PBDMA but is
+paused while the channel has an outstanding context load to an engine. Time
+spent switching a context into an engine is not part of the runlist timeslice.
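For example, with the initial values from the defines below (TIMEOUT of 0x80
and SCALE of 3), the period works out to (128 << 3) * 1024 ns = 1,048,576 ns,
about 1.05 ms. A sketch of the calculation (the function name is illustrative):

```c
#include <stdint.h>

/* Illustrative only: the TSG timeslice period in nanoseconds, per the
 * formula above: (TSG_TIMESLICE_TIMEOUT << TSG_TIMESLICE_SCALE) * 1024 ns. */
static uint64_t tsg_timeslice_ns(uint32_t timeout, uint32_t scale)
{
    return ((uint64_t)timeout << scale) * 1024u;
}
```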
+
+ If Host reaches the end of the runlist or receives another entry of type
+NV_RAMRL_ENTRY_TYPE_TSG before processing TSG_LENGTH additional runlist entries,
+or if it encounters a TSG of length 0, a SCHED_ERROR interrupt will be generated
+with ERROR_CODE_BAD_TSG.
+
+
+Host Scheduling Memory Layout:
+
+Example of graphics runlist entry to GPU context mapping via channel id:
+
+
+                     .------inst_ptr -------.
+ | |
+ Graphics Runlist | Channel-Map RAM | GPU Instance Block
+ .------------ . | .----------------. | .-------------------.
+ | TSG Hdr L=m |--.----' |Ch0 Inst Blk Ptr|--'------->| Host State |
+ | RL Entry T1 | | |Ch1 Inst Blk Ptr| .------| Memory State |
+ | RL Entry T2 | | | ... | | | Engine0 State Ptr |
+ | ... | |-chid->|ChI Inst Blk Ptr| | | Engine1 State Ptr |
+ | RL Entry Tm | | | ... | | | ... |
+ | TSG Hdr L=n | | |ChN Inst Blk Ptr| | .-| EngineN State Ptr |
+ | RL Entry T1 | | `----------------' | | `-------------------'
+ | RL Entry T2 |userd_ptr | |
+ | ... | | .--------------. | | .--------------.
+ | RL Entry Tn | | | USERD | | | | Engine Ctx |
+ | | '------->| |<----' '-->| State N |
+ `-------------' | | | |
+ `--------------' `--------------'
+
+Runlist Diagram Description:
+    Here we have (m+n) channel-type (ENTRY_TYPE_CHAN) runlist entries grouped
+together within two TSGs. The first entry in the runlist is a TSG header
+entry (ENTRY_TYPE_TSG) that describes the first TSG. The TSG header specifies
+m as the length of the TSG. The header also contains the timeslice
+information for the TSG (SCALE/TIMEOUT), as well as the TSG id specified in
+the TSGID field.
+    Because the length here is m, the runlist *must* contain m additional
+runlist entries of type ENTRY_TYPE_CHAN that are part of this TSG. Similarly,
+the next (n+1) entries, a TSG header entry followed by n regular channel
+entries, correspond to the second TSG.
+
+#define NV_RAMRL_ENTRY /* ----G */
+#define NV_RAMRL_ENTRY_RANGE 0xF:0x00000000 /* RW--M */
+#define NV_RAMRL_ENTRY_SIZE 16 /* */
+// Runlist base must be 4k-aligned.
+#define NV_RAMRL_ENTRY_BASE_SHIFT 12 /* */
+
+
+#define NV_RAMRL_ENTRY_TYPE (0+0*32):(0+0*32) /* RWXUF */
+#define NV_RAMRL_ENTRY_TYPE_CHAN 0x00000000 /* RW--V */
+#define NV_RAMRL_ENTRY_TYPE_TSG 0x00000001 /* RW--V */
+
+#define NV_RAMRL_ENTRY_ID (11+2*32):(0+2*32) /* RWXUF */
+#define NV_RAMRL_ENTRY_ID_HW 11:0 /* RWXUF */
+#define NV_RAMRL_ENTRY_ID_MAX (4096-1) /* RW--V */
+
+
+
+
+
+#define NV_RAMRL_ENTRY_CHAN_RUNQUEUE_SELECTOR (1+0*32):(1+0*32) /* RWXUF */
+
+#define NV_RAMRL_ENTRY_CHAN_INST_TARGET (5+0*32):(4+0*32) /* RWXUF */
+#define NV_RAMRL_ENTRY_CHAN_INST_TARGET_VID_MEM 0x00000000 /* RW--V */
+#define NV_RAMRL_ENTRY_CHAN_INST_TARGET_SYS_MEM_COHERENT 0x00000002 /* RW--V */
+#define NV_RAMRL_ENTRY_CHAN_INST_TARGET_SYS_MEM_NONCOHERENT 0x00000003 /* RW--V */
+
+#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET (7+0*32):(6+0*32) /* RWXUF */
+#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_VID_MEM 0x00000000 /* RW--V */
+#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_VID_MEM_NVLINK_COHERENT 0x00000001 /* RW--V */
+#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_SYS_MEM_COHERENT 0x00000002 /* RW--V */
+#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_SYS_MEM_NONCOHERENT 0x00000003 /* RW--V */
+
+#define NV_RAMRL_ENTRY_CHAN_USERD_PTR_LO (31+0*32):(8+0*32) /* RWXUF */
+#define NV_RAMRL_ENTRY_CHAN_USERD_PTR_HI (31+1*32):(0+1*32) /* RWXUF */
+
+#define NV_RAMRL_ENTRY_CHAN_CHID (11+2*32):(0+2*32) /* RWXUF */
+
+#define NV_RAMRL_ENTRY_CHAN_INST_PTR_LO (31+2*32):(12+2*32) /* RWXUF */
+#define NV_RAMRL_ENTRY_CHAN_INST_PTR_HI (31+3*32):(0+3*32) /* RWXUF */
+
+
+
+// Macros for shifting out low bits of INST_PTR and USERD_PTR.
+#define NV_RAMRL_ENTRY_CHAN_INST_PTR_ALIGN_SHIFT 12 /* ----C */
+#define NV_RAMRL_ENTRY_CHAN_USERD_PTR_ALIGN_SHIFT 8 /* ----C */
+
+
+
+
+
+
+
+#define NV_RAMRL_ENTRY_TSG_TIMESLICE_SCALE (19+0*32):(16+0*32) /* RWXUF */
+#define NV_RAMRL_ENTRY_TSG_TIMESLICE_SCALE_3 0x00000003 /* RWI-V */
+#define NV_RAMRL_ENTRY_TSG_TIMESLICE_TIMEOUT (31+0*32):(24+0*32) /* RWXUF */
+#define NV_RAMRL_ENTRY_TSG_TIMESLICE_TIMEOUT_128 0x00000080 /* RWI-V */
+
+
+#define NV_RAMRL_ENTRY_TSG_TIMESLICE_TIMEOUT_1US 0x00000000 /* */
+
+#define NV_RAMRL_ENTRY_TSG_LENGTH (7+1*32):(0+1*32) /* RWXUF */
+#define NV_RAMRL_ENTRY_TSG_LENGTH_INIT 0x00000000 /* RW--V */
+#define NV_RAMRL_ENTRY_TSG_LENGTH_MIN 0x00000001 /* RW--V */
+#define NV_RAMRL_ENTRY_TSG_LENGTH_MAX 0x00000080 /* RW--V */
+
+#define NV_RAMRL_ENTRY_TSG_TSGID (11+2*32):(0+2*32) /* RWXUF */
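Putting the TSG fields together, a software runlist builder might encode one
TSG header entry as follows. This is a sketch only; the function name is
illustrative, and the bit positions come from the defines above.

```c
#include <stdint.h>

/* Sketch: encode one 16-byte TSG header runlist entry.  Bit positions match
 * the NV_RAMRL_ENTRY_TSG_* defines above; the TSG_LENGTH channel entries
 * would follow this entry in the runlist. */
static void encode_tsg_header(uint32_t e[4], uint32_t tsgid, uint32_t length,
                              uint32_t timeout, uint32_t scale)
{
    e[0] = (1u << 0)                    /* ENTRY_TYPE_TSG                 */
         | ((scale   & 0xFu)  << 16)    /* TIMESLICE_SCALE, bits 19:16    */
         | ((timeout & 0xFFu) << 24);   /* TIMESLICE_TIMEOUT, bits 31:24  */
    e[1] = length & 0xFFu;              /* TSG_LENGTH, bits 7:0           */
    e[2] = tsgid & 0xFFFu;              /* TSGID, bits 11:0               */
    e[3] = 0;
}
```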
+
+
+
+6 - Host Pushbuffer Format (FIFO_DMA)
+=======================================
+
+ "FIFO" refers to Host. "FIFO_DMA" means data that Host reads from memory:
+the pushbuffer. Host autonomously reads pushbuffer data from memory and
+generates method address/data pairs from the data.
+
+ Pushbuffer terminology:
+
+ - A channel is the logical sequence of instructions associated with a GPU
+ context.
+
+ - The pushbuffer is a stream of data in memory containing the
+ specifications of the operations that a channel is to perform for a
+ particular client. Pushbuffer data consists of pushbuffer entries.
+
+ - A pushbuffer entry (PB entry) is a 32-bit (doubleword) sized unit of
+ pushbuffer data. This is the smallest granularity at which Host consumes
+ pushbuffer data. A PB entry is either a PB instruction (which is either
+ a PB control entry or a PB method header), or a method data entry.
+
+ - A pushbuffer segment (PB segment) is a contiguous block of memory
+ containing pushbuffer entries. The location and size of a pushbuffer
+ segment is defined by its respective GP entry in the GPFIFO.
+
+ - A pushbuffer control entry (PB control entry) is a single PB entry of
+ type SET_SUBDEVICE_MASK, STORE_SUBDEVICE_MASK, USE_SUBDEVICE_MASK,
+ END_PB_SEGMENT, or a universal NOP (NV_FIFO_DMA_NOP).
+
+ - A pushbuffer compressed method sequence is a sequence of pushbuffer
+ entries starting with a method header and a variable-length sequence of
+ method data entries (the length being defined by the method header). A
+ single PB compressed method sequence expands into one or more methods.
+ This may also be known as a "pushbuffer method" (PB method), but that
+ terminology is ambiguous and not preferred.
+
+ - A pushbuffer method header (PB method header) is the first PB entry found
+ in a PB compressed method sequence. A PB method header is a PB
+ instruction performed on method data entries.
+
+ - A pushbuffer instruction (PB instruction) is a PB entry that is not a PB
+ method data entry. A PB instruction is either a PB control entry or a PB
+ method header.
+
+ - A method is an address/data pair representing an operation to perform.
+
+ - A method data entry is the 32-bit operand for its corresponding method.
+
+
+
+#define NV_FIFO_PB_ENTRY_SIZE 4 /* */
+
+
+ Some engines such as Graphics internally support a double-wide method FIFO;
+these are known as "data-hi" methods. It is Host that performs the packing of
+two methods into one double-wide entry. Host will only generate data-hi methods
+if the following conditions are satisfied:
+
+ 1. The two methods come from the same PB method (in other words they share
+ the same method header).
+
+ 2. The method header specifies a non-incrementing method, an incrementing
+ method, or an increment-once method.
+
+ 3. The paired methods either have the same method address, or the first
+ method has an even NV_FIFO_DMA_METHOD_ADDRESS field and the second
+ (data-hi) method is the increment of the first. (That is, the
+ left-shifted method address as listed in the class files must be
+ divisible by 8 for this condition to hold.)
+
+ 4. The second method is available at the time of pushing the first one into
+ the engine's method FIFO. In other words, Host will not wait to pack
+ methods. Note that if the engine's method fifo is full, the
+ back-pressure will in itself create a "wait time".
+
+The first three conditions are under SW's control. Only the graphics engine
+supports data-hi methods.
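The first three conditions can be checked by SW when laying out methods. A
sketch of such a check follows (illustrative only; the opcode values are the
NV_FIFO_DMA_SEC_OP encodings listed further below, and the timing-dependent
condition 4 is not modeled):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative check of conditions 1-3 for data-hi packing (condition 4 is
 * a timing property of Host and is not modeled here).  Addresses are
 * dword-addresses as in NV_FIFO_DMA_METHOD_ADDRESS; both methods are assumed
 * to come from the same method header (condition 1). */
static bool can_pack_data_hi(uint32_t sec_op, uint32_t addr0, uint32_t addr1)
{
    /* Condition 2: incrementing, non-incrementing, or increment-once. */
    bool op_ok = (sec_op == 0x1u) ||  /* SEC_OP_INC_METHOD     */
                 (sec_op == 0x3u) ||  /* SEC_OP_NON_INC_METHOD */
                 (sec_op == 0x5u);    /* SEC_OP_ONE_INC        */

    /* Condition 3: same address, or an even first address with the second
     * method's address being its increment. */
    bool addr_ok = (addr0 == addr1) ||
                   (((addr0 & 1u) == 0u) && (addr1 == addr0 + 1u));

    return op_ok && addr_ok;
}
```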
+
+
+Types of PB Entries
+
+ PB entries can be classified into three types: PB method headers, PB
+control entries, and PB method data. Different types of PB entries have
+different formats. Because PB compressed method sequences are of variable
+length, it is impossible to determine the type of a PB entry without tracking
+the pushbuffer from the beginning or from the location of a PB entry that is
+known to not be a PB method data entry.
+
+ A PB method data entry is always found in a method data sequence
+immediately following a PB method header in the logical stream of PB entries.
+The PB method header contains a NV_FIFO_DMA_METHOD_COUNT field, the value of
+which is equal to the length of the method data sequence. Note that a PB method
+header does not necessarily come with PB method data entries (see details below
+about immediate-data method headers and method headers for which COUNT is zero).
+Also note that PB method data entries may be located in a PB segment separate
+from their corresponding method header. The format of any given PB method data
+entry is defined in the "NV_UDMA" section of dev_pbdma.ref.
+
+ A PB entry that is either a PB method header or PB control entry is known
+as a PB instruction. The type of a PB instruction is specified by the
+NV_FIFO_DMA_SEC_OP field and the NV_FIFO_DMA_TERT_OP field.
+
+ secondary tertiary
+ opcode opcode entry type
+ --------- -------- --------------------------------
+ 000 01 SET_SUBDEVICE_MASK
+ 000 10 STORE_SUBDEVICE_MASK
+ 000 11 USE_SUBDEVICE_MASK
+ 001 xx incrementing method header
+ 011 xx non-incrementing method header
+ 100 xx immediate-data method header
+ 101 xx increment-once method header
+ 111 xx END_PB_SEGMENT
+ --------- -------- --------------------------------
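A decoder for this table might look as follows. This is a sketch; the function
name and returned labels are illustrative, and the field positions are those of
NV_FIFO_DMA_SEC_OP (bits 31:29) and NV_FIFO_DMA_TERT_OP (bits 17:16) defined
below.

```c
#include <stdint.h>
#include <string.h>  /* for callers comparing the returned labels */

/* Illustrative decoder for the opcode table above: SEC_OP lives in bits
 * 31:29 of a PB instruction dword and TERT_OP in bits 17:16. */
static const char *pb_instr_type(uint32_t entry)
{
    uint32_t sec  = (entry >> 29) & 0x7u;
    uint32_t tert = (entry >> 16) & 0x3u;

    switch (sec) {
    case 0x0:
        switch (tert) {
        case 0x1: return "SET_SUBDEVICE_MASK";
        case 0x2: return "STORE_SUBDEVICE_MASK";
        case 0x3: return "USE_SUBDEVICE_MASK";
        default:  return "NOP/other";
        }
    case 0x1: return "incrementing method header";
    case 0x3: return "non-incrementing method header";
    case 0x4: return "immediate-data method header";
    case 0x5: return "increment-once method header";
    case 0x7: return "END_PB_SEGMENT";
    default:  return "reserved";
    }
}
```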
+
+ Types of methods:
+
+ - A Host method is a method whose address is defined in the NV_UDMA device
+ range.
+
+ - A Host-only method is any Host method excluding SetObject (also known as
+ NV_UDMA_OBJECT).
+
+ - An engine method is a method whose address is not defined within the
+ NV_UDMA device range. There are multiple engines designated by a
+ subchannel ID. Software methods are included in this category.
+
+ - A software method (SW method) is a method which causes an interrupt for
+ the express purpose of being handled by software. For details see the
+ section on software methods below.
+
+ For more information about types of methods see "HOST METHODS" and
+"RESERVED METHOD ADDRESSES" in dev_pbdma.ref.
+
+ The method address in a PB method header (stored in the
+NV_FIFO_DMA_METHOD_ADDRESS field) is a dword-address, not a byte-address. In
+other words the least significant two bits of the address are not stored because
+the byte-address is dword-aligned (thus the least significant two bits are
+always zero).
+
+ The subchannel in a PB method header (stored in the
+NV_FIFO_DMA_*_SUBCHANNEL field) determines the engine to which a method will be
+sent if the method is SetObject or an engine method (otherwise, the SUBCHANNEL
+field is ignored). SetObject enables SW to request HW to check the expectation
+that a given subchannel serves the specified class ID; see the description of
+"NV_UDMA_OBJECT" in dev_pbdma.ref.
+
+ The mapping between subchannels and engines is fixed. A subchannel is
+bound to a given class according to the runlist. Each engine method is applied
+to an "object," which itself is an instance of an NV class as defined by the
+master MFS class files. Each object belongs to an engine. For SetObject and
+engine methods, the engine is determined entirely by the SUBCHANNEL field of
+the method's header via a fixed mapping that depends on the runlist on which the
+method arrives.
+
+ Methods on subchannels 0-4 are handled by the primary engine served by the
+runlist, except that subchannel 4 targets GRCOPY0 and GRCOPY1 on the graphics
+runlist. For Graphics/Compute, SetObject associates subchannels 0, 1, 2, and 3
+with class identifiers for 3D, compute, I2M, and 2D respectively. On other
+runlists, the subchannel is ignored, and Host does not send the subchannel ID to
+the engine. It is recommended that SW only use subchannel 4 on the dedicated
+copy engines for consistency with GRCOPY usage.
+
+ Subchannels 5-7 are for software methods. Any methods on these subchannels
+(including SetObject methods) are kicked back to software for handling via the
+SW method dispatch mechanism using the NV_PPBDMA_INTR_*_DEVICE interrupt. SW
+may choose to send a SetObject method to each engine subchannel before sending
+any methods on that particular subchannel in order to support multiple software
+classes.
+
+ If a method stream subchannel-switches from targeting graphics/compute to a
+copy engine or vice-versa, that is, to or from subchannel 4 on GR, Host will:
+
+ 1. Wait until the first engine has completed all its methods,
+
+ 2. Wait until that engine indicates that it is idle (WFI), and
+
+ 3. Send a sysmem barrier flush and wait until it completes.
+
+Only then will Host send methods to the newly targeted engine.
+
+ Note that this WFI will not occur for sending Host-only methods on the new
+subchannel, since Host-only methods ignore the subchannel field. Additionally,
+when switching from CE to graphics/compute, Host forces FE to perform a cache
+invalidate. Other subchannel switch semantics may be provided by the engines
+themselves, such as switching between subchannels 0-3 within FE.
+
+
+#define NV_FIFO_DMA /* ----G */
+#define NV_FIFO_DMA_METHOD_ADDRESS_OLD 12:2 /* RWXUF */
+#define NV_FIFO_DMA_METHOD_ADDRESS 11:0 /* RWXUF */
+
+#define NV_FIFO_DMA_SUBDEVICE_MASK 15:4 /* RWXUF */
+
+#define NV_FIFO_DMA_METHOD_SUBCHANNEL 15:13 /* RWXUF */
+
+#define NV_FIFO_DMA_TERT_OP 17:16 /* RWXUF */
+#define NV_FIFO_DMA_TERT_OP_GRP0_SET_SUB_DEV_MASK 0x00000001 /* RW--V */
+#define NV_FIFO_DMA_TERT_OP_GRP0_STORE_SUB_DEV_MASK 0x00000002 /* RW--V */
+#define NV_FIFO_DMA_TERT_OP_GRP0_USE_SUB_DEV_MASK 0x00000003 /* RW--V */
+
+#define NV_FIFO_DMA_METHOD_COUNT_OLD 28:18 /* RWXUF */
+#define NV_FIFO_DMA_METHOD_COUNT 28:16 /* RWXUF */
+#define NV_FIFO_DMA_IMMD_DATA 28:16 /* RWXUF */
+
+#define NV_FIFO_DMA_SEC_OP 31:29 /* RWXUF */
+#define NV_FIFO_DMA_SEC_OP_GRP0_USE_TERT 0x00000000 /* RW--V */
+#define NV_FIFO_DMA_SEC_OP_INC_METHOD 0x00000001 /* RW--V */
+#define NV_FIFO_DMA_SEC_OP_NON_INC_METHOD 0x00000003 /* RW--V */
+#define NV_FIFO_DMA_SEC_OP_IMMD_DATA_METHOD 0x00000004 /* RW--V */
+#define NV_FIFO_DMA_SEC_OP_ONE_INC 0x00000005 /* RW--V */
+#define NV_FIFO_DMA_SEC_OP_RESERVED6 0x00000006 /* RW--V */
+#define NV_FIFO_DMA_SEC_OP_END_PB_SEGMENT 0x00000007 /* RW--V */
+
+
+Incrementing PB Method Header Format
+
+ An incrementing PB method header specifies that Host generate a sequence of
+methods. The length of the sequence is defined by the method header. The
+method data for each method in this sequence is found in a sequence of PB
+entries immediately following the method header.
+
+ The dword-address of the first method is specified by the method header,
+and the dword-address of each subsequent method is equal to the dword-address of
+the previous method plus one. In other words, the byte-address of each
+subsequent method is equal to the byte-address of the previous method plus four.
+
+Example sequence of methods generated from an incrementing method header:
+
+ addr data0
+ addr+1 data1
+ addr+2 data2
+ addr+3 data3
+ ... ...
+
+ The NV_FIFO_DMA_INCR_COUNT field contains the number of methods in the
+generated sequence. This is the same as the number of method data entries that
+follow the method header. If the COUNT field is zero, the other fields are
+ignored, and the PB method effectively becomes a no-op with no method data
+entries following it.
+
+ The NV_FIFO_DMA_INCR_SUBCHANNEL field contains the subchannel to use for
+the methods generated from the method header. See the documentation above for
+NV_FIFO_DMA_*_SUBCHANNEL.
+
+ The NV_FIFO_DMA_INCR_ADDRESS field contains the method address for the
+first method in the generated sequence. The dword-address of the method is
+incremented by one each time a method is generated. A method address specifies
+an operation to be performed. Note that because the ADDRESS is a dword-address
+and not a byte-address, the two least significant bits of the method's
+byte-address are not stored.
+
+ The NV_FIFO_DMA_INCR_DATA fields contain the method data for the methods in
+the generated sequence. The number of method data entries is defined by the
+COUNT field. A method data entry contains an operand for its respective method.
+
+ Bit 12 is reserved for the future expansion of either the subchannel or the
+address fields.
+
+
+#define NV_FIFO_DMA_INCR /* ----G */
+#define NV_FIFO_DMA_INCR_OPCODE (0*32+31):(0*32+29) /* RWXUF */
+#define NV_FIFO_DMA_INCR_OPCODE_VALUE 0x00000001 /* ----V */
+#define NV_FIFO_DMA_INCR_COUNT (0*32+28):(0*32+16) /* RWXUF */
+#define NV_FIFO_DMA_INCR_SUBCHANNEL (0*32+15):(0*32+13) /* RWXUF */
+#define NV_FIFO_DMA_INCR_ADDRESS (0*32+11):(0*32+0) /* RWXUF */
+#define NV_FIFO_DMA_INCR_DATA (1*32+31):(1*32+0) /* RWXUF */
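As a sketch, SW could pack such a header like this (the function name is
illustrative; the field positions are those of the NV_FIFO_DMA_INCR_* defines
above):

```c
#include <stdint.h>

/* Illustrative: pack an incrementing method header.  Field positions per the
 * NV_FIFO_DMA_INCR_* defines above (OPCODE 31:29 = 1, COUNT 28:16,
 * SUBCHANNEL 15:13, ADDRESS 11:0); COUNT method data dwords follow. */
static uint32_t pack_incr_header(uint32_t count, uint32_t subch, uint32_t addr)
{
    return (0x1u << 29)                 /* SEC_OP_INC_METHOD */
         | ((count & 0x1FFFu) << 16)
         | ((subch & 0x7u)    << 13)
         |  (addr  & 0xFFFu);
}
```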
+
+
+Non-Incrementing PB Method Header Format
+
+ A non-incrementing PB method header specifies that Host generate a sequence
+of methods. The length of the sequence is defined by the method header. The
+method data for each method in this sequence is contained within the PB entries
+immediately following the method header.
+
+ Unlike with the incrementing PB method header, the sequence of methods
+generated all have the same method address. The dword-address of every method
+in this sequence is specified by the method header. Although the methods all
+have the same address, the method data entries may be different.
+
+Example sequence of methods generated from a non-incrementing method header:
+
+ addr data0
+ addr data1
+ addr data2
+ addr data3
+ ... ...
+
+ The NV_FIFO_DMA_NONINCR_COUNT field contains the number of methods
+in the generated sequence. This is the same as the number of method data
+entries that follow the method header. If the COUNT field is zero, the other
+fields are ignored, and the PB method effectively becomes a no-op with no method
+data entries following it.
+
+ The NV_FIFO_DMA_NONINCR_SUBCHANNEL field contains the subchannel to use for
+the methods generated from the method header. See the documentation above for
+NV_FIFO_DMA_*_SUBCHANNEL.
+
+ The NV_FIFO_DMA_NONINCR_ADDRESS field contains the method address for every
+method in the generated sequence. A method address specifies an operation to be
+performed. Note that because the ADDRESS field is a dword-address and not a
+byte-address, the two least significant bits of the method's byte-address are
+not stored.
+
+ The NV_FIFO_DMA_NONINCR_DATA fields contain the method data for the methods
+in the generated sequence. The number of method data entries is defined by the
+COUNT field. A method data entry contains an operand for its respective method.
+
+ Bit 12 is reserved for the future expansion of either the subchannel or the
+address fields.
+
+
+#define NV_FIFO_DMA_NONINCR /* ----G */
+#define NV_FIFO_DMA_NONINCR_OPCODE (0*32+31):(0*32+29) /* RWXUF */
+#define NV_FIFO_DMA_NONINCR_OPCODE_VALUE 0x00000003 /* ----V */
+#define NV_FIFO_DMA_NONINCR_COUNT (0*32+28):(0*32+16) /* RWXUF */
+#define NV_FIFO_DMA_NONINCR_SUBCHANNEL (0*32+15):(0*32+13) /* RWXUF */
+#define NV_FIFO_DMA_NONINCR_ADDRESS (0*32+11):(0*32+0) /* RWXUF */
+#define NV_FIFO_DMA_NONINCR_DATA (1*32+31):(1*32+0) /* RWXUF */
+
+
+Increment-Once PB Method Header Format
+
+ An increment-once PB method header specifies that Host generate a sequence
+of methods. The length of the sequence is defined by the method header. The
+method data for each method in this sequence is found in a sequence of PB
+entries immediately following the method header.
+
+ The dword-address of the first method is specified by the method header.
+The address of the second and all following methods is equal to the
+dword-address of the first method plus one. In other words, the byte-address of
+the second and all following methods is equal to the byte-address of the first
+method plus four.
+
+Example sequence of methods generated from an increment-once method header:
+
+ addr data0
+ addr+1 data1
+ addr+1 data2
+ addr+1 data3
+ ... ...
+
+ The NV_FIFO_DMA_ONEINCR_COUNT field contains the number of methods in the
+generated sequence. This is the same as the number of method data entries that
+follow the method header. If the COUNT field is zero, the other fields are
+ignored, and the PB method effectively becomes a no-op method with no method
+data entries following it.
+
+ The NV_FIFO_DMA_ONEINCR_SUBCHANNEL field contains the subchannel to use for
+the methods generated from the method header. See the documentation above for
+NV_FIFO_DMA_*_SUBCHANNEL.
+
+ The NV_FIFO_DMA_ONEINCR_ADDRESS field contains the method address for the
+first method in the generated sequence. A method address specifies an operation
+to be performed. Note that because the ADDRESS is a dword-address and not a
+byte-address, the two least significant bits of the method's byte-address are
+not stored.
+
+ The NV_FIFO_DMA_ONEINCR_DATA fields contain the method data for the methods
+in the generated sequence. The number of method data entries is defined by the
+COUNT field. A method data entry contains an operand for its respective method.
+
+ Bit 12 is reserved for the future expansion of either the subchannel or the
+address fields.
+
+
+#define NV_FIFO_DMA_ONEINCR /* ----G */
+#define NV_FIFO_DMA_ONEINCR_OPCODE (0*32+31):(0*32+29) /* RWXUF */
+#define NV_FIFO_DMA_ONEINCR_OPCODE_VALUE 0x00000005 /* ----V */
+#define NV_FIFO_DMA_ONEINCR_COUNT (0*32+28):(0*32+16) /* RWXUF */
+#define NV_FIFO_DMA_ONEINCR_SUBCHANNEL (0*32+15):(0*32+13) /* RWXUF */
+#define NV_FIFO_DMA_ONEINCR_ADDRESS (0*32+11):(0*32+0) /* RWXUF */
+#define NV_FIFO_DMA_ONEINCR_DATA (1*32+31):(1*32+0) /* RWXUF */
+
+
+No-Operation PB Instruction Formats
+
+ The method header for a no-op PB method may be specified in multiple ways,
+but the preferred way is to set the PB instruction to NV_FIFO_DMA_NOP.
+In any case NV_FIFO_DMA_NOP is a universal NOP entry that bypasses any method
+header format check, and is not considered a method header.
+
+
+#define NV_FIFO_DMA_NOP 0x00000000 /* ----C */
+
+
+Immediate-Data PB Method Header Format
+
+ If a method's operand fits within 13 bits, a PB method may be specified in
+a single PB entry, using the immediate-data PB method header format. Exactly
+one method is generated from this method header.
+
+ The NV_FIFO_DMA_IMMD_SUBCHANNEL field contains the subchannel to use for
+the method generated from the method header. See the documentation above for
+NV_FIFO_DMA_*_SUBCHANNEL.
+
+ The NV_FIFO_DMA_IMMD_ADDRESS field contains the method address for the
+single generated method. A method address specifies an operation to be
+performed. Note that because the ADDRESS is a dword-address and not a
+byte-address, the two least significant bits of the method's byte-address are
+not stored.
+
+ The single NV_FIFO_DMA_IMMD_DATA field contains the method data for the
+generated method. This method data contains an operand for the generated
+method.
+
+
+#define NV_FIFO_DMA_IMMD /* ----G */
+#define NV_FIFO_DMA_IMMD_ADDRESS 11:0 /* RWXUF */
+#define NV_FIFO_DMA_IMMD_SUBCHANNEL 15:13 /* RWXUF */
+#define NV_FIFO_DMA_IMMD_DATA 28:16 /* RWXUF */
+#define NV_FIFO_DMA_IMMD_OPCODE 31:29 /* RWXUF */
+#define NV_FIFO_DMA_IMMD_OPCODE_VALUE 0x00000004 /* ----V */
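A sketch of packing such a single-entry method (the function name is
illustrative; field positions per the NV_FIFO_DMA_IMMD_* defines above):

```c
#include <stdint.h>

/* Illustrative: pack an immediate-data method (OPCODE 31:29 = 4, DATA 28:16,
 * SUBCHANNEL 15:13, ADDRESS 11:0).  The 13-bit operand travels in the header
 * itself, so no method data entry follows. */
static uint32_t pack_immd_method(uint32_t data, uint32_t subch, uint32_t addr)
{
    return (0x4u << 29)                 /* SEC_OP_IMMD_DATA_METHOD */
         | ((data  & 0x1FFFu) << 16)
         | ((subch & 0x7u)    << 13)
         |  (addr  & 0xFFFu);
}
```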
+
+
+Set Sub-Device Mask PB Control Entry Format
+
+ The SET_SUBDEVICE_MASK (SSDM) PB control entry is used when multiple GPU
+contexts are using the same pushbuffer (for example, for SLI or for stereo
+rendering) and there is data in the push buffer that is for only a subset of the
+GPU contexts. This instruction allows the pushbuffer to tell a specific GPU
+context to use or ignore methods following the SET_SUBDEVICE_MASK. While the
+logical-AND of NV_FIFO_DMA_SET_SUBDEVICE_MASK_VALUE and the GPU context's
+NV_PPBDMA_SUBDEVICE_ID value is zero, methods are ignored. Pushbuffer control
+entries (like SET_SUBDEVICE_MASK) are not ignored.
+
+********************************************************************************
+Warning: When using subdevice masking, one must take care to synchronize
+properly with any later GP entries marked FETCH_CONDITIONAL. If GP fetching
+gets too far ahead of PB processing, it is possible for a later conditional PB
+segment to be discarded prior to reaching an SSDM command that sets
+SUBDEVICE_STATUS to ACTIVE. This would cause Host to execute garbage data. One
+way to avoid this would be to set the SYNC_WAIT flag on any FETCH_CONDITIONAL
+segments following a subdevice reenable.
+********************************************************************************
+
+
+
+#define NV_FIFO_DMA_SET_SUBDEVICE_MASK /* ----G */
+#define NV_FIFO_DMA_SET_SUBDEVICE_MASK_VALUE 15:4 /* RWXUF */
+#define NV_FIFO_DMA_SET_SUBDEVICE_MASK_OPCODE 31:16 /* RWXUF */
+#define NV_FIFO_DMA_SET_SUBDEVICE_MASK_OPCODE_VALUE 0x00000001 /* ----V */
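The masking rule above can be sketched as follows (illustrative names only;
the VALUE field occupies bits 15:4 per the defines above):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative: the masking rule described above.  Methods are ignored while
 * the AND of the SSDM VALUE field (bits 15:4) and the context's
 * NV_PPBDMA_SUBDEVICE_ID value is zero. */
static bool subdevice_ignores_methods(uint32_t ssdm_entry, uint32_t subdevice_id)
{
    uint32_t mask = (ssdm_entry >> 4) & 0xFFFu;
    return (mask & subdevice_id) == 0u;
}
```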
+
+
+Store Sub-Device Mask PB Control Entry Format
+
+ The STORE_SUBDEVICE_MASK PB control entry is used to save a subdevice mask
+value to be used later by a USE_SUBDEVICE_MASK PB instruction.
+
+
+#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK /* ----G */
+#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK_VALUE 15:4 /* RWXUF */
+#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK_OPCODE 31:16 /* RWXUF */
+#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK_OPCODE_VALUE 0x00000002 /* ----V */
+
+
+Use Sub-Device Mask PB Control Entry Format
+
+ The USE_SUBDEVICE_MASK PB control entry is used to apply the subdevice mask
+value saved by a STORE_SUBDEVICE_MASK PB instruction. The effect of the mask is
+the same as for a SET_SUBDEVICE_MASK PB instruction.
+
+
+#define NV_FIFO_DMA_USE_SUBDEVICE_MASK /* ----G */
+#define NV_FIFO_DMA_USE_SUBDEVICE_MASK_OPCODE 31:16 /* RWXUF */
+#define NV_FIFO_DMA_USE_SUBDEVICE_MASK_OPCODE_VALUE 0x00000003 /* ----V */
+
+
+End-PB-Segment PB Control Entry Format
+
+ Engines may write PB segments themselves, but they cannot write GP entries.
+Because they cannot write GP entries, they cannot alter the size of a PB
+segment. If an engine is writing a PB segment, and if it does not need to fill
+the entire PB segment it was allocated, instead of filling the remainder of the
+PB segment with no-op PB instructions, it may write a single End-PB-Segment
+control entry to indicate that the pushbuffer data contains no further valid
+data. No further PB entries from that PB segment will be decoded or processed.
+Host may have already issued requests to fetch the remainder of the PB segment
+before an End-PB-Segment PB instruction is processed. Host may or may not fetch
+the remainder of the PB segment. Also note that doing a PB CRC check on this
+segment via NV_PPBDMA_GP_ENTRY1_OPCODE_PB_CRC will be indeterminate.
+
+
+#define NV_FIFO_DMA_ENDSEG_OPCODE 31:29 /* RWXUF */
+#define NV_FIFO_DMA_ENDSEG_OPCODE_VALUE 0x00000007 /* ----V */
+
+