author John Hubbard <jhubbard@nvidia.com> 2019-06-12 14:41:51 -0700
committer John Hubbard <jhubbard@nvidia.com> 2019-06-13 19:23:50 -0700
commit f9e4e0e07fd5a6a7757db977f69c8e91a0ae283f
tree 1f9488efca18d52ccfc016c7531df4ceac94989c /Host-Fifo/volta/gv100/dev_ram.ref.txt
parent 187a308aea3f133dfb27ebf6bafe75ffa15fc353
New ref manuals directory, delete old locations
As decided in a recent OpenSource-Approval meeting, we want the directory
structure for reference manuals here to be fairly close to the way they are
organized internal to NVIDIA.

This CL therefore does the following:

Rename from:
    Host-Fifo/volta/gv100/*
    Display-Ref-Manuals/gv100/*
to:
    manuals/volta/gv100/*

Regenerate index.html files to match (important for the "github pages" site,
at https://nvidia.github.io/open-gpu-doc/).

Reviewed by: Maneet Singh
Diffstat (limited to 'Host-Fifo/volta/gv100/dev_ram.ref.txt')
-rw-r--r-- Host-Fifo/volta/gv100/dev_ram.ref.txt 1269
1 files changed, 0 insertions, 1269 deletions
diff --git a/Host-Fifo/volta/gv100/dev_ram.ref.txt b/Host-Fifo/volta/gv100/dev_ram.ref.txt
deleted file mode 100644
index e80d9c0..0000000
--- a/Host-Fifo/volta/gv100/dev_ram.ref.txt
+++ /dev/null
@@ -1,1269 +0,0 @@
-Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
-
-Permission is hereby granted, free of charge, to any person obtaining a
-copy of this software and associated documentation files (the "Software"),
-to deal in the Software without restriction, including without limitation
-the rights to use, copy, modify, merge, publish, distribute, sublicense,
-and/or sell copies of the Software, and to permit persons to whom the
-Software is furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
-THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
-DEALINGS IN THE SOFTWARE.
---------------------------------------------------------------------------------
-
-2 - GPU INSTANCE RAM (RAMIN)
-==============================
-
- A GPU contains a block called "XVE" that manages the interface with PCI, a
-block called "Host" that fetches graphics instructions, blocks called "engines"
-that execute graphics instructions, and blocks that manage the interface with
-memory.
-
- .-----. .------.
- | |<------------------>| |
- | | | |
- | | .---------. | |
- | |<--->| Engine1 |<---| |
- | | `---------' | |
-.---------. | | | |
-| GPU | | | .---------. | Host |
-| Local |<-->| FB |<--->| Engine2 |<---| |
-| Memory | | MMU | `---------' | |
-`---------' | Hub | ... | | .--------.
- | | .---------. | | | System |
- | |<--->| EngineN |<---| | | Memory |
- | | `---------' `------' `--------'
- | | ^ ^
- | | | |
-.---------. | | .--V--. PCI .--V--. .-----.
-| Display |<-->| |<------------------>| XVE |<--->| NB |<--->| CPU |
-`---------' `-----' `-----' `-----' `-----'
-
- A GPU context is a virtualization of the GPU for a particular software
-application. A GPU instance block is a block of memory that contains the state
-for a GPU context. A GPU context's instance block consists of Host state,
-pointers to each engine's state, and memory management state. A GPU instance
-block also contains a pointer to a block of memory that contains that part of a
-GPU context's state that a user-level driver may access. A GPU instance block
-fits within a single 4K-byte page of memory.
-
- Run List Channel-Map RAM
- .----------. Ch Id .----------------.
- | RL Entry0 |----. |Ch0 Inst Blk Ptr|
- | RL Entry1 | | |Ch1 Inst Blk Ptr|
- | RL Entry2 | | | ... |
- | ... | `--->|ChI Inst Blk Ptr|----.
- | RL EntryN | | ... | |
- `-----------' |ChN Inst Blk Ptr| |
- `----------------' |
- |
- .-----------------------------------------------'
- |
- | GPU Instance Block GPFIFO
- `-->.-----------------. GP_GET .--------. PB Seg
- | |------------------------------>|GP Entry| .--------.
- | Host State | |GP Entry|--->|PB Entry|
- | (RAMFC) | User-Driver State | | |PB Entry|
- | | .-------. |GP Entry| | ... |
- | |------------->|(USERD)| GP_PUT |GP Entry| |PB Entry|
- | | | |------->`--------' `--------'
- | | | |
- +-----------------+ | |
- | Memory | `-------'
- | Management |----------. Page Directory Page Table
- | State | | .-------. .-------.
- +-----------------+ `-->| PDE | | PTE |
- | Pointer to | | PDE |------->| PTE |
- | Engine0 |--------. | ... | | ... |
- | State | | | PDE | | PTE |
- +-----------------+ | `-------' `-------'
- | Pointer to | |
- | Engine1 |-----. | Engine0 State
- | State | | | .-------.
- +-----------------+ | `---->| |
- ... | `-------'
- +-----------------+ |
- | Pointer to | | Engine1 State
- | EngineN |--. | .-------.
- | State | | `------->| |
- `-----------------' | `-------'
- | ...
- |
- | EngineN State
- | .-------.
- `---------->| |
- `-------'
-
- The GPU context's Host state occupies the first 128 double words of an
-instance block. A GPU context's Host state is called "RAMFC". Please see
-the NV_RAMFC section below for a description of Host state.
-
- The GPU context's memory-management state defines the virtual address space
-that the GPU context uses. Memory management state consists of page and
-directory tables (that specify the mapping between virtual addresses and
-physical addresses, and the attributes of memory pages), and the limit of the
-virtual address space. The NV_RAMIN_PAGE_DIR_BASE entry contains the base
-address of the GPU context's page directory table (PDB). NV_RAMIN_PAGE_DIR_BASE is
-4K-byte aligned.
-
- The NV_RAMIN_ENG*_WFI_PTR entry contains the address of a block of memory
-for storing an engine's context state. Blocks of memory that contain engine state
-are 4K-byte aligned. Only one engine context is supported per instance block.
-
- The NV_RAMIN_ENG*_CS field is deprecated. It was used to indicate whether
-GPU state should be restored from the FGCS pointer or from the WFI CS pointer.
-Engines only need/support one CTXSW pointer and all state is stored there
-whether a WFI CS or other form of preemption was performed. This field must
-always be set to WFI for legacy reasons, and will eventually be deleted.
-
-
-#define NV_RAMIN /* ----G */
-
-// The instance block must be 4k-aligned.
-#define NV_RAMIN_BASE_SHIFT 12 /* */
-
-// The instance block size fits within a single 4k block.
-#define NV_RAMIN_ALLOC_SIZE 4096 /* */
-
-// Host State
-#define NV_RAMIN_RAMFC (127*32+31):(0*32+0) /* RWXUF */
-
-// Memory-Management State
-
- The following fields are used for non-VEID engines. The NV_RAMIN_SC_* described later
- are used for VEID engines.
-
- NV_RAMIN_PAGE_DIR_BASE_TARGET determines if the top level of the page tables
- is in video memory or system memory (peer is not allowed), and the CPU cache
- coherency for system memory.
- Using INVALID unbinds the selected engine.
-
-#define NV_RAMIN_PAGE_DIR_BASE_TARGET (128*32+1):(128*32+0) /* RWXUF */
-#define NV_RAMIN_PAGE_DIR_BASE_TARGET_VID_MEM 0x00000000 /* RW--V */
-#define NV_RAMIN_PAGE_DIR_BASE_TARGET_INVALID 0x00000001 /* RW--V */
-#define NV_RAMIN_PAGE_DIR_BASE_TARGET_SYS_MEM_COHERENT 0x00000002 /* RW--V */
-#define NV_RAMIN_PAGE_DIR_BASE_TARGET_SYS_MEM_NONCOHERENT 0x00000003 /* RW--V */
-
- NV_RAMIN_PAGE_DIR_BASE_VOL identifies the volatile behavior
- of top level of the page table (whether local L2 can cache it or not).
-
-#define NV_RAMIN_PAGE_DIR_BASE_VOL (128*32+2):(128*32+2) /* RWXUF */
-#define NV_RAMIN_PAGE_DIR_BASE_VOL_TRUE 0x00000001 /* RW--V */
-#define NV_RAMIN_PAGE_DIR_BASE_VOL_FALSE 0x00000000 /* RW--V */
-
-
- These bits specify whether the MMU will treat faults as replayable or not.
- The engine will send these bits to the MMU as part of the instance bind.
-
-#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_TEX (128*32+4):(128*32+4) /* RWXUF */
-#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_TEX_DISABLED 0x00000000 /* RW--V */
-#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_TEX_ENABLED 0x00000001 /* RW--V */
-#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_GCC (128*32+5):(128*32+5) /* RWXUF */
-#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_GCC_DISABLED 0x00000000 /* RW--V */
-#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_GCC_ENABLED 0x00000001 /* RW--V */
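The field tables in this manual give each field as an absolute bit range, `(word*32+hi):(word*32+lo)`, counted from the start of the block. The following is a minimal sketch of how such a range could be turned into a read or write on a CPU-side copy of the block, assuming the block is viewed as an array of 32-bit words. The helper names `field_get` and `field_set` are hypothetical and do not come from the manual.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask with the low `width` bits set (width is 1..32). */
static uint32_t field_mask(unsigned width)
{
    assert(width >= 1 && width <= 32);
    return (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* Read the field at absolute bits hi:lo. Both ends must lie in one word. */
static uint32_t field_get(const uint32_t *blk, unsigned hi, unsigned lo)
{
    assert(hi >= lo && hi / 32 == lo / 32);
    return (blk[lo / 32] >> (lo % 32)) & field_mask(hi - lo + 1);
}

/* Write `val` into the field at absolute bits hi:lo, leaving other bits alone. */
static void field_set(uint32_t *blk, unsigned hi, unsigned lo, uint32_t val)
{
    uint32_t mask;
    assert(hi >= lo && hi / 32 == lo / 32);
    mask = field_mask(hi - lo + 1);
    assert((val & ~mask) == 0);
    blk[lo / 32] = (blk[lo / 32] & ~(mask << (lo % 32))) | (val << (lo % 32));
}
```

For example, `field_set(blk, 128*32+5, 128*32+5, 1)` would set the FAULT_REPLAY_GCC bit to ENABLED in word 128 without disturbing the neighbouring TARGET, VOL, or FAULT_REPLAY_TEX bits.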
-
- NV_RAMIN_USE_VER2_PT_FORMAT determines which page table format to use.
- When NV_RAMIN_USE_VER2_PT_FORMAT is false, the page table uses the old format.
- When NV_RAMIN_USE_VER2_PT_FORMAT is true, the page table uses the new format.
-
- Volta only supports the new format. Selecting the old format results in an UNBOUND_INSTANCE fault.
-
-
-#define NV_RAMIN_USE_VER2_PT_FORMAT (128*32+10):(128*32+10) /* */
-#define NV_RAMIN_USE_VER2_PT_FORMAT_FALSE 0x00000000 /* */
-#define NV_RAMIN_USE_VER2_PT_FORMAT_TRUE 0x00000001 /* */
-
- When NV_PFB_PRI_MMU_CTRL_USE_PDB_BIG_PAGE_SIZE is TRUE, the NV_RAMIN_BIG_PAGE_SIZE bit selects the big page size.
- When NV_PFB_PRI_MMU_CTRL_USE_PDB_BIG_PAGE_SIZE is FALSE, NV_PFB_PRI_MMU_CTRL_VM_PG_SIZE selects the big page size.
-
- Volta only supports 64KB for big pages. Selecting 128KB for big pages results in an UNBOUND_INSTANCE fault.
-
-#define NV_RAMIN_BIG_PAGE_SIZE (128*32+11):(128*32+11) /* RWXUF */
-#define NV_RAMIN_BIG_PAGE_SIZE_128KB 0x00000000 /* RW--V */
-#define NV_RAMIN_BIG_PAGE_SIZE_64KB 0x00000001 /* RW--V */
-
- NV_RAMIN_PAGE_DIR_BASE_LO and NV_RAMIN_PAGE_DIR_BASE_HI
- identify the page directory base (start of the page table)
- location for this context.
-
-#define NV_RAMIN_PAGE_DIR_BASE_LO (128*32+31):(128*32+12) /* RWXUF */
-#define NV_RAMIN_PAGE_DIR_BASE_HI (129*32+31):(129*32+0) /* RWXUF */
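As a hedged illustration of how the page-directory fields above fit together, the sketch below packs a 4K-aligned page directory address into words 128 and 129 of a CPU-side image of the instance block. It assumes that PAGE_DIR_BASE_LO carries address bits 31:12 and PAGE_DIR_BASE_HI carries address bits 63:32, which the bit ranges suggest but the text does not state outright. The function names are hypothetical. The format and big-page bits are set to the only values the text says Volta accepts.

```c
#include <assert.h>
#include <stdint.h>

enum {
    RAMIN_PDB_WORD_LO      = 128, /* TARGET, VOL, replay bits, PT format, LO */
    RAMIN_PDB_WORD_HI      = 129, /* PAGE_DIR_BASE_HI */
    RAMIN_VER2_PT_FORMAT   = 10,  /* bit position within word 128 */
    RAMIN_BIG_PAGE_SIZE    = 11   /* bit position within word 128 */
};

/* Hypothetical helper: fill in the non-VEID page directory base. */
static void ramin_set_pdb(uint32_t *ramin, uint64_t pdb_addr, uint32_t target)
{
    uint32_t lo;
    assert((pdb_addr & 0xFFFu) == 0);  /* PDB must be 4K-byte aligned */
    assert(target <= 3u);              /* 2-bit TARGET field */
    lo  = target;                                /* bits 1:0 */
    lo |= 1u << RAMIN_VER2_PT_FORMAT;            /* USE_VER2_PT_FORMAT_TRUE */
    lo |= 1u << RAMIN_BIG_PAGE_SIZE;             /* BIG_PAGE_SIZE_64KB */
    lo |= (uint32_t)(pdb_addr & 0xFFFFF000u);    /* assumed: addr[31:12] */
    ramin[RAMIN_PDB_WORD_LO] = lo;
    ramin[RAMIN_PDB_WORD_HI] = (uint32_t)(pdb_addr >> 32);
}

/* Recover the page directory address from the two words. */
static uint64_t ramin_get_pdb(const uint32_t *ramin)
{
    return ((uint64_t)ramin[RAMIN_PDB_WORD_HI] << 32) |
           (ramin[RAMIN_PDB_WORD_LO] & 0xFFFFF000u);
}
```

VOL and the two FAULT_REPLAY bits are left at zero here (FALSE and DISABLED); a real driver would choose them per context.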
-
-// Single engine pointer channels cannot support multiple
-// engines with CTXSW pointers
-#define NV_RAMIN_ENGINE_CS (132*32+3):(132*32+3) /* */
-#define NV_RAMIN_ENGINE_CS_WFI 0x00000000 /* */
-#define NV_RAMIN_ENGINE_CS_FG 0x00000001 /* */
-#define NV_RAMIN_ENGINE_WFI_TARGET (132*32+1):(132*32+0) /* */
-#define NV_RAMIN_ENGINE_WFI_TARGET_LOCAL_MEM 0x00000000 /* */
-#define NV_RAMIN_ENGINE_WFI_TARGET_SYS_MEM_COHERENT 0x00000002 /* */
-#define NV_RAMIN_ENGINE_WFI_TARGET_SYS_MEM_NONCOHERENT 0x00000003 /* */
-#define NV_RAMIN_ENGINE_WFI_MODE (132*32+2):(132*32+2) /* */
-#define NV_RAMIN_ENGINE_WFI_MODE_PHYSICAL 0x00000000 /* */
-#define NV_RAMIN_ENGINE_WFI_MODE_VIRTUAL 0x00000001 /* */
-#define NV_RAMIN_ENGINE_WFI_PTR_LO (132*32+31):(132*32+12) /* */
-#define NV_RAMIN_ENGINE_WFI_PTR_HI (133*32+7):(133*32+0) /* */
-
-#define NV_RAMIN_ENGINE_WFI_VEID (134*32+(6-1)):(134*32+0) /* */
-#define NV_RAMIN_ENABLE_ATS (135*32+31):(135*32+31) /* RWXUF */
-#define NV_RAMIN_ENABLE_ATS_TRUE 0x00000001 /* RW--V */
-#define NV_RAMIN_ENABLE_ATS_FALSE 0x00000000 /* RW--V */
-#define NV_RAMIN_PASID (135*32+(20-1)):(135*32+0) /* RWXUF */
-
-
- Pointer to a method buffer in BAR2 memory where a faulted engine can save
-out methods. BAR2 accesses are assumed to be virtual, so the address saved here
-is a virtual address.
-
-#define NV_RAMIN_ENG_METHOD_BUFFER_ADDR_LO (136*32+31):(136*32+0) /* RWXUF */
-#define NV_RAMIN_ENG_METHOD_BUFFER_ADDR_HI (137*32+(((49-1)-32))):(137*32+0) /* RWXUF */
-
-
-
- These entries inform FECS which entries in the array of PDBs below are
- valid/filled in and subsequently need to be bound.
-
- This needs to reserve at least NV_LITTER_NUM_SUBCTX entries. Currently
- there is enough space reserved for 64 subcontexts.
-#define NV_RAMIN_SC_PDB_VALID(i) (166*32+i):(166*32+i) /* RWXUF */
-#define NV_RAMIN_SC_PDB_VALID__SIZE_1 64 /* */
-#define NV_RAMIN_SC_PDB_VALID_FALSE 0x00000000 /* RW--V */
-#define NV_RAMIN_SC_PDB_VALID_TRUE 0x00000001 /* RW--V */
-
-// Memory-Management VEID array
-
- The NV_RAMIN_SC_PAGE_DIR_BASE_* entries are an array of page table settings
- for each subcontext. When a context supports subcontexts, the page table
- information for a given VEID/Subcontext needs to be filled in or else page
- faults will result on access.
-
- These properties for the page table must be filled in for all channels
- sharing the same context as any channel's NV_RAMIN may be used to load the
- context.
-
- The non-subcontext page table information such as NV_RAMIN_PAGE_DIR_BASE*
- are used by non-subcontext engines and clients such as Host, CE, or the
- video engines.
-
- NV_RAMIN_SC_PAGE_DIR_BASE_TARGET(i) determines if the top level of the page tables
- is in video memory or system memory (peer is not allowed), and the CPU cache
- coherency for system memory.
- Using INVALID unbinds the selected subcontext.
-
-#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET(i) ((168+(i)*4)*32+1):((168+(i)*4)*32+0) /* RWXUF */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET__SIZE_1 64 /* */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_VID_MEM 0x00000000 /* RW--V */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_INVALID 0x00000001 /* RW--V */ // Note: INVALID should match PEER
-#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_SYS_MEM_COHERENT 0x00000002 /* RW--V */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_SYS_MEM_NONCOHERENT 0x00000003 /* RW--V */
-
- NV_RAMIN_SC_PAGE_DIR_BASE_VOL(i) identifies the volatile behavior
- of the top level of the page table (whether local L2 can cache it or not).
-
-#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL(i) ((168+(i)*4)*32+2):((168+(i)*4)*32+2) /* RWXUF */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL__SIZE_1 64 /* */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL_TRUE 0x00000001 /* RW--V */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL_FALSE 0x00000000 /* RW--V */
-
- The NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX(i) and
- NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC(i) bits specify whether
- the MMU will treat faults from TEX and GCC as replayable or
- not. Depending on that setting, fault packets are written into the
- replayable fault buffer (or not) and faulting requests are put into the
- replay request buffer (or not).
- The last bind that does not unbind a sub-context determines the REPLAY_TEX and REPLAY_GCC for all sub-contexts.
-
-#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX(i) ((168+(i)*4)*32+4):((168+(i)*4)*32+4) /* RWXUF */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX__SIZE_1 64 /* */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX_DISABLED 0x00000000 /* RW--V */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX_ENABLED 0x00000001 /* RW--V */
-
-#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC(i) ((168+(i)*4)*32+5):((168+(i)*4)*32+5) /* RWXUF */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC__SIZE_1 64 /* */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC_DISABLED 0x00000000 /* RW--V */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC_ENABLED 0x00000001 /* RW--V */
-
- NV_RAMIN_SC_USE_VER2_PT_FORMAT determines which page table format to use.
- When NV_RAMIN_SC_USE_VER2_PT_FORMAT is false, the page table uses
- the old format (2-level page table). When
- NV_RAMIN_SC_USE_VER2_PT_FORMAT is true, the page table uses the
- new format (5-level 49-bit VA format).
- The last bind that does not unbind a sub-context determines the page table format for all sub-contexts.
- Volta only supports the new format. Selecting the old format results in an UNBOUND_INSTANCE fault.
-
-#define NV_RAMIN_SC_USE_VER2_PT_FORMAT(i) ((168+(i)*4)*32+10):((168+(i)*4)*32+10) /* RWXUF */
-#define NV_RAMIN_SC_USE_VER2_PT_FORMAT__SIZE_1 64 /* */
-#define NV_RAMIN_SC_USE_VER2_PT_FORMAT_FALSE 0x00000000 /* RW--V */
-#define NV_RAMIN_SC_USE_VER2_PT_FORMAT_TRUE 0x00000001 /* RW--V */
-
- The last bind that does not unbind a sub-context determines the big page size for all sub-contexts.
- Volta only supports 64KB for big pages.
-
-#define NV_RAMIN_SC_BIG_PAGE_SIZE(i) ((168+(i)*4)*32+11):((168+(i)*4)*32+11) /* RWXUF */
-#define NV_RAMIN_SC_BIG_PAGE_SIZE__SIZE_1 64 /* */
-#define NV_RAMIN_SC_BIG_PAGE_SIZE_64KB 0x00000001 /* RW--V */
-
- NV_RAMIN_SC_PAGE_DIR_BASE_LO(i) and NV_RAMIN_SC_PAGE_DIR_BASE_HI(i)
- identify the page directory base (start of the page table)
- location for subcontext i.
-
-#define NV_RAMIN_SC_PAGE_DIR_BASE_LO(i) ((168+(i)*4)*32+31):((168+(i)*4)*32+12) /* RWXUF */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_LO__SIZE_1 64 /* */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_HI(i) ((169+(i)*4)*32+31):((169+(i)*4)*32+0) /* RWXUF */
-#define NV_RAMIN_SC_PAGE_DIR_BASE_HI__SIZE_1 64 /* */
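The per-subcontext defines above place the settings for subcontext i at word 168 + 4*i, with the matching valid bit at bit i counted from word 166. A minimal sketch of that indexing follows, assuming a CPU-side image of the instance block as an array of 32-bit words and the same address split as the non-subcontext PAGE_DIR_BASE fields (LO holds address bits 31:12, HI holds bits 63:32). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    SC_COUNT       = 64,  /* NV_RAMIN_SC_*__SIZE_1 */
    SC_VALID_WORD  = 166, /* NV_RAMIN_SC_PDB_VALID(i) starts at bit 166*32 */
    SC_FIRST_WORD  = 168, /* settings for subcontext 0 */
    SC_STRIDE      = 4    /* words per subcontext */
};

/* Word index holding TARGET, VOL, replay bits, PT format and BASE_LO. */
static unsigned sc_lo_word(unsigned veid)
{
    assert(veid < SC_COUNT);
    return SC_FIRST_WORD + veid * SC_STRIDE;
}

/* Hypothetical helper: fill in one subcontext's PDB and mark it valid. */
static void sc_bind_pdb(uint32_t *ramin, unsigned veid,
                        uint64_t pdb_addr, uint32_t target)
{
    unsigned w = sc_lo_word(veid);
    assert((pdb_addr & 0xFFFu) == 0);
    assert(target <= 3u);
    ramin[w]     = target
                 | (1u << 10)                          /* USE_VER2_PT_FORMAT_TRUE */
                 | (1u << 11)                          /* BIG_PAGE_SIZE_64KB */
                 | (uint32_t)(pdb_addr & 0xFFFFF000u); /* assumed: addr[31:12] */
    ramin[w + 1] = (uint32_t)(pdb_addr >> 32);
    /* Valid bits for subcontexts 32..63 spill into word 167. */
    ramin[SC_VALID_WORD + veid / 32] |= 1u << (veid % 32);
}
```

With 64 subcontexts the last entry ends at word 168 + 63*4 + 3 = 423, which stays well inside the 1024 words of a 4096-byte instance block.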
-
-
-
-
-
- NV_RAMIN_SC_ENABLE_ATS(i) indicates whether ATS is enabled for
- subcontext i. When set to TRUE, the GMMU looks for VA->PA
- translations in both the GMMU and ATS page tables.
- ATS can be enabled or disabled per subcontext.
-
-#define NV_RAMIN_SC_ENABLE_ATS(i) ((170+(i)*4)*32+31):((170+(i)*4)*32+31) /* RWXUF */
-
- NV_RAMIN_SC_PASID(i) identifies the PASID (process address space
- ID) in CPU for subcontext i. PASID is used to get ATS
- translation when ATS page table lookup is needed. During ATS TLB
- shootdown, PASID is also matched against the one that arrives with
- the shootdown request.
-
-#define NV_RAMIN_SC_PASID(i) ((170+(i)*4)*32+(20-1)):((170+(i)*4)*32+0) /* RWXUF */
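The ATS enable bit and the PASID for subcontext i share word 170 + 4*i: bit 31 is ENABLE_ATS and bits 19:0 are the PASID. The sketch below composes that word. It is illustrative only: the helper names are hypothetical, and bits 30:20, which this excerpt does not describe, are written as zero as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Word index holding ENABLE_ATS and PASID for a subcontext. */
static unsigned sc_ats_word(unsigned veid)
{
    assert(veid < 64u);
    return 170u + veid * 4u;
}

/* Hypothetical helper: set ENABLE_ATS (bit 31) and PASID (bits 19:0). */
static void sc_set_ats(uint32_t *ramin, unsigned veid, int enable, uint32_t pasid)
{
    assert(pasid < (1u << 20));  /* PASID field is 20 bits wide */
    /* Bits 30:20 are not described in this excerpt; zero is an assumption. */
    ramin[sc_ats_word(veid)] = (enable ? (1u << 31) : 0u) | pasid;
}
```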
-
-
-
-
-3 - FIFO CONTEXT RAM (RAMFC)
-==============================
-
-
- The NV_RAMFC part of a GPU-instance block contains Host's part of a virtual
-GPU's state. Host is referred to as "FIFO". "FC" stands for FIFO Context.
-When Host switches from serving one GPU context to serving a second, Host saves
-state for the first GPU context to the first GPU context's RAMFC area, and loads
-state for the second GPU context from the second GPU context's RAMFC area.
-
- RAMFC is located at NV_RAMIN_RAMFC within the GPU instance block. In
-Kepler, this is at the start of the block. RAMFC is 4KB aligned.
-
- Every Host word entry in RAMFC directly corresponds to a PRI-accessible
-register. For a description of the contents of a RAMFC entry, please see the
-description of the corresponding register in "manuals/dev_pbdma.ref". The
-offsets of the fields within each entry in RAMFC match those of the
-corresponding register in the associated PBDMA unit's PRI space.
-
-
- RAMFC Entry PBDMA Register
- ------------------------------- ----------------------------------
- NV_RAMFC_SIGNATURE NV_PPBDMA_SIGNATURE(i)
- NV_RAMFC_GP_BASE NV_PPBDMA_GP_BASE(i)
- NV_RAMFC_GP_BASE_HI NV_PPBDMA_GP_BASE_HI(i)
- NV_RAMFC_GP_FETCH NV_PPBDMA_GP_FETCH(i)
- NV_RAMFC_GP_GET NV_PPBDMA_GP_GET(i)
- NV_RAMFC_GP_PUT NV_PPBDMA_GP_PUT(i)
- NV_RAMFC_PB_FETCH NV_PPBDMA_PB_FETCH(i)
- NV_RAMFC_PB_FETCH_HI NV_PPBDMA_PB_FETCH_HI(i)
- NV_RAMFC_PB_GET NV_PPBDMA_GET(i)
- NV_RAMFC_PB_GET_HI NV_PPBDMA_GET_HI(i)
- NV_RAMFC_PB_PUT NV_PPBDMA_PUT(i)
- NV_RAMFC_PB_PUT_HI NV_PPBDMA_PUT_HI(i)
- NV_RAMFC_PB_TOP_LEVEL_GET NV_PPBDMA_TOP_LEVEL_GET(i)
- NV_RAMFC_PB_TOP_LEVEL_GET_HI NV_PPBDMA_TOP_LEVEL_GET_HI(i)
- NV_RAMFC_GP_CRC NV_PPBDMA_GP_CRC(i)
- NV_RAMFC_PB_HEADER NV_PPBDMA_PB_HEADER(i)
- NV_RAMFC_PB_COUNT NV_PPBDMA_PB_COUNT(i)
- NV_RAMFC_PB_CRC NV_PPBDMA_PB_CRC(i)
- NV_RAMFC_SUBDEVICE NV_PPBDMA_SUBDEVICE(i)
- NV_RAMFC_METHOD0 NV_PPBDMA_METHOD0(i)
- NV_RAMFC_METHOD1 NV_PPBDMA_METHOD1(i)
- NV_RAMFC_METHOD2 NV_PPBDMA_METHOD2(i)
- NV_RAMFC_METHOD3 NV_PPBDMA_METHOD3(i)
- NV_RAMFC_DATA0 NV_PPBDMA_DATA0(i)
- NV_RAMFC_DATA1 NV_PPBDMA_DATA1(i)
- NV_RAMFC_DATA2 NV_PPBDMA_DATA2(i)
- NV_RAMFC_DATA3 NV_PPBDMA_DATA3(i)
- NV_RAMFC_TARGET NV_PPBDMA_TARGET(i)
- NV_RAMFC_METHOD_CRC NV_PPBDMA_METHOD_CRC(i)
- NV_RAMFC_REF NV_PPBDMA_REF(i)
- NV_RAMFC_RUNTIME NV_PPBDMA_RUNTIME(i)
- NV_RAMFC_SEM_ADDR_LO NV_PPBDMA_SEM_ADDR_LO(i)
- NV_RAMFC_SEM_ADDR_HI NV_PPBDMA_SEM_ADDR_HI(i)
- NV_RAMFC_SEM_PAYLOAD_LO NV_PPBDMA_SEM_PAYLOAD_LO(i)
- NV_RAMFC_SEM_PAYLOAD_HI NV_PPBDMA_SEM_PAYLOAD_HI(i)
- NV_RAMFC_SEM_EXECUTE NV_PPBDMA_SEM_EXECUTE(i)
- NV_RAMFC_ACQUIRE_DEADLINE NV_PPBDMA_ACQUIRE_DEADLINE(i)
- NV_RAMFC_ACQUIRE NV_PPBDMA_ACQUIRE(i)
- NV_RAMFC_MEM_OP_A NV_PPBDMA_MEM_OP_A(i)
- NV_RAMFC_MEM_OP_B NV_PPBDMA_MEM_OP_B(i)
- NV_RAMFC_MEM_OP_C NV_PPBDMA_MEM_OP_C(i)
- NV_RAMFC_USERD NV_PPBDMA_USERD(i)
- NV_RAMFC_USERD_HI NV_PPBDMA_USERD_HI(i)
- NV_RAMFC_HCE_CTRL NV_PPBDMA_HCE_CTRL(i)
- NV_RAMFC_CONFIG NV_PPBDMA_CONFIG(i)
- NV_RAMFC_SET_CHANNEL_INFO NV_PPBDMA_SET_CHANNEL_INFO(i)
- ------------------------------- ----------------------------------
-
-#define NV_RAMFC /* ----G */
-#define NV_RAMFC_GP_PUT (0*32+31):(0*32+0) /* RWXUF */
-#define NV_RAMFC_MEM_OP_A (1*32+31):(1*32+0) /* RWXUF */
-#define NV_RAMFC_USERD (2*32+31):(2*32+0) /* RWXUF */
-#define NV_RAMFC_USERD_HI (3*32+31):(3*32+0) /* RWXUF */
-#define NV_RAMFC_SIGNATURE (4*32+31):(4*32+0) /* RWXUF */
-#define NV_RAMFC_GP_GET (5*32+31):(5*32+0) /* RWXUF */
-#define NV_RAMFC_PB_GET (6*32+31):(6*32+0) /* RWXUF */
-#define NV_RAMFC_PB_GET_HI (7*32+31):(7*32+0) /* RWXUF */
-#define NV_RAMFC_PB_TOP_LEVEL_GET (8*32+31):(8*32+0) /* RWXUF */
-#define NV_RAMFC_PB_TOP_LEVEL_GET_HI (9*32+31):(9*32+0) /* RWXUF */
-#define NV_RAMFC_REF (10*32+31):(10*32+0) /* RWXUF */
-#define NV_RAMFC_RUNTIME (11*32+31):(11*32+0) /* RWXUF */
-#define NV_RAMFC_ACQUIRE (12*32+31):(12*32+0) /* RWXUF */
-#define NV_RAMFC_ACQUIRE_DEADLINE (13*32+31):(13*32+0) /* RWXUF */
-#define NV_RAMFC_SEM_ADDR_HI (14*32+31):(14*32+0) /* RWXUF */
-#define NV_RAMFC_SEM_ADDR_LO (15*32+31):(15*32+0) /* RWXUF */
-#define NV_RAMFC_SEM_PAYLOAD_LO (16*32+31):(16*32+0) /* RWXUF */
-#define NV_RAMFC_SEM_EXECUTE (17*32+31):(17*32+0) /* RWXUF */
-#define NV_RAMFC_GP_BASE (18*32+31):(18*32+0) /* RWXUF */
-#define NV_RAMFC_GP_BASE_HI (19*32+31):(19*32+0) /* RWXUF */
-#define NV_RAMFC_GP_FETCH (20*32+31):(20*32+0) /* RWXUF */
-#define NV_RAMFC_PB_FETCH (21*32+31):(21*32+0) /* RWXUF */
-#define NV_RAMFC_PB_FETCH_HI (22*32+31):(22*32+0) /* RWXUF */
-#define NV_RAMFC_PB_PUT (23*32+31):(23*32+0) /* RWXUF */
-#define NV_RAMFC_PB_PUT_HI (24*32+31):(24*32+0) /* RWXUF */
-#define NV_RAMFC_MEM_OP_B (25*32+31):(25*32+0) /* RWXUF */
-#define NV_RAMFC_RESERVED26 (26*32+31):(26*32+0) /* RWXUF */
-#define NV_RAMFC_RESERVED27 (27*32+31):(27*32+0) /* RWXUF */
-#define NV_RAMFC_RESERVED28 (28*32+31):(28*32+0) /* RWXUF */
-#define NV_RAMFC_GP_CRC (29*32+31):(29*32+0) /* RWXUF */
-#define NV_RAMFC_PB_HEADER (33*32+31):(33*32+0) /* RWXUF */
-#define NV_RAMFC_PB_COUNT (34*32+31):(34*32+0) /* RWXUF */
-#define NV_RAMFC_SUBDEVICE (37*32+31):(37*32+0) /* RWXUF */
-#define NV_RAMFC_PB_CRC (38*32+31):(38*32+0) /* RWXUF */
-#define NV_RAMFC_SEM_PAYLOAD_HI (39*32+31):(39*32+0) /* RWXUF */
-#define NV_RAMFC_MEM_OP_C (40*32+31):(40*32+0) /* RWXUF */
-#define NV_RAMFC_RESERVED20 (41*32+31):(41*32+0) /* RWXUF */
-#define NV_RAMFC_RESERVED21 (42*32+31):(42*32+0) /* RWXUF */
-#define NV_RAMFC_TARGET (43*32+31):(43*32+0) /* RWXUF */
-#define NV_RAMFC_METHOD_CRC (44*32+31):(44*32+0) /* RWXUF */
-#define NV_RAMFC_METHOD0 (48*32+31):(48*32+0) /* RWXUF */
-#define NV_RAMFC_DATA0 (49*32+31):(49*32+0) /* RWXUF */
-#define NV_RAMFC_METHOD1 (50*32+31):(50*32+0) /* RWXUF */
-#define NV_RAMFC_DATA1 (51*32+31):(51*32+0) /* RWXUF */
-#define NV_RAMFC_METHOD2 (52*32+31):(52*32+0) /* RWXUF */
-#define NV_RAMFC_DATA2 (53*32+31):(53*32+0) /* RWXUF */
-#define NV_RAMFC_METHOD3 (54*32+31):(54*32+0) /* RWXUF */
-#define NV_RAMFC_DATA3 (55*32+31):(55*32+0) /* RWXUF */
-#define NV_RAMFC_HCE_CTRL (57*32+31):(57*32+0) /* RWXUF */
-#define NV_RAMFC_CONFIG (61*32+31):(61*32+0) /* RWXUF */
-#define NV_RAMFC_SET_CHANNEL_INFO (63*32+31):(63*32+0) /* RWXUF */
-
-#define NV_RAMFC_BASE_SHIFT 12 /* */
-
- Size of the full range of RAMFC in bytes.
-#define NV_RAMFC_SIZE_VAL 0x00000200 /* ----C */
-
-4 - USER-DRIVER ACCESSIBLE RAM (RAMUSERD)
-=========================================
-
- A user-level driver is allowed to access only a small portion of a GPU
-context's state. The portion of a GPU context's state that a user-level driver
-can access is stored in a block of memory called NV_RAMUSERD. NV_RAMUSERD is a
-user-level driver's window into NV_RAMFC. The NV_RAMUSERD state for each GPU
-context is stored in an aligned NV_RAMUSERD_CHAN_SIZE-byte block of memory.
-
- To submit more methods, a user driver writes a PB segment to
-memory, writes a GP entry that points to the PB segment, updates GP_PUT in
-RAMUSERD, and writes the channel's handle to the
-NV_USERMODE_NOTIFY_CHANNEL_PENDING register (see dev_usermode.ref).
-
- The RAMUSERD data structure is updated at regular intervals as controlled
-by the NV_PFIFO_USERD_WRITEBACK setting (see dev_fifo.ref). For a particular
-channel, RAMUSERD writeback can be disabled and it is recommended that SW track
-pushbuffer and channel progress via Host WFI_DIS semaphores rather than reading
-the RAMUSERD data structure.
-
- When write-back is enabled a user driver can check the GPU progress in
-executing a channel's PB segments. The driver can use:
- * GP_GET to monitor the index of the next GP entry the GPU will process
- * PB_GET to monitor the address of the next PB entry the GPU will process
- * TOP_LEVEL_GET (see NV_PPBDMA_TOP_LEVEL_GET) to monitor the address of the
- next "top-level" (non-SUBROUTINE) PB entry the GPU will process
- * REF to monitor the current "reference count" value (see NV_PPBDMA_REF).
-
- Each entry in RAMUSERD corresponds to a PRI-accessible PBDMA register in Host.
-For a description of the behavior and contents of a RAMUSERD entry, please see
-the description of the corresponding register in "manuals/dev_pbdma.ref".
-
- RAMUSERD Entry PBDMA Register Access
- ------------------------------- ----------------------------- ----------
- NV_RAMUSERD_GP_PUT NV_PPBDMA_GP_PUT(i) Read/Write
- NV_RAMUSERD_GP_GET NV_PPBDMA_GP_GET(i) Read-only
- NV_RAMUSERD_GET NV_PPBDMA_GET(i) Read-only
- NV_RAMUSERD_GET_HI NV_PPBDMA_GET_HI(i) Read-only
- NV_RAMUSERD_PUT NV_PPBDMA_PUT(i) Read-only
- NV_RAMUSERD_PUT_HI NV_PPBDMA_PUT_HI(i) Read-only
- NV_RAMUSERD_TOP_LEVEL_GET NV_PPBDMA_TOP_LEVEL_GET(i) Read-only
- NV_RAMUSERD_TOP_LEVEL_GET_HI NV_PPBDMA_TOP_LEVEL_GET_HI(i) Read-only
- NV_RAMUSERD_REF NV_PPBDMA_REF(i) Read-only
- ------------------------------- ----------------------------- ----------
-
- A user driver may write to NV_RAMUSERD_GP_PUT to kick off more work in a
-channel. Although writes to the other, read-only, entries can alter memory,
-writes to those entries will not affect the operation of the GPU, and can be
-overwritten by the GPU.
-
- When Host loads its part of a GPU context's state from RAMFC memory, it
-may not immediately read RAMUSERD_GP_PUT. Host can use the GP_PUT value loaded
-from RAMFC directly while waiting for RAMUSERD_GP_PUT to synchronize.
-Because reads of RAMUSERD_GP_PUT can be delayed, the value in NV_PPBDMA_GP_PUT
-can be older than the value in NV_RAMUSERD_GP_PUT.
-
- When Host saves a GPU context's state to NV_RAMFC, it also writes to
-NV_RAMUSERD the values of the entries other than GP_PUT.
-Because Host does not continuously write the read-only RAMFC entries, the
-read-only values in USERD memory can be older than the values in the Host PBDMA
-unit.
-
-#define NV_RAMUSERD /* ----G */
-#define NV_RAMUSERD_PUT (16*32+31):(16*32+0) /* RWXUF */
-#define NV_RAMUSERD_GET (17*32+31):(17*32+0) /* RWXUF */
-#define NV_RAMUSERD_REF (18*32+31):(18*32+0) /* RWXUF */
-#define NV_RAMUSERD_PUT_HI (19*32+31):(19*32+0) /* RWXUF */
-#define NV_RAMUSERD_TOP_LEVEL_GET (22*32+31):(22*32+0) /* RWXUF */
-#define NV_RAMUSERD_TOP_LEVEL_GET_HI (23*32+31):(23*32+0) /* RWXUF */
-#define NV_RAMUSERD_GET_HI (24*32+31):(24*32+0) /* RWXUF */
-#define NV_RAMUSERD_GP_GET (34*32+31):(34*32+0) /* RWXUF */
-#define NV_RAMUSERD_GP_PUT (35*32+31):(35*32+0) /* RWXUF */
-#define NV_RAMUSERD_BASE_SHIFT 9 /* */
-#define NV_RAMUSERD_CHAN_SIZE 512 /* */
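The RAMUSERD defines above can be read as word indices into a 512-byte block. The sketch below shows one way a user driver might read the pushbuffer GET pointer and update GP_PUT through a mapped pointer. It assumes GET_HI holds the upper 32 bits of the GET address, and the function names are hypothetical. The doorbell write to NV_USERMODE_NOTIFY_CHANNEL_PENDING is left out because that register is described in dev_usermode.ref, not here.

```c
#include <assert.h>
#include <stdint.h>

/* Word indices taken from the NV_RAMUSERD_* defines. */
enum {
    USERD_PUT              = 16,
    USERD_GET              = 17,
    USERD_REF              = 18,
    USERD_PUT_HI           = 19,
    USERD_TOP_LEVEL_GET    = 22,
    USERD_TOP_LEVEL_GET_HI = 23,
    USERD_GET_HI           = 24,
    USERD_GP_GET           = 34,
    USERD_GP_PUT           = 35,
    USERD_WORDS            = 512 / 4  /* NV_RAMUSERD_CHAN_SIZE in words */
};

/* Byte offset of a RAMUSERD entry within the 512-byte block. */
static unsigned userd_byte_offset(unsigned word)
{
    assert(word < USERD_WORDS);
    return word * 4u;
}

/* Read the 64-bit PB GET address. The two reads are not atomic, so the
 * result can be torn if the GPU writes back between them. */
static uint64_t userd_read_pb_get(const volatile uint32_t *userd)
{
    uint32_t hi = userd[USERD_GET_HI];
    uint32_t lo = userd[USERD_GET];
    return ((uint64_t)hi << 32) | lo;
}

/* GP_PUT is the only entry the text marks as Read/Write. */
static void userd_write_gp_put(volatile uint32_t *userd, uint32_t gp_put)
{
    userd[USERD_GP_PUT] = gp_put;
}
```

Because writeback can be disabled per channel, values read this way may be stale; the text recommends semaphores for tracking progress.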
-
-
-
-
-5 - RUN-LIST RAM (RAMRL)
-========================
-
- Software specifies the GPU contexts that hardware should "run" by writing a
-list of entries (known as a "runlist") to a 4k-aligned area of memory (beginning
-at NV_PFIFO_RUNLIST_BASE), and by notifying Host that a new list is available
-(by writing to NV_PFIFO_RUNLIST).
- Submission of a new runlist causes Host to expire the timeslice of all work
-scheduled by the previous runlist, allowing it to schedule the channels present
-in the new runlist once they are fetched. SW can check the status of the runlist
-by polling NV_PFIFO_ENG_RUNLIST_PENDING. (see dev_fifo.ref NV_PFIFO_RUNLIST for
-a full description of the runlist submit mechanism).
- Runlists can be stored in system memory or video memory (as specified by
-NV_PFIFO_RUNLIST_BASE_TARGET). If a runlist is stored in video memory, software
-must execute a flush or read back the last entry written before submitting the
-runlist to Host to guarantee coherency.
- The size of a runlist entry data structure is 16 bytes. Each entry
-specifies either a channel entry or a TSG header; the type is determined by the
-NV_RAMRL_ENTRY_TYPE.
-
-
-Runlist Channel Entry Type:
-
- A runlist entry of type NV_RAMRL_ENTRY_TYPE_CHAN specifies a channel to
-run. All such entries must occur within the span of some TSG as specified by
-the NV_RAMRL_ENTRY_TYPE_TSG described below. If a channel entry is encountered
-outside a TSG, Host will raise the NV_PFIFO_INTR_SCHED_ERROR_CODE_BAD_TSG
-interrupt.
-
- The fields available in a channel runlist entry are as follows (Fig 5.1):
-
- ENTRY_TYPE (T) : type of this entry: ENTRY_TYPE_CHAN
- CHID (ID) : identifier of the channel to run (overlays ENTRY_ID)
- RUNQUEUE_SELECTOR (Q) : selects which PBDMA should run this channel if
- more than one PBDMA is supported by the runlist
-
- INST_PTR_LO : lower 20 bits of the 4k-aligned instance block pointer
- INST_PTR_HI : upper 32 bits of the instance block pointer
- INST_TARGET (TGI) : aperture of the instance block
-
- USERD_PTR_LO : upper 24 bits of the low 32 bits, of the 512-byte-aligned USERD pointer
- USERD_PTR_HI : upper 32 bits of USERD pointer
- USERD_TARGET (TGU) : aperture of the USERD data structure
-
- CHID is a channel identifier that uniquely specifies the channel described
-by this runlist entry to the scheduling hardware and is reported in various
-status registers.
- RUNQUEUE_SELECTOR determines to which runqueue the channel belongs, and
-thereby which PBDMA will run the channel. Increasing values select increasingly
-numbered PBDMA IDs serving the runlist. If the selector value exceeds the
-number of PBDMAs on the runlist, the hardware will silently reassign the channel
-to run on the first PBDMA as though RUNQUEUE_SELECTOR had been set to 0. (In
-current hardware, this is used by SCG on the graphics runlist only to determine
-which FE pipe should service a given channel. A value of 0 targets the first FE
-pipe, which can process all FE driven engines: Graphics, Compute, Inline2Memory,
-and TwoD. A value of 1 targets the second FE pipe, which can only process
-Compute work. Note that GRCE work is allowed on either runqueue.)
- The INST fields specify the physical address of the channel's instance
-block, the in-memory data structure that stores the context state.
-The target aperture of the instance block is given by INST_TARGET, and the byte
-offset within that aperture is calculated as
-
- (INST_PTR_HI << 32) | (INST_PTR_LO << NV_RAMRL_ENTRY_CHAN_INST_PTR_ALIGN_SHIFT)
-
-This address should match the one specified in the channel RAM's
-NV_PCCSR_CHANNEL_INST register; see NV_RAMIN and NV_RAMFC for the format of the
-instance block. The hardware ignores the RAMRL INST fields, but in future
-chips the instance pointer may be removed from the channel RAM and the RAMRL
-INST fields used instead, resulting in smaller hardware.
- The USERD fields specify the physical address of the USERD memory region
-used by software to submit additional work to the channel. The target aperture
-of the USERD region is given by USERD_TARGET, and the byte offset within that
-aperture is calculated as
-
- (USERD_PTR_HI << 32) | (USERD_PTR_LO << NV_RAMRL_ENTRY_CHAN_USERD_PTR_ALIGN_SHIFT)
-
-
-SW uses the NV_RAMUSERD_CHAN_SIZE define to allocate and align a channel's
-RAMUSERD data structure. See the documentation for NV_RAMUSERD for a
-description of the use of USERD and its format. This address and its
-alignment must match those specified in the RAMFC's NV_RAMFC_USERD and
-NV_RAMFC_USERD_HI fields, which are backed by NV_PPBDMA_USERD in dev_pbdma.ref.
-The hardware ignores the RAMRL USERD fields, but in future chips the USERD
-pointer may be read from these fields in the runlist entry instead of the RAMFC
-to avoid the extra level of indirection in fetching the USERD data that
-currently results in a dependent read.
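-
- Going the other way, software has to split a USERD offset into the LO and
-HI fields. The sketch below is an assumption-laden illustration: the function
-name is hypothetical, the entry is assumed to be four 32-bit dwords, and only
-the alignment implied by NV_RAMRL_ENTRY_CHAN_USERD_PTR_ALIGN_SHIFT is checked
-(any stricter alignment from NV_RAMUSERD_CHAN_SIZE is not modeled).

```c
#include <assert.h>
#include <stdint.h>

#define USERD_PTR_ALIGN_SHIFT 8u /* NV_RAMRL_ENTRY_CHAN_USERD_PTR_ALIGN_SHIFT */

/* Hypothetical helper: store a USERD aperture offset and target into
 * dwords 0 and 1 of a channel entry. Returns 0 on success, -1 if the
 * offset is not aligned to (1 << USERD_PTR_ALIGN_SHIFT) bytes. */
static int ramrl_chan_set_userd(uint32_t entry[4], uint64_t offset,
                                unsigned target)
{
    assert(target <= 3u);                       /* 2-bit USERD_TARGET */
    if (offset & ((1ull << USERD_PTR_ALIGN_SHIFT) - 1u))
        return -1;
    entry[0] &= 0x0000003Fu;                    /* keep bits 5:0 only */
    entry[0] |= (uint32_t)target << 6;          /* USERD_TARGET, bits 7:6 */
    /* USERD_PTR_LO holds offset bits 31:8 and sits at bits 31:8 of
     * dword 0, so the masked offset can be ORed in without shifting. */
    entry[0] |= (uint32_t)(offset & 0xFFFFFF00u);
    entry[1]  = (uint32_t)(offset >> 32);       /* USERD_PTR_HI */
    return 0;
}
```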
-
-
-Runlist TSG Entry Type:
-
- The other type of runlist entry is the Timeslice Group (TSG) header entry
-(Fig 5.2). This type of entry is specified by NV_RAMRL_ENTRY_TYPE_TSG. A TSG
-entry describes a collection of channels all of which share the same context and
-are scheduled as a single unit by Host. All runlists support this type of entry.
-
- The fields available in a TSG header runlist entry are as follows (Fig 5.2):
-
- ENTRY_TYPE (T) : type of this entry: ENTRY_TYPE_TSG
- TSGID : identifier of the Timeslice group (overlays ENTRY_ID)
- TSG_LENGTH : number of channels that are part of this timeslice group
- TIMESLICE_SCALE : scale factor for the TSG's timeslice
- TIMESLICE_TIMEOUT : timeout amount for the TSG's timeslice
-
- A timeslice group entry consists of an integer identifier along with a
-length which specifies the number of channels in the TSG. After a TSG header
-runlist entry, the next TSG_LENGTH runlist entries are considered to be part of
-the timeslice group. Note that a TSG must contain at least one entry.
- All channels in a TSG share the same runlist timeslice which specifies how
-long a single context runs on an engine or PBDMA before being swapped for a
-different context. The timeslice period is set in the TSG header by specifying
-TSG_TIMESLICE_TIMEOUT and TSG_TIMESLICE_SCALE. The TSG timeslice period is
-calculated as follows:
-
- timeslice = (TSG_TIMESLICE_TIMEOUT << TSG_TIMESLICE_SCALE) * 1024 nanoseconds
-
- The timeslice period should normally not be set to zero. A timeslice of
-zero will be treated as a timeslice period of one. The runlist
-timeslice period begins after the context has been loaded on a PBDMA but is
-paused while the channel has an outstanding context load to an engine. Time
-spent switching a context into an engine is not part of the runlist timeslice.
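-
- As a worked instance of the timeslice formula, the sketch below computes
-the period in nanoseconds. The function name is hypothetical, and the special
-handling of a zero timeslice described above is deliberately not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: TSG timeslice period in nanoseconds, computed as
 * (TSG_TIMESLICE_TIMEOUT << TSG_TIMESLICE_SCALE) * 1024. */
static uint64_t tsg_timeslice_ns(unsigned timeout, unsigned scale)
{
    assert(timeout <= 0xFFu);  /* TIMESLICE_TIMEOUT is an 8-bit field */
    assert(scale <= 0xFu);     /* TIMESLICE_SCALE is a 4-bit field    */
    return ((uint64_t)timeout << scale) * 1024u;
}
```

-With TIMEOUT = 0x80 and SCALE = 3 (the TIMESLICE_TIMEOUT_128 and
-TIMESLICE_SCALE_3 values from the defines), this gives 1048576 ns, or about
-1.05 ms.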
-
- If Host reaches the end of the runlist or receives another entry of type
-NV_RAMRL_ENTRY_TYPE_TSG before processing TSG_LENGTH additional runlist entries,
-or if it encounters a TSG of length 0, a SCHED_ERROR interrupt will be generated
-with ERROR_CODE_BAD_TSG.
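-
- Software can check for the three BAD_TSG conditions before submitting a
-runlist. The following is a sketch only: the function name is hypothetical,
-the runlist is assumed to be a flat array of 4-dword entries, and channel
-entries that appear outside any TSG are passed over without judgment because
-the text above does not say how Host treats them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checker. rl points at count entries of 4 dwords each.
 * Returns 1 if every TSG header is followed by exactly TSG_LENGTH
 * channel entries before the next header or the end of the runlist,
 * and 0 if one of the BAD_TSG conditions is found. */
static int runlist_tsgs_well_formed(const uint32_t *rl, size_t count)
{
    size_t remaining = 0;  /* channel entries still owed to the open TSG */
    assert(rl != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        const uint32_t *e = &rl[i * 4];
        if (e[0] & 0x1u) {                 /* ENTRY_TYPE_TSG (dword 0, bit 0) */
            if (remaining != 0)
                return 0;                  /* header arrived too early */
            remaining = e[1] & 0xFFu;      /* TSG_LENGTH (dword 1, bits 7:0) */
            if (remaining == 0)
                return 0;                  /* TSG of length 0 */
        } else if (remaining != 0) {
            remaining--;                   /* channel entry consumed by TSG */
        }
    }
    return remaining == 0;                 /* 0 if the runlist ended mid-TSG */
}
```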
-
-
-Host Scheduling Memory Layout:
-
-Example of graphics runlist entry to GPU context mapping via channel id:
-
-
-                        .------Inst_ptr--------.
-                        |                      |
-  Graphics Runlist      |    Channel-Map RAM   |         GPU Instance Block
-  .-------------.       |  .----------------.  |        .-------------------.
-  | TSG Hdr L=m |--.----'  |Ch0 Inst Blk Ptr|--'------->| Host State        |
-  | RL Entry T1 |  |       |Ch1 Inst Blk Ptr|    .------| Memory State      |
-  | RL Entry T2 |  |       |      ...       |    |      | Engine0 State Ptr |
-  |     ...     |  |-chid->|ChI Inst Blk Ptr|    |      | Engine1 State Ptr |
-  | RL Entry Tm |  |       |      ...       |    |      |        ...        |
-  | TSG Hdr L=n |  |       |ChN Inst Blk Ptr|    |    .-| EngineN State Ptr |
-  | RL Entry T1 |  |       `----------------'    |    | `-------------------'
-  | RL Entry T2 |  |userd_ptr                    |    |
-  |     ...     |  |         .--------------.    |    |   .--------------.
-  | RL Entry Tn |  |         |    USERD     |    |    |   |  Engine Ctx  |
-  |             |  '-------->|              |<---'    '-->|   State N    |
-  `-------------'            |              |             |              |
-                             `--------------'             `--------------'
-
-Runlist Diagram Description:
- Here we have (m+n) channel-type (ENTRY_TYPE_CHAN) runlist entries
-grouped together within two TSGs. The first entry in the runlist is a TSG header
-entry (ENTRY_TYPE_TSG) that describes the first TSG. The TSG header specifies m
-as the length of the TSG. The header also contains the timeslice
-information for the TSG (SCALE/TIMEOUT), as well as the TSG id specified in the
-TSGID field.
- Because the length here is m, the runlist *must* contain m additional
-runlist entries of type ENTRY_TYPE_CHAN that will be part of this TSG.
-Similarly, the next (n+1) entries, a TSG header entry followed by n
-regular channel entries, correspond to the second TSG.
-
-#define NV_RAMRL_ENTRY /* ----G */
-#define NV_RAMRL_ENTRY_RANGE 0xF:0x00000000 /* RW--M */
-#define NV_RAMRL_ENTRY_SIZE 16 /* */
-// Runlist base must be 4k-aligned.
-#define NV_RAMRL_ENTRY_BASE_SHIFT 12 /* */
-
-
-#define NV_RAMRL_ENTRY_TYPE (0+0*32):(0+0*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_TYPE_CHAN 0x00000000 /* RW--V */
-#define NV_RAMRL_ENTRY_TYPE_TSG 0x00000001 /* RW--V */
-
-#define NV_RAMRL_ENTRY_ID (11+2*32):(0+2*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_ID_HW 11:0 /* RWXUF */
-#define NV_RAMRL_ENTRY_ID_MAX (4096-1) /* RW--V */
-
-
-
-
-
-#define NV_RAMRL_ENTRY_CHAN_RUNQUEUE_SELECTOR (1+0*32):(1+0*32) /* RWXUF */
-
-#define NV_RAMRL_ENTRY_CHAN_INST_TARGET (5+0*32):(4+0*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_CHAN_INST_TARGET_VID_MEM 0x00000000 /* RW--V */
-#define NV_RAMRL_ENTRY_CHAN_INST_TARGET_SYS_MEM_COHERENT 0x00000002 /* RW--V */
-#define NV_RAMRL_ENTRY_CHAN_INST_TARGET_SYS_MEM_NONCOHERENT 0x00000003 /* RW--V */
-
-#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET (7+0*32):(6+0*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_VID_MEM 0x00000000 /* RW--V */
-#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_VID_MEM_NVLINK_COHERENT 0x00000001 /* RW--V */
-#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_SYS_MEM_COHERENT 0x00000002 /* RW--V */
-#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_SYS_MEM_NONCOHERENT 0x00000003 /* RW--V */
-
-#define NV_RAMRL_ENTRY_CHAN_USERD_PTR_LO (31+0*32):(8+0*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_CHAN_USERD_PTR_HI (31+1*32):(0+1*32) /* RWXUF */
-
-#define NV_RAMRL_ENTRY_CHAN_CHID (11+2*32):(0+2*32) /* RWXUF */
-
-#define NV_RAMRL_ENTRY_CHAN_INST_PTR_LO (31+2*32):(12+2*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_CHAN_INST_PTR_HI (31+3*32):(0+3*32) /* RWXUF */
-
-
-
-// Macros for shifting out low bits of INST_PTR and USERD_PTR.
-#define NV_RAMRL_ENTRY_CHAN_INST_PTR_ALIGN_SHIFT 12 /* ----C */
-#define NV_RAMRL_ENTRY_CHAN_USERD_PTR_ALIGN_SHIFT 8 /* ----C */
-
-
-
-
-
-
-
-#define NV_RAMRL_ENTRY_TSG_TIMESLICE_SCALE (19+0*32):(16+0*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_TSG_TIMESLICE_SCALE_3 0x00000003 /* RWI-V */
-#define NV_RAMRL_ENTRY_TSG_TIMESLICE_TIMEOUT (31+0*32):(24+0*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_TSG_TIMESLICE_TIMEOUT_128 0x00000080 /* RWI-V */
-
-
-#define NV_RAMRL_ENTRY_TSG_TIMESLICE_TIMEOUT_1US 0x00000000 /* */
-
-#define NV_RAMRL_ENTRY_TSG_LENGTH (7+1*32):(0+1*32) /* RWXUF */
-#define NV_RAMRL_ENTRY_TSG_LENGTH_INIT 0x00000000 /* RW--V */
-#define NV_RAMRL_ENTRY_TSG_LENGTH_MIN 0x00000001 /* RW--V */
-#define NV_RAMRL_ENTRY_TSG_LENGTH_MAX 0x00000080 /* RW--V */
-
-#define NV_RAMRL_ENTRY_TSG_TSGID (11+2*32):(0+2*32) /* RWXUF */
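-
- To show how the TSG field ranges above map onto a 16-byte entry, here is a
-hedged packing sketch. The function name is hypothetical, the entry is assumed
-to be four 32-bit dwords with a field at (b+n*32) stored at bit b of dword n,
-and bits not covered by a define are simply written as zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a TSG header runlist entry. */
static void ramrl_pack_tsg(uint32_t entry[4], unsigned tsgid, unsigned length,
                           unsigned timeout, unsigned scale)
{
    assert(tsgid <= 0xFFFu);                  /* TSGID: 12 bits          */
    assert(length >= 1u && length <= 0x80u);  /* TSG_LENGTH_MIN .. _MAX  */
    assert(timeout <= 0xFFu);                 /* TIMESLICE_TIMEOUT: 8 b  */
    assert(scale <= 0xFu);                    /* TIMESLICE_SCALE: 4 bits */

    entry[0] = 0x1u                           /* ENTRY_TYPE_TSG, bit 0   */
             | ((uint32_t)scale << 16)        /* bits 19:16              */
             | ((uint32_t)timeout << 24);     /* bits 31:24              */
    entry[1] = (uint32_t)length;              /* TSG_LENGTH, bits 7:0    */
    entry[2] = (uint32_t)tsgid;               /* TSGID, bits 11:0        */
    entry[3] = 0u;
}
```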
-
-
-
-6 - Host Pushbuffer Format (FIFO_DMA)
-=======================================
-
- "FIFO" refers to Host. "FIFO_DMA" means data that Host reads from memory:
-the pushbuffer. Host autonomously reads pushbuffer data from memory and
-generates method address/data pairs from the data.
-
- Pushbuffer terminology:
-
- - A channel is the logical sequence of instructions associated with a GPU
- context.
-
- - The pushbuffer is a stream of data in memory containing the
- specifications of the operations that a channel is to perform for a
- particular client. Pushbuffer data consists of pushbuffer entries.
-
- - A pushbuffer entry (PB entry) is a 32-bit (doubleword) sized unit of
- pushbuffer data. This is the smallest granularity at which Host consumes
- pushbuffer data. A PB entry is either a PB instruction (which is either
- a PB control entry or a PB method header), or a method data entry.
-
- - A pushbuffer segment (PB segment) is a contiguous block of memory
- containing pushbuffer entries. The location and size of a pushbuffer
- segment is defined by its respective GP entry in the GPFIFO.
-
- - A pushbuffer control entry (PB control entry) is a single PB entry of
- type SET_SUBDEVICE_MASK, STORE_SUBDEVICE_MASK, USE_SUBDEVICE_MASK,
- END_PB_SEGMENT, or a universal NOP (NV_FIFO_DMA_NOP).
-
- - A pushbuffer compressed method sequence is a sequence of pushbuffer
- entries starting with a method header and a variable-length sequence of
- method data entries (the length being defined by the method header). A
- single PB compressed method sequence expands into one or more methods.
- This may also be known as a "pushbuffer method" (PB method), but that
- terminology is ambiguous and not preferred.
-
- - A pushbuffer method header (PB method header) is the first PB entry found
- in a PB compressed method sequence. A PB method header is a PB
- instruction performed on method data entries.
-
- - A pushbuffer instruction (PB instruction) is a PB entry that is not a PB
- method data entry. A PB instruction is either a PB control entry or a PB
- method header.
-
- - A method is an address/data pair representing an operation to perform.
-
- - A method data entry is the 32-bit operand for its corresponding method.
-
-
-
-#define NV_FIFO_PB_ENTRY_SIZE 4 /* */
-
-
- Some engines such as Graphics internally support a double-wide method FIFO;
-these are known as "data-hi" methods. It is Host that performs the packing of
-two methods into one double-wide entry. Host will only generate data-hi methods
-if the following conditions are satisfied:
-
- 1. The two methods come from the same PB method (in other words they share
- the same method header).
-
- 2. The method header specifies a non-incrementing method, an incrementing
- method, or an increment-once method.
-
- 3. The paired methods either have the same method address, or the first
- method has an even NV_FIFO_DMA_METHOD_ADDRESS field and the second
- (data-hi) method is the increment of the first. (That is, the
- left-shifted method address as listed in the class files must be
- divisible by 8 for this condition to hold.)
-
- 4. The second method is available at the time of pushing the first one into
- the engine's method FIFO. In other words, Host will not wait to pack
- methods. Note that if the engine's method FIFO is full, the
- back-pressure will in itself create a "wait time".
-
-The first three conditions are under SW's control. Only the graphics engine
-supports data-hi methods.
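-
- Condition 3 is the one most easily checked by software when laying out a
-pushbuffer. The sketch below tests only that address condition, using
-dword-addresses as stored in NV_FIFO_DMA_METHOD_ADDRESS; the function name is
-hypothetical and conditions 1, 2, and 4 are not modeled.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if two consecutive methods satisfy the
 * address condition for data-hi packing, 0 otherwise. Both arguments are
 * 12-bit dword-addresses. */
static int datahi_addresses_pairable(unsigned first, unsigned second)
{
    assert(first <= 0xFFFu && second <= 0xFFFu);
    if (first == second)
        return 1;                               /* same method address */
    /* An even dword-address is a byte-address divisible by 8. */
    return (first % 2u == 0u) && (second == first + 1u);
}
```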
-
-
-Types of PB Entries
-
- PB entries can be classified into three types: PB method headers, PB
-control entries, and PB method data. Different types of PB entries have
-different formats. Because PB compressed method sequences are of variable
-length, it is impossible to determine the type of a PB entry without tracking
-the pushbuffer from the beginning or from the location of a PB entry that is
-known to not be a PB method data entry.
-
- A PB method data entry is always found in a method data sequence
-immediately following a PB method header in the logical stream of PB entries.
-The PB method header contains a NV_FIFO_DMA_METHOD_COUNT field, the value of
-which is equal to the length of the method data sequence. Note a PB method
-header does not necessarily come with PB method data entries (see details below
-about immediate-data method headers and method headers for which COUNT is zero).
-Also note the PB method data entries may be located in a PB segment separate
-from their corresponding method header. The format of any given PB method data
-entry is defined in the "NV_UDMA" section of dev_pbdma.ref.
-
- A PB entry that is either a PB method header or PB control entry is known
-as a PB instruction. The type of a PB instruction is specified by the
-NV_FIFO_DMA_SEC_OP field and the NV_FIFO_DMA_TERT_OP field.
-
-   secondary   tertiary
-   opcode      opcode     entry type
-   ---------   --------   --------------------------------
-   000         01         SET_SUBDEVICE_MASK
-   000         10         STORE_SUBDEVICE_MASK
-   000         11         USE_SUBDEVICE_MASK
-   001         xx         incrementing method header
-   011         xx         non-incrementing method header
-   100         xx         immediate-data method header
-   101         xx         increment-once method header
-   111         xx         END_PB_SEGMENT
-   ---------   --------   --------------------------------
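-
- The opcode table can be turned into a small decoder, sketched below. The
-enum and function names are hypothetical. The result is only meaningful for an
-entry already known to be a PB instruction, since a method data entry can hold
-any bit pattern. Encodings that do not appear in the table are reported as
-PB_UNKNOWN rather than guessed at.

```c
#include <assert.h>
#include <stdint.h>

enum pb_kind {
    PB_NOP, PB_SET_SDM, PB_STORE_SDM, PB_USE_SDM,
    PB_INCR, PB_NONINCR, PB_IMMD, PB_ONEINCR, PB_ENDSEG, PB_UNKNOWN
};

/* Hypothetical decoder for a PB instruction dword. */
static enum pb_kind pb_classify(uint32_t entry)
{
    unsigned sec  = entry >> 29;           /* NV_FIFO_DMA_SEC_OP,  31:29 */
    unsigned tert = (entry >> 16) & 0x3u;  /* NV_FIFO_DMA_TERT_OP, 17:16 */

    if (entry == 0u)
        return PB_NOP;                     /* NV_FIFO_DMA_NOP */
    switch (sec) {
    case 0:
        if (tert == 1u) return PB_SET_SDM;
        if (tert == 2u) return PB_STORE_SDM;
        if (tert == 3u) return PB_USE_SDM;
        return PB_UNKNOWN;                 /* 000/00 is not in the table */
    case 1:  return PB_INCR;
    case 3:  return PB_NONINCR;
    case 4:  return PB_IMMD;
    case 5:  return PB_ONEINCR;
    case 7:  return PB_ENDSEG;
    default: return PB_UNKNOWN;            /* 010 and 110 are not listed */
    }
}
```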
-
- Types of methods:
-
- - A Host method is a method whose address is defined in the NV_UDMA device
- range.
-
- - A Host-only method is any Host method excluding SetObject (also known as
- NV_UDMA_OBJECT).
-
- - An engine method is a method whose address is not defined within the
- NV_UDMA device range. There are multiple engines designated by a
- subchannel ID. Software methods are included in this category.
-
- - A software method (SW method) is a method which causes an interrupt for
- the express purpose of being handled by software. For details see the
- section on software methods below.
-
- For more information about types of methods see "HOST METHODS" and
-"RESERVED METHOD ADDRESSES" in dev_pbdma.ref.
-
- The method address in a PB method header (stored in the
-NV_FIFO_DMA_METHOD_ADDRESS field) is a dword-address, not a byte-address. In
-other words the least significant two bits of the address are not stored because
-the byte-address is dword-aligned (thus the least significant two bits are
-always zero).
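-
- The dword-address convention amounts to a shift by two in each direction,
-as in this small sketch (helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte-address of the method named by a PB method
 * header, recovered from NV_FIFO_DMA_METHOD_ADDRESS (bits 11:0). */
static unsigned pb_header_method_byte_address(uint32_t header)
{
    unsigned dword_addr = header & 0xFFFu;
    return dword_addr << 2;
}

/* Hypothetical helper: convert a byte-address, as listed in the class
 * files, to the dword-address stored in the header. */
static unsigned method_byte_to_dword(unsigned byte_addr)
{
    assert((byte_addr & 0x3u) == 0u);  /* must be dword-aligned          */
    assert(byte_addr <= 0x3FFCu);      /* must fit the 12-bit ADDRESS    */
    return byte_addr >> 2;
}
```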
-
- The subchannel in a PB method header (stored in the
-NV_FIFO_DMA_*_SUBCHANNEL field) determines the engine to which a method will be
-sent if the method is SetObject or an engine method (otherwise, the SUBCHANNEL
-field is ignored). SetObject enables SW to request HW to check the expectation
-that a given subchannel serves the specified class ID; see the description of
-"NV_UDMA_OBJECT" in dev_pbdma.ref.
-
- The mapping between subchannels and engines is fixed. A subchannel is
-bound to a given class according to the runlist. Each engine method is applied
-to an "object," which itself is an instance of an NV class as defined by the
-master MFS class files. Each object belongs to an engine. For SetObject and
-engine methods, the engine is determined entirely by the SUBCHANNEL field of
-the method's header via a fixed mapping that depends on the runlist on which the
-method arrives.
-
- Methods on subchannels 0-4 are handled by the primary engine served by the
-runlist, except that subchannel 4 targets GRCOPY0 and GRCOPY1 on the graphics
-runlist. For Graphics/Compute, SetObject associates subchannels 0, 1, 2, and 3
-with class identifiers for 3D, compute, I2M, and 2D respectively. On other
-runlists, the subchannel is ignored, and Host does not send the subchannel ID to
-the engine. It is recommended that SW only use subchannel 4 on the dedicated
-copy engines for consistency with GRCOPY usage.
-
- Subchannels 5-7 are for software methods. Any methods on these subchannels
-(including SetObject methods) are kicked back to software for handling via the
-SW method dispatch mechanism using the NV_PPBDMA_INTR_*_DEVICE interrupt. SW
-may choose to send a SetObject method to each engine subchannel before sending
-any methods on that particular subchannel in order to support multiple software
-classes.
-
- If a method stream subchannel-switches from targeting graphics/compute to a
-copy engine or vice-versa, that is, to or from subchannel 4 on GR, Host will:
-
- 1. Wait until the first engine has completed all its methods,
-
- 2. Wait until that engine indicates that it is idle (WFI), and
-
- 3. Send a sysmem barrier flush and wait until it completes.
-
-Only then will Host send methods to the newly targeted engine.
-
- Note that this WFI will not occur for sending Host-only methods on the new
-subchannel, since Host-only methods ignore the subchannel field. Additionally,
-when switching from CE to graphics/compute, Host forces FE to perform a cache
-invalidate. Other subchannel switch semantics may be provided by the engines
-themselves, such as switching between subchannels 0-3 within FE.
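-
- The subchannel rules for the graphics runlist can be summarized in code.
-This sketch is an interpretation of the text above, not a description of the
-hardware logic: the names are hypothetical, it applies to the graphics runlist
-only, it ignores the Host-only method exception, and transitions involving the
-software subchannels are assumed not to trigger the WFI sequence because the
-text does not describe them.

```c
#include <assert.h>

enum subch_target { SUBCH_GR, SUBCH_GRCOPY, SUBCH_SW };

/* Hypothetical helper: where a subchannel's engine methods go on the
 * graphics runlist. */
static enum subch_target gr_runlist_subch_target(unsigned subch)
{
    assert(subch <= 7u);               /* 3-bit SUBCHANNEL field */
    if (subch <= 3u) return SUBCH_GR;      /* 3D, compute, I2M, 2D */
    if (subch == 4u) return SUBCH_GRCOPY;  /* GRCOPY0 / GRCOPY1    */
    return SUBCH_SW;                       /* 5-7: software methods */
}

/* Hypothetical helper: 1 if switching from prev to next crosses between
 * graphics/compute and the copy engine, which is the case in which Host
 * performs the wait, WFI, and sysmem barrier sequence. */
static int gr_runlist_switch_needs_wfi(unsigned prev, unsigned next)
{
    enum subch_target a = gr_runlist_subch_target(prev);
    enum subch_target b = gr_runlist_subch_target(next);
    return (a == SUBCH_GR && b == SUBCH_GRCOPY) ||
           (a == SUBCH_GRCOPY && b == SUBCH_GR);
}
```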
-
-
-#define NV_FIFO_DMA /* ----G */
-#define NV_FIFO_DMA_METHOD_ADDRESS_OLD 12:2 /* RWXUF */
-#define NV_FIFO_DMA_METHOD_ADDRESS 11:0 /* RWXUF */
-
-#define NV_FIFO_DMA_SUBDEVICE_MASK 15:4 /* RWXUF */
-
-#define NV_FIFO_DMA_METHOD_SUBCHANNEL 15:13 /* RWXUF */
-
-#define NV_FIFO_DMA_TERT_OP 17:16 /* RWXUF */
-#define NV_FIFO_DMA_TERT_OP_GRP0_SET_SUB_DEV_MASK 0x00000001 /* RW--V */
-#define NV_FIFO_DMA_TERT_OP_GRP0_STORE_SUB_DEV_MASK 0x00000002 /* RW--V */
-#define NV_FIFO_DMA_TERT_OP_GRP0_USE_SUB_DEV_MASK 0x00000003 /* RW--V */
-
-#define NV_FIFO_DMA_METHOD_COUNT_OLD 28:18 /* RWXUF */
-#define NV_FIFO_DMA_METHOD_COUNT 28:16 /* RWXUF */
-#define NV_FIFO_DMA_IMMD_DATA 28:16 /* RWXUF */
-
-#define NV_FIFO_DMA_SEC_OP 31:29 /* RWXUF */
-#define NV_FIFO_DMA_SEC_OP_GRP0_USE_TERT 0x00000000 /* RW--V */
-#define NV_FIFO_DMA_SEC_OP_INC_METHOD 0x00000001 /* RW--V */
-#define NV_FIFO_DMA_SEC_OP_NON_INC_METHOD 0x00000003 /* RW--V */
-#define NV_FIFO_DMA_SEC_OP_IMMD_DATA_METHOD 0x00000004 /* RW--V */
-#define NV_FIFO_DMA_SEC_OP_ONE_INC 0x00000005 /* RW--V */
-#define NV_FIFO_DMA_SEC_OP_RESERVED6 0x00000006 /* RW--V */
-#define NV_FIFO_DMA_SEC_OP_END_PB_SEGMENT 0x00000007 /* RW--V */
-
-
-Incrementing PB Method Header Format
-
- An incrementing PB method header specifies that Host generate a sequence of
-methods. The length of the sequence is defined by the method header. The
-method data for each method in this sequence is found in a sequence of PB
-entries immediately following the method header.
-
- The dword-address of the first method is specified by the method header,
-and the dword-address of each subsequent method is equal to the dword-address of
-the previous method plus one. In other words, the byte-address of each
-subsequent method is equal to the byte-address of the previous method plus four.
-
-Example sequence of methods generated from an incrementing method header:
-
- addr data0
- addr+1 data1
- addr+2 data2
- addr+3 data3
- ... ...
-
- The NV_FIFO_DMA_INCR_COUNT field contains the number of methods in the
-generated sequence. This is the same as the number of method data entries that
-follow the method header. If the COUNT field is zero, the other fields are
-ignored, and the PB method effectively becomes a no-op with no method data
-entries following it.
-
- The NV_FIFO_DMA_INCR_SUBCHANNEL field contains the subchannel to use for
-the methods generated from the method header. See the documentation above for
-NV_FIFO_DMA_*_SUBCHANNEL.
-
- The NV_FIFO_DMA_INCR_ADDRESS field contains the method address for the
-first method in the generated sequence. The dword-address of the method is
-incremented by one each time a method is generated. A method address specifies
-an operation to be performed. Note that because the ADDRESS is a dword-address
-and not a byte-address, the two least significant bits of the method's
-byte-address are not stored.
-
- The NV_FIFO_DMA_INCR_DATA fields contain the method data for the methods in
-the generated sequence. The number of method data entries is defined by the
-COUNT field. A method data entry contains an operand for its respective method.
-
- Bit 12 is reserved for the future expansion of either the subchannel or the
-address fields.
-
-
-#define NV_FIFO_DMA_INCR /* ----G */
-#define NV_FIFO_DMA_INCR_OPCODE (0*32+31):(0*32+29) /* RWXUF */
-#define NV_FIFO_DMA_INCR_OPCODE_VALUE 0x00000001 /* ----V */
-#define NV_FIFO_DMA_INCR_COUNT (0*32+28):(0*32+16) /* RWXUF */
-#define NV_FIFO_DMA_INCR_SUBCHANNEL (0*32+15):(0*32+13) /* RWXUF */
-#define NV_FIFO_DMA_INCR_ADDRESS (0*32+11):(0*32+0) /* RWXUF */
-#define NV_FIFO_DMA_INCR_DATA (1*32+31):(1*32+0) /* RWXUF */
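-
- A minimal sketch of emitting an incrementing compressed method sequence
-from the field ranges above. The function names are hypothetical, the caller is
-assumed to supply a buffer with room for count+1 entries, and bit 12 is left
-as zero since it is reserved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build an incrementing PB method header. */
static uint32_t pb_incr_header(unsigned subch, unsigned dword_addr,
                               unsigned count)
{
    assert(subch <= 7u);           /* SUBCHANNEL: bits 15:13 */
    assert(dword_addr <= 0xFFFu);  /* ADDRESS:    bits 11:0  */
    assert(count <= 0x1FFFu);      /* COUNT:      bits 28:16 */
    return (1u << 29)              /* OPCODE_VALUE 1 in bits 31:29 */
         | ((uint32_t)count << 16)
         | ((uint32_t)subch << 13)
         | (uint32_t)dword_addr;
}

/* Hypothetical helper: write the header followed by count method data
 * entries. Returns the number of PB entries written. */
static size_t pb_emit_incr(uint32_t *pb, unsigned subch, unsigned dword_addr,
                           const uint32_t *data, unsigned count)
{
    pb[0] = pb_incr_header(subch, dword_addr, count);
    for (unsigned i = 0; i < count; i++)
        pb[1 + i] = data[i];       /* data i goes to method addr + i */
    return (size_t)count + 1u;
}
```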
-
-
-Non-Incrementing PB Method Header Format
-
- A non-incrementing PB method header specifies that Host generate a sequence
-of methods. The length of the sequence is defined by the method header. The
-method data for each method in this sequence is contained within the PB entries
-immediately following the method header.
-
- Unlike with the incrementing PB method header, the sequence of methods
-generated all have the same method address. The dword-address of every method
-in this sequence is specified by the method header. Although the methods all
-have the same address, the method data entries may be different.
-
-Example sequence of methods generated from a non-incrementing method header:
-
- addr data0
- addr data1
- addr data2
- addr data3
- ... ...
-
- The NV_FIFO_DMA_NONINCR_COUNT field contains the number of methods
-in the generated sequence. This is the same as the number of method data
-entries that follow the method header. If the COUNT field is zero, the other
-fields are ignored, and the PB method effectively becomes a no-op with no method
-data entries following it.
-
- The NV_FIFO_DMA_NONINCR_SUBCHANNEL field contains the subchannel to use for
-the methods generated from the method header. See the documentation above for
-NV_FIFO_DMA_*_SUBCHANNEL.
-
- The NV_FIFO_DMA_NONINCR_ADDRESS field contains the method address for every
-method in the generated sequence. A method address specifies an operation to be
-performed. Note that because the ADDRESS field is a dword-address and not a
-byte-address, the two least significant bits of the method's byte-address are
-not stored.
-
- The NV_FIFO_DMA_NONINCR_DATA fields contain the method data for the methods
-in the generated sequence. The number of method data entries is defined by the
-COUNT field. A method data entry contains an operand for its respective method.
-
- Bit 12 is reserved for the future expansion of either the subchannel or the
-address fields.
-
-
-#define NV_FIFO_DMA_NONINCR /* ----G */
-#define NV_FIFO_DMA_NONINCR_OPCODE (0*32+31):(0*32+29) /* RWXUF */
-#define NV_FIFO_DMA_NONINCR_OPCODE_VALUE 0x00000003 /* ----V */
-#define NV_FIFO_DMA_NONINCR_COUNT (0*32+28):(0*32+16) /* RWXUF */
-#define NV_FIFO_DMA_NONINCR_SUBCHANNEL (0*32+15):(0*32+13) /* RWXUF */
-#define NV_FIFO_DMA_NONINCR_ADDRESS (0*32+11):(0*32+0) /* RWXUF */
-#define NV_FIFO_DMA_NONINCR_DATA (1*32+31):(1*32+0) /* RWXUF */
-
-
-Increment-Once PB Method Header Format
-
- An increment-once PB method header specifies that Host generate a sequence
-of methods. The length of the sequence is defined by the method header. The
-method data for each method in this sequence is found in a sequence of PB
-entries immediately following the method header.
-
- The dword-address of the first method is specified by the method header.
-The address of the second and all following methods is equal to the
-dword-address of the first method plus one. In other words, the byte-address of
-the second and all following methods is equal to the byte-address of the first
-method plus four.
-
-Example sequence of methods generated from an increment-once method header:
-
- addr data0
- addr+1 data1
- addr+1 data2
- addr+1 data3
- ... ...
-
- The NV_FIFO_DMA_ONEINCR_COUNT field contains the number of methods in the
-generated sequence. This is the same as the number of method data entries that
-follow the method header. If the COUNT field is zero, the other fields are
-ignored, and the PB method effectively becomes a no-op method with no method
-data entries following it.
-
- The NV_FIFO_DMA_ONEINCR_SUBCHANNEL field contains the subchannel to use for
-the methods generated from the method header. See the documentation above for
-NV_FIFO_DMA_*_SUBCHANNEL.
-
- The NV_FIFO_DMA_ONEINCR_ADDRESS field contains the method address for the
-first method in the generated sequence. A method address specifies an operation
-to be performed. Note that because the ADDRESS is a dword-address and not a
-byte-address, the two least significant bits of the method's byte-address are
-not stored.
-
- The NV_FIFO_DMA_ONEINCR_DATA fields contain the method data for the methods
-in the generated sequence. The number of method data entries is defined by the
-COUNT field. A method data entry contains an operand for its respective method.
-
- Bit 12 is reserved for the future expansion of either the subchannel or the
-address fields.
-
-
-#define NV_FIFO_DMA_ONEINCR /* ----G */
-#define NV_FIFO_DMA_ONEINCR_OPCODE (0*32+31):(0*32+29) /* RWXUF */
-#define NV_FIFO_DMA_ONEINCR_OPCODE_VALUE 0x00000005 /* ----V */
-#define NV_FIFO_DMA_ONEINCR_COUNT (0*32+28):(0*32+16) /* RWXUF */
-#define NV_FIFO_DMA_ONEINCR_SUBCHANNEL (0*32+15):(0*32+13) /* RWXUF */
-#define NV_FIFO_DMA_ONEINCR_ADDRESS (0*32+11):(0*32+0) /* RWXUF */
-#define NV_FIFO_DMA_ONEINCR_DATA (1*32+31):(1*32+0) /* RWXUF */
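-
- The three counted header formats differ only in how the method address
-advances across the generated sequence. The sketch below puts the three rules
-side by side; the function name is hypothetical, and behavior when an
-incremented address would exceed the 12-bit field is not specified here and
-is not modeled.

```c
#include <assert.h>

/* Hypothetical helper: dword-address of the i-th method (i counted from
 * 0) generated by a counted method header with the given SEC_OP. */
static unsigned pb_method_address(unsigned sec_op, unsigned base, unsigned i)
{
    switch (sec_op) {
    case 1u:                             /* incrementing     */
        return base + i;
    case 3u:                             /* non-incrementing */
        return base;
    case 5u:                             /* increment-once   */
        return base + (i ? 1u : 0u);
    default:
        assert(0 && "not a counted method header opcode");
        return base;
    }
}
```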
-
-
-No-Operation PB Instruction Formats
-
- The method header for a no-op PB method may be specified in multiple ways,
-but the preferred way is to set the PB instruction to NV_FIFO_DMA_NOP.
-In any case NV_FIFO_DMA_NOP is a universal NOP entry that bypasses any method
-header format check, and is not considered a method header.
-
-
-#define NV_FIFO_DMA_NOP 0x00000000 /* ----C */
-
-
-Immediate-Data PB Method Header Format
-
- If a method's operand fits within 13 bits, a PB method may be specified in
-a single PB entry, using the immediate-data PB method header format. Exactly
-one method is generated from this method header.
-
- The NV_FIFO_DMA_IMMD_SUBCHANNEL field contains the subchannel to use for
-the method generated from the method header. See the documentation above for
-NV_FIFO_DMA_*_SUBCHANNEL.
-
- The NV_FIFO_DMA_IMMD_ADDRESS field contains the method address for the
-single generated method. A method address specifies an operation to be
-performed. Note that because the ADDRESS is a dword-address and not a
-byte-address, the two least significant bits of the method's byte-address are
-not stored.
-
- The single NV_FIFO_DMA_IMMD_DATA field contains the method data for the
-generated method. This method data contains an operand for the generated
-method.
-
-
-#define NV_FIFO_DMA_IMMD /* ----G */
-#define NV_FIFO_DMA_IMMD_ADDRESS 11:0 /* RWXUF */
-#define NV_FIFO_DMA_IMMD_SUBCHANNEL 15:13 /* RWXUF */
-#define NV_FIFO_DMA_IMMD_DATA 28:16 /* RWXUF */
-#define NV_FIFO_DMA_IMMD_OPCODE 31:29 /* RWXUF */
-#define NV_FIFO_DMA_IMMD_OPCODE_VALUE 0x00000004 /* ----V */
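-
- A sketch of choosing the immediate-data format when the operand is small
-enough. The function name and the return-code convention are hypothetical; a
-caller that gets -1 would fall back to one of the counted header formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build an immediate-data PB method header.
 * Returns 0 and writes *out on success, or -1 if data needs more than
 * the 13 bits available in NV_FIFO_DMA_IMMD_DATA. */
static int pb_immd_header(uint32_t *out, unsigned subch, unsigned dword_addr,
                          uint32_t data)
{
    assert(out != NULL);
    assert(subch <= 7u);           /* SUBCHANNEL: bits 15:13 */
    assert(dword_addr <= 0xFFFu);  /* ADDRESS:    bits 11:0  */
    if (data > 0x1FFFu)
        return -1;
    *out = (4u << 29)              /* OPCODE_VALUE 4 in bits 31:29 */
         | (data << 16)            /* DATA: bits 28:16             */
         | ((uint32_t)subch << 13)
         | (uint32_t)dword_addr;
    return 0;
}
```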
-
-
-Set Sub-Device Mask PB Control Entry Format
-
- The SET_SUBDEVICE_MASK (SSDM) PB control entry is used when multiple GPU
-contexts are using the same pushbuffer (for example, for SLI or for stereo
-rendering) and there is data in the pushbuffer that is for only a subset of the
-GPU contexts. This instruction allows the pushbuffer to tell a specific GPU
-context to use or ignore methods following the SET_SUBDEVICE_MASK. While the
-logical-AND of NV_FIFO_DMA_SET_SUBDEVICE_MASK_VALUE and the GPU context's
-NV_PPBDMA_SUBDEVICE_ID value is zero, methods are ignored. Pushbuffer control
-entries (like SET_SUBDEVICE_MASK) are not ignored.
-
-********************************************************************************
-Warning: When using subdevice masking, one must take care to synchronize
-properly with any later GP entries marked FETCH_CONDITIONAL. If GP fetching
-gets too far ahead of PB processing, it is possible for a later conditional PB
-segment to be discarded prior to reaching an SSDM command that sets
-SUBDEVICE_STATUS to ACTIVE. This would cause Host to execute garbage data. One
-way to avoid this would be to set the SYNC_WAIT flag on any FETCH_CONDITIONAL
-segments following a subdevice reenable.
-********************************************************************************
-
-
-
-#define NV_FIFO_DMA_SET_SUBDEVICE_MASK /* ----G */
-#define NV_FIFO_DMA_SET_SUBDEVICE_MASK_VALUE 15:4 /* RWXUF */
-#define NV_FIFO_DMA_SET_SUBDEVICE_MASK_OPCODE 31:16 /* RWXUF */
-#define NV_FIFO_DMA_SET_SUBDEVICE_MASK_OPCODE_VALUE 0x00000001 /* ----V */
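-
- The masking rule can be sketched as follows. The helper names are
-hypothetical, and the subdevice ID is treated as a plain 12-bit value that is
-ANDed with the mask exactly as the text states; how NV_PPBDMA_SUBDEVICE_ID is
-assigned is outside the scope of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a SET_SUBDEVICE_MASK control entry. */
static uint32_t pb_set_subdevice_mask(unsigned mask)
{
    assert(mask <= 0xFFFu);            /* VALUE: bits 15:4 */
    return (1u << 16)                  /* OPCODE_VALUE 1 in bits 31:16 */
         | ((uint32_t)mask << 4);
}

/* Hypothetical helper: 1 if methods following the mask are processed by
 * a context with the given subdevice ID, 0 if they are ignored. */
static int subdevice_methods_active(unsigned mask, unsigned subdevice_id)
{
    return (mask & subdevice_id) != 0u;
}
```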
-
-
-Store Sub-Device Mask PB Control Entry Format
-
- The STORE_SUBDEVICE_MASK PB control entry is used to save a subdevice mask
-value to be used later by a USE_SUBDEVICE_MASK PB instruction.
-
-
-#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK /* ----G */
-#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK_VALUE 15:4 /* RWXUF */
-#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK_OPCODE 31:16 /* RWXUF */
-#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK_OPCODE_VALUE 0x00000002 /* ----V */
-
-
-Use Sub-Device Mask PB Control Entry Format
-
- The USE_SUBDEVICE_MASK PB control entry is used to apply the subdevice mask
-value saved by a STORE_SUBDEVICE_MASK PB instruction. The effect of the mask is
-the same as for a SET_SUBDEVICE_MASK PB instruction.
-
-
-#define NV_FIFO_DMA_USE_SUBDEVICE_MASK /* ----G */
-#define NV_FIFO_DMA_USE_SUBDEVICE_MASK_OPCODE 31:16 /* RWXUF */
-#define NV_FIFO_DMA_USE_SUBDEVICE_MASK_OPCODE_VALUE 0x00000003 /* ----V */
-
-
-End-PB-Segment PB Control Entry Format
-
- Engines may write PB segments themselves, but they cannot write GP entries.
-Because they cannot write GP entries, they cannot alter the size of a PB
-segment. If an engine is writing a PB segment, and if it does not need to fill
-the entire PB segment it was allocated, instead of filling the remainder of the
-PB segment with no-op PB instructions, it may write a single End-PB-Segment
-control entry to indicate that the pushbuffer data contains no further valid
-data. No further PB entries from that PB segment will be decoded or processed.
-Host may have already issued requests to fetch the remainder of the PB segment
-before an End-PB-Segment PB instruction is processed. Host may or may not fetch
-the remainder of the PB segment. Also note that the result of a PB CRC check on
-this segment via NV_PPBDMA_GP_ENTRY1_OPCODE_PB_CRC is indeterminate.
-
-
-#define NV_FIFO_DMA_ENDSEG_OPCODE 31:29 /* RWXUF */
-#define NV_FIFO_DMA_ENDSEG_OPCODE_VALUE 0x00000007 /* ----V */
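-
- To illustrate how End-PB-Segment interacts with header tracking, the
-sketch below walks one PB segment and counts the methods it would generate.
-It is a simplified software model, not Host's implementation: the function
-name is hypothetical, the walk is assumed to start at a PB instruction, and
-method data that would continue into a following segment is clamped to the
-end of this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical walker: number of methods generated by the first n
 * entries of a PB segment, stopping at END_PB_SEGMENT. */
static size_t pb_segment_count_methods(const uint32_t *seg, size_t n)
{
    size_t methods = 0, i = 0;
    assert(seg != NULL || n == 0);
    while (i < n) {
        uint32_t e = seg[i++];
        unsigned sec = e >> 29;            /* SEC_OP, bits 31:29 */
        if (sec == 7u)
            break;                         /* END_PB_SEGMENT: rest ignored */
        if (sec == 1u || sec == 3u || sec == 5u) {
            size_t count = (e >> 16) & 0x1FFFu;  /* COUNT, bits 28:16 */
            if (count > n - i)
                count = n - i;             /* clamp to this segment */
            methods += count;
            i += count;                    /* skip the method data entries */
        } else if (sec == 4u) {
            methods++;                     /* immediate-data: one method */
        }
        /* SEC_OP 0 (NOP and subdevice-mask control entries) generates no
         * methods; encodings outside the opcode table are skipped. */
    }
    return methods;
}
```

-Note that a method data entry whose value happens to have 111 in bits 31:29
-does not end the segment, because it is skipped as data before it can be
-decoded as an instruction.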
-
-