From f9e4e0e07fd5a6a7757db977f69c8e91a0ae283f Mon Sep 17 00:00:00 2001 From: John Hubbard Date: Wed, 12 Jun 2019 14:41:51 -0700 Subject: New ref manuals directory, delete old locations As decided in a recent OpenSource-Approval meeting, we want the directory structure for reference manuals here to be fairly close to the way they are organized internal to NVIDIA. This CL therefore does the following: Rename from: Host-Fifo/volta/gv100/* Display-Ref-Manuals/gv100/* to: manuals/volta/gv100/* Regenerate index.html files to match (important for the "github pages" site, at https://nvidia.github.io/open-gpu-doc/ . Reviewed by: Maneet Singh --- Host-Fifo/volta/gv100/dev_ram.ref.txt | 1269 --------------------------------- 1 file changed, 1269 deletions(-) delete mode 100644 Host-Fifo/volta/gv100/dev_ram.ref.txt (limited to 'Host-Fifo/volta/gv100/dev_ram.ref.txt') diff --git a/Host-Fifo/volta/gv100/dev_ram.ref.txt b/Host-Fifo/volta/gv100/dev_ram.ref.txt deleted file mode 100644 index e80d9c0..0000000 --- a/Host-Fifo/volta/gv100/dev_ram.ref.txt +++ /dev/null @@ -1,1269 +0,0 @@ -Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved. - -Permission is hereby granted, free of charge, to any person obtaining a -copy of this software and associated documentation files (the "Software"), -to deal in the Software without restriction, including without limitation -the rights to use, copy, modify, merge, publish, distribute, sublicense, -and/or sell copies of the Software, and to permit persons to whom the -Software is furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in -all copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL -THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING -FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER -DEALINGS IN THE SOFTWARE. --------------------------------------------------------------------------------- - -2 - GPU INSTANCE RAM (RAMIN) -============================== - - A GPU contains a block called "XVE" that manages the interface with PCI, a -block called "Host" that fetches graphics instructions, blocks called "engines" -that execute graphics instructions, and blocks that manage the interface with -memory. - - .-----. .------. - | |<------------------>| | - | | | | - | | .---------. | | - | |<--->| Engine1 |<---| | - | | `---------' | | -.---------. | | | | -| GPU | | | .---------. | Host | -| Local |<-->| FB |<--->| Engine2 |<---| | -| Memory | | MMU | `---------' | | -`---------' | Hub | ... | | .--------. - | | .---------. | | | System | - | |<--->| EngineN |<---| | | Memory | - | | `---------' `------' `--------' - | | ^ ^ - | | | | -.---------. | | .--V--. PCI .--V--. .-----. -| Display |<-->| |<------------------>| XVE |<--->| NB |<--->| CPU | -`---------' `-----' `-----' `-----' `-----' - - A GPU context is a virtualization of the GPU for a particular software -application. A GPU instance block is a block of memory that contains the state -for a GPU context. A GPU context's instance block consists of Host state, -pointers to each engine's state, and memory management state. A GPU instance -block also contains a pointer to a block of memory that contains that part of a -GPU context's state that a user-level driver may access. A GPU instance block -fits within a single 4K-byte page of memory. - - Run List Channel-Map RAM - .----------. Ch Id .----------------. - | RL Entry0 |----. |Ch0 Inst Blk Ptr| - | RL Entry1 | | |Ch1 Inst Blk Ptr| - | RL Entry2 | | | ... | - | ... | `--->|ChI Inst Blk Ptr|----. 
- | RL EntryN | | ... | | - `-----------' |ChN Inst Blk Ptr| | - `----------------' | - | - .-----------------------------------------------' - | - | GPU Instance Block GPFIFO - `-->.-----------------. GP_GET .--------. PB Seg - | |------------------------------>|GP Entry| .--------. - | Host State | |GP Entry|--->|PB Entry| - | (RAMFC) | User-Driver State | | |PB Entry| - | | .-------. |GP Entry| | ... | - | |------------->|(USERD)| GP_PUT |GP Entry| |PB Entry| - | | | |------->`--------' `--------' - | | | | - +-----------------+ | | - | Memory | `-------' - | Management |----------. Page Directory Page Table - | State | | .-------. .-------. - +-----------------+ `-->| PDE | | PTE | - | Pointer to | | PDE |------->| PTE | - | Engine0 |--------. | ... | | ... | - | State | | | PDE | | PTE | - +-----------------+ | `-------' `-------' - | Pointer to | | - | Engine1 |-----. | Engine0 State - | State | | | .-------. - +-----------------+ | `---->| | - ... | `-------' - +-----------------+ | - | Pointer to | | Engine1 State - | EngineN |--. | .-------. - | State | | `------->| | - `-----------------' | `-------' - | ... - | - | EngineN State - | .-------. - `---------->| | - `-------' - - The GPU context's Host state occupies the first 128 double words of an -instance block. A GPU context's Host state is called "RAMFC". Please see -the NV_RAMFC section below for a description of Host state. - - The GPU context's memory-management state defines the virtual address space -that the GPU context uses. Memory management state consists of page and -directory tables (that specify the mapping between virtual addresses and -physical addresses, and the attributes of memory pages), and the limit of the -virtual address space. The NV_RAMIN_PAGE_DIR_BASE entry contains the address of -base of the GPU context's page directory table (PDB). NV_RAMIN_PAGE_DIR_BASE is -4K-byte aligned. 
 - - The NV_RAMIN_ENG*_WFI_PTR entry contains the address of a block of memory -for storing an engine's context state. Blocks of memory that contain engine state -are 4K-byte aligned. Only one engine context is supported per instance block. - - The NV_RAMIN_ENG*_CS field is deprecated; it was used to indicate whether -GPU state should be restored from the FGCS pointer or from the WFI CS pointer. -Engines only need/support one CTXSW pointer, and all state is stored there -whether a WFI CS or other form of preemption was performed. This field must -always be set to WFI for legacy reasons, and will eventually be deleted. - - -#define NV_RAMIN /* ----G */ - -// The instance block must be 4k-aligned. -#define NV_RAMIN_BASE_SHIFT 12 /* */ - -// The instance block size fits within a single 4k block. -#define NV_RAMIN_ALLOC_SIZE 4096 /* */ - -// Host State -#define NV_RAMIN_RAMFC (127*32+31):(0*32+0) /* RWXUF */ - -// Memory-Management State - - The following fields are used for non-VEID engines. The NV_RAMIN_SC_* fields described later - are used for VEID engines. - - NV_RAMIN_PAGE_DIR_BASE_TARGET determines if the top level of the page tables - is in video memory or system memory (peer is not allowed), and the CPU cache - coherency for system memory. - Using INVALID unbinds the selected engine. - -#define NV_RAMIN_PAGE_DIR_BASE_TARGET (128*32+1):(128*32+0) /* RWXUF */ -#define NV_RAMIN_PAGE_DIR_BASE_TARGET_VID_MEM 0x00000000 /* RW--V */ -#define NV_RAMIN_PAGE_DIR_BASE_TARGET_INVALID 0x00000001 /* RW--V */ -#define NV_RAMIN_PAGE_DIR_BASE_TARGET_SYS_MEM_COHERENT 0x00000002 /* RW--V */ -#define NV_RAMIN_PAGE_DIR_BASE_TARGET_SYS_MEM_NONCOHERENT 0x00000003 /* RW--V */ - - NV_RAMIN_PAGE_DIR_BASE_VOL identifies the volatile behavior - of the top level of the page table (whether local L2 can cache it or not). 
 - -#define NV_RAMIN_PAGE_DIR_BASE_VOL (128*32+2):(128*32+2) /* RWXUF */ -#define NV_RAMIN_PAGE_DIR_BASE_VOL_TRUE 0x00000001 /* RW--V */ -#define NV_RAMIN_PAGE_DIR_BASE_VOL_FALSE 0x00000000 /* RW--V */ - - - These bits specify whether the MMU will treat faults as replayable or not. - The engine will send these bits to the MMU as part of the instance bind. - -#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_TEX (128*32+4):(128*32+4) /* RWXUF */ -#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_TEX_DISABLED 0x00000000 /* RW--V */ -#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_TEX_ENABLED 0x00000001 /* RW--V */ -#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_GCC (128*32+5):(128*32+5) /* RWXUF */ -#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_GCC_DISABLED 0x00000000 /* RW--V */ -#define NV_RAMIN_PAGE_DIR_BASE_FAULT_REPLAY_GCC_ENABLED 0x00000001 /* RW--V */ - - NV_RAMIN_USE_VER2_PT_FORMAT determines which page table format to use. - When NV_RAMIN_USE_VER2_PT_FORMAT is FALSE, the page table uses the old format. - When NV_RAMIN_USE_VER2_PT_FORMAT is TRUE, the page table uses the new format. - - Volta only supports the new format. Selecting the old format results in an UNBOUND_INSTANCE fault. - - -#define NV_RAMIN_USE_VER2_PT_FORMAT (128*32+10):(128*32+10) /* */ -#define NV_RAMIN_USE_VER2_PT_FORMAT_FALSE 0x00000000 /* */ -#define NV_RAMIN_USE_VER2_PT_FORMAT_TRUE 0x00000001 /* */ - - When NV_PFB_PRI_MMU_CTRL_USE_PDB_BIG_PAGE_SIZE is TRUE, this bit selects the big page size. - When NV_PFB_PRI_MMU_CTRL_USE_PDB_BIG_PAGE_SIZE is FALSE, NV_PFB_PRI_MMU_CTRL_VM_PG_SIZE selects the big page size. - - Volta only supports 64KB for big pages. Selecting 128KB for big pages results in an UNBOUND_INSTANCE fault. 
- -#define NV_RAMIN_BIG_PAGE_SIZE (128*32+11):(128*32+11) /* RWXUF */ -#define NV_RAMIN_BIG_PAGE_SIZE_128KB 0x00000000 /* RW--V */ -#define NV_RAMIN_BIG_PAGE_SIZE_64KB 0x00000001 /* RW--V */ - - NV_RAMIN_PAGE_DIR_BASE_LO and NV_RAMIN_PAGE_DIR_BASE_HI - identify the page directory base (start of the page table) - location for this context. - -#define NV_RAMIN_PAGE_DIR_BASE_LO (128*32+31):(128*32+12) /* RWXUF */ -#define NV_RAMIN_PAGE_DIR_BASE_HI (129*32+31):(129*32+0) /* RWXUF */ - -// Single engine pointer channels cannot support multiple -// engines with CTXSW pointers -#define NV_RAMIN_ENGINE_CS (132*32+3):(132*32+3) /* */ -#define NV_RAMIN_ENGINE_CS_WFI 0x00000000 /* */ -#define NV_RAMIN_ENGINE_CS_FG 0x00000001 /* */ -#define NV_RAMIN_ENGINE_WFI_TARGET (132*32+1):(132*32+0) /* */ -#define NV_RAMIN_ENGINE_WFI_TARGET_LOCAL_MEM 0x00000000 /* */ -#define NV_RAMIN_ENGINE_WFI_TARGET_SYS_MEM_COHERENT 0x00000002 /* */ -#define NV_RAMIN_ENGINE_WFI_TARGET_SYS_MEM_NONCOHERENT 0x00000003 /* */ -#define NV_RAMIN_ENGINE_WFI_MODE (132*32+2):(132*32+2) /* */ -#define NV_RAMIN_ENGINE_WFI_MODE_PHYSICAL 0x00000000 /* */ -#define NV_RAMIN_ENGINE_WFI_MODE_VIRTUAL 0x00000001 /* */ -#define NV_RAMIN_ENGINE_WFI_PTR_LO (132*32+31):(132*32+12) /* */ -#define NV_RAMIN_ENGINE_WFI_PTR_HI (133*32+7):(133*32+0) /* */ - -#define NV_RAMIN_ENGINE_WFI_VEID (134*32+(6-1)):(134*32+0) /* */ -#define NV_RAMIN_ENABLE_ATS (135*32+31):(135*32+31) /* RWXUF */ -#define NV_RAMIN_ENABLE_ATS_TRUE 0x00000001 /* RW--V */ -#define NV_RAMIN_ENABLE_ATS_FALSE 0x00000000 /* RW--V */ -#define NV_RAMIN_PASID (135*32+(20-1)):(135*32+0) /* RWXUF */ - - - Pointer to a method buffer in BAR2 memory where a faulted engine can save -out methods. BAR2 accesses are assumed to be virtual, so the address saved here -is a virtual address. 
 - -#define NV_RAMIN_ENG_METHOD_BUFFER_ADDR_LO (136*32+31):(136*32+0) /* RWXUF */ -#define NV_RAMIN_ENG_METHOD_BUFFER_ADDR_HI (137*32+(((49-1)-32))):(137*32+0) /* RWXUF */ - - - - These entries are used to inform FECS which entries of the PDB array below are - valid/filled in and subsequently need to be bound. - - This needs to reserve at least NV_LITTER_NUM_SUBCTX entries. Currently - there is enough space reserved for 64 subcontexts. -#define NV_RAMIN_SC_PDB_VALID(i) (166*32+i):(166*32+i) /* RWXUF */ -#define NV_RAMIN_SC_PDB_VALID__SIZE_1 64 /* */ -#define NV_RAMIN_SC_PDB_VALID_FALSE 0x00000000 /* RW--V */ -#define NV_RAMIN_SC_PDB_VALID_TRUE 0x00000001 /* RW--V */ - -// Memory-Management VEID array - - The NV_RAMIN_SC_PAGE_DIR_BASE_* entries are an array of page table settings - for each subcontext. When a context supports subcontexts, the page table - information for a given VEID/Subcontext needs to be filled in or else page - faults will result on access. - - These properties for the page table must be filled in for all channels - sharing the same context, as any channel's NV_RAMIN may be used to load the - context. - - The non-subcontext page table information such as NV_RAMIN_PAGE_DIR_BASE* - is used by non-subcontext engines and clients such as Host, CE, or the - video engines. - - NV_RAMIN_SC_PAGE_DIR_BASE_TARGET(i) determines if the top level of the page tables - is in video memory or system memory (peer is not allowed), and the CPU cache - coherency for system memory. - Using INVALID unbinds the selected subcontext. 
 - -#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET(i) ((168+(i)*4)*32+1):((168+(i)*4)*32+0) /* RWXUF */ -#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET__SIZE_1 64 /* */ -#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_VID_MEM 0x00000000 /* RW--V */ -#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_INVALID 0x00000001 /* RW--V */ // Note: INVALID should match PEER -#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_SYS_MEM_COHERENT 0x00000002 /* RW--V */ -#define NV_RAMIN_SC_PAGE_DIR_BASE_TARGET_SYS_MEM_NONCOHERENT 0x00000003 /* RW--V */ - - NV_RAMIN_SC_PAGE_DIR_BASE_VOL(i) identifies the volatile behavior - of the top level of the page table (whether local L2 can cache it or not). - -#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL(i) ((168+(i)*4)*32+2):((168+(i)*4)*32+2) /* RWXUF */ -#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL__SIZE_1 64 /* */ -#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL_TRUE 0x00000001 /* RW--V */ -#define NV_RAMIN_SC_PAGE_DIR_BASE_VOL_FALSE 0x00000000 /* RW--V */ - - The NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX(i) and - NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC(i) bits specify whether - the MMU will treat faults from TEX and GCC as replayable or - not. Based on these bits, fault packets are written into the replayable fault - buffer (or not) and faulting requests are put into the replay request - buffer (or not). - The last bind that does not unbind a sub-context determines the REPLAY_TEX and REPLAY_GCC for all sub-contexts. 
 - -#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX(i) ((168+(i)*4)*32+4):((168+(i)*4)*32+4) /* RWXUF */ -#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX__SIZE_1 64 /* */ -#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX_DISABLED 0x00000000 /* RW--V */ -#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_TEX_ENABLED 0x00000001 /* RW--V */ - -#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC(i) ((168+(i)*4)*32+5):((168+(i)*4)*32+5) /* RWXUF */ -#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC__SIZE_1 64 /* */ -#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC_DISABLED 0x00000000 /* RW--V */ -#define NV_RAMIN_SC_PAGE_DIR_BASE_FAULT_REPLAY_GCC_ENABLED 0x00000001 /* RW--V */ - - NV_RAMIN_SC_USE_VER2_PT_FORMAT determines which page table format to use. - When NV_RAMIN_SC_USE_VER2_PT_FORMAT is false, the page table uses - the old format (2-level page table). When - NV_RAMIN_SC_USE_VER2_PT_FORMAT is true, the page table uses the - new format (5-level 49-bit VA format). - The last bind that does not unbind a sub-context determines the page table format for all sub-contexts. - Volta only supports the new format. Selecting the old format results in an UNBOUND_INSTANCE fault. - -#define NV_RAMIN_SC_USE_VER2_PT_FORMAT(i) ((168+(i)*4)*32+10):((168+(i)*4)*32+10) /* RWXUF */ -#define NV_RAMIN_SC_USE_VER2_PT_FORMAT__SIZE_1 64 /* */ -#define NV_RAMIN_SC_USE_VER2_PT_FORMAT_FALSE 0x00000000 /* RW--V */ -#define NV_RAMIN_SC_USE_VER2_PT_FORMAT_TRUE 0x00000001 /* RW--V */ - - The last bind that does not unbind a sub-context determines the big page size for all sub-contexts. - Volta only supports 64KB for big pages. - -#define NV_RAMIN_SC_BIG_PAGE_SIZE(i) ((168+(i)*4)*32+11):((168+(i)*4)*32+11) /* RWXUF */ -#define NV_RAMIN_SC_BIG_PAGE_SIZE__SIZE_1 64 /* */ -#define NV_RAMIN_SC_BIG_PAGE_SIZE_64KB 0x00000001 /* RW--V */ - - NV_RAMIN_SC_PAGE_DIR_BASE_LO(i) and NV_RAMIN_SC_PAGE_DIR_BASE_HI(i) - identify the page directory base (start of the page table) - location for subcontext i. 
 - -#define NV_RAMIN_SC_PAGE_DIR_BASE_LO(i) ((168+(i)*4)*32+31):((168+(i)*4)*32+12) /* RWXUF */ -#define NV_RAMIN_SC_PAGE_DIR_BASE_LO__SIZE_1 64 /* */ -#define NV_RAMIN_SC_PAGE_DIR_BASE_HI(i) ((169+(i)*4)*32+31):((169+(i)*4)*32+0) /* RWXUF */ -#define NV_RAMIN_SC_PAGE_DIR_BASE_HI__SIZE_1 64 /* */ - - - - - - NV_RAMIN_SC_ENABLE_ATS(i) indicates whether subcontext i is ATS-enabled. - When set to TRUE, the GMMU will look for VA->PA - translations in both the GMMU and ATS page tables. - ATS can be enabled or disabled per subcontext. - -#define NV_RAMIN_SC_ENABLE_ATS(i) ((170+(i)*4)*32+31):((170+(i)*4)*32+31) /* RWXUF */ - - NV_RAMIN_SC_PASID(i) identifies the PASID (process address space - ID) in the CPU for subcontext i. The PASID is used to get the ATS - translation when an ATS page table lookup is needed. During an ATS TLB - shootdown, the PASID is also matched against the one that comes with the - shootdown request. - -#define NV_RAMIN_SC_PASID(i) ((170+(i)*4)*32+(20-1)):((170+(i)*4)*32+0) /* RWXUF */ - - - - -3 - FIFO CONTEXT RAM (RAMFC) -============================== - - - The NV_RAMFC part of a GPU-instance block contains Host's part of a virtual -GPU's state. Host is referred to as "FIFO". "FC" stands for FIFO Context. -When Host switches from serving one GPU context to serving a second, Host saves -state for the first GPU context to the first GPU context's RAMFC area, and loads -state for the second GPU context from the second GPU context's RAMFC area. - - RAMFC is located at NV_RAMIN_RAMFC within the GPU instance block. In -Kepler, this is at the start of the block. RAMFC is 4KB aligned. - - Every Host word entry in RAMFC directly corresponds to a PRI-accessible -register. For a description of the contents of a RAMFC entry, please see the -description of the corresponding register in "manuals/dev_pbdma.ref". The -offsets of the fields within each entry in RAMFC match those of the -corresponding register in the associated PBDMA unit's PRI space. 
- - - RAMFC Entry PBDMA Register - ------------------------------- ---------------------------------- - NV_RAMFC_SIGNATURE NV_PPBDMA_SIGNATURE(i) - NV_RAMFC_GP_BASE NV_PPBDMA_GP_BASE(i) - NV_RAMFC_GP_BASE_HI NV_PPBDMA_GP_BASE_HI(i) - NV_RAMFC_GP_FETCH NV_PPBDMA_GP_FETCH(i) - NV_RAMFC_GP_GET NV_PPBDMA_GP_GET(i) - NV_RAMFC_GP_PUT NV_PPBDMA_GP_PUT(i) - NV_RAMFC_PB_FETCH NV_PPBDMA_PB_FETCH(i) - NV_RAMFC_PB_FETCH_HI NV_PPBDMA_PB_FETCH_HI(i) - NV_RAMFC_PB_GET NV_PPBDMA_GET(i) - NV_RAMFC_PB_GET_HI NV_PPBDMA_GET_HI(i) - NV_RAMFC_PB_PUT NV_PPBDMA_PUT(i) - NV_RAMFC_PB_PUT_HI NV_PPBDMA_PUT_HI(i) - NV_RAMFC_PB_TOP_LEVEL_GET NV_PPBDMA_TOP_LEVEL_GET(i) - NV_RAMFC_PB_TOP_LEVEL_GET_HI NV_PPBDMA_TOP_LEVEL_GET_HI(i) - NV_RAMFC_GP_CRC NV_PPBDMA_GP_CRC(i) - NV_RAMFC_PB_HEADER NV_PPBDMA_PB_HEADER(i) - NV_RAMFC_PB_COUNT NV_PPBDMA_PB_COUNT(i) - NV_RAMFC_PB_CRC NV_PPBDMA_PB_CRC(i) - NV_RAMFC_SUBDEVICE NV_PPBDMA_SUBDEVICE(i) - NV_RAMFC_METHOD0 NV_PPBDMA_METHOD0(i) - NV_RAMFC_METHOD1 NV_PPBDMA_METHOD1(i) - NV_RAMFC_METHOD2 NV_PPBDMA_METHOD2(i) - NV_RAMFC_METHOD3 NV_PPBDMA_METHOD3(i) - NV_RAMFC_DATA0 NV_PPBDMA_DATA0(i) - NV_RAMFC_DATA1 NV_PPBDMA_DATA1(i) - NV_RAMFC_DATA2 NV_PPBDMA_DATA2(i) - NV_RAMFC_DATA3 NV_PPBDMA_DATA3(i) - NV_RAMFC_TARGET NV_PPBDMA_TARGET(i) - NV_RAMFC_METHOD_CRC NV_PPBDMA_METHOD_CRC(i) - NV_RAMFC_REF NV_PPBDMA_REF(i) - NV_RAMFC_RUNTIME NV_PPBDMA_RUNTIME(i) - NV_RAMFC_SEM_ADDR_LO NV_PPBDMA_SEM_ADDR_LO(i) - NV_RAMFC_SEM_ADDR_HI NV_PPBDMA_SEM_ADDR_HI(i) - NV_RAMFC_SEM_PAYLOAD_LO NV_PPBDMA_SEM_PAYLOAD_LO(i) - NV_RAMFC_SEM_PAYLOAD_HI NV_PPBDMA_SEM_PAYLOAD_HI(i) - NV_RAMFC_SEM_EXECUTE NV_PPBDMA_SEM_EXECUTE(i) - NV_RAMFC_ACQUIRE_DEADLINE NV_PPBDMA_ACQUIRE_DEADLINE(i) - NV_RAMFC_ACQUIRE NV_PPBDMA_ACQUIRE(i) - NV_RAMFC_MEM_OP_A NV_PPBDMA_MEM_OP_A(i) - NV_RAMFC_MEM_OP_B NV_PPBDMA_MEM_OP_B(i) - NV_RAMFC_MEM_OP_C NV_PPBDMA_MEM_OP_C(i) - NV_RAMFC_USERD NV_PPBDMA_USERD(i) - NV_RAMFC_USERD_HI NV_PPBDMA_USERD_HI(i) - NV_RAMFC_HCE_CTRL NV_PPBDMA_HCE_CTRL(i) - NV_RAMFC_CONFIG 
NV_PPBDMA_CONFIG(i) - NV_RAMFC_SET_CHANNEL_INFO NV_PPBDMA_SET_CHANNEL_INFO(i) - ------------------------------- ---------------------------------- - -#define NV_RAMFC /* ----G */ -#define NV_RAMFC_GP_PUT (0*32+31):(0*32+0) /* RWXUF */ -#define NV_RAMFC_MEM_OP_A (1*32+31):(1*32+0) /* RWXUF */ -#define NV_RAMFC_USERD (2*32+31):(2*32+0) /* RWXUF */ -#define NV_RAMFC_USERD_HI (3*32+31):(3*32+0) /* RWXUF */ -#define NV_RAMFC_SIGNATURE (4*32+31):(4*32+0) /* RWXUF */ -#define NV_RAMFC_GP_GET (5*32+31):(5*32+0) /* RWXUF */ -#define NV_RAMFC_PB_GET (6*32+31):(6*32+0) /* RWXUF */ -#define NV_RAMFC_PB_GET_HI (7*32+31):(7*32+0) /* RWXUF */ -#define NV_RAMFC_PB_TOP_LEVEL_GET (8*32+31):(8*32+0) /* RWXUF */ -#define NV_RAMFC_PB_TOP_LEVEL_GET_HI (9*32+31):(9*32+0) /* RWXUF */ -#define NV_RAMFC_REF (10*32+31):(10*32+0) /* RWXUF */ -#define NV_RAMFC_RUNTIME (11*32+31):(11*32+0) /* RWXUF */ -#define NV_RAMFC_ACQUIRE (12*32+31):(12*32+0) /* RWXUF */ -#define NV_RAMFC_ACQUIRE_DEADLINE (13*32+31):(13*32+0) /* RWXUF */ -#define NV_RAMFC_SEM_ADDR_HI (14*32+31):(14*32+0) /* RWXUF */ -#define NV_RAMFC_SEM_ADDR_LO (15*32+31):(15*32+0) /* RWXUF */ -#define NV_RAMFC_SEM_PAYLOAD_LO (16*32+31):(16*32+0) /* RWXUF */ -#define NV_RAMFC_SEM_EXECUTE (17*32+31):(17*32+0) /* RWXUF */ -#define NV_RAMFC_GP_BASE (18*32+31):(18*32+0) /* RWXUF */ -#define NV_RAMFC_GP_BASE_HI (19*32+31):(19*32+0) /* RWXUF */ -#define NV_RAMFC_GP_FETCH (20*32+31):(20*32+0) /* RWXUF */ -#define NV_RAMFC_PB_FETCH (21*32+31):(21*32+0) /* RWXUF */ -#define NV_RAMFC_PB_FETCH_HI (22*32+31):(22*32+0) /* RWXUF */ -#define NV_RAMFC_PB_PUT (23*32+31):(23*32+0) /* RWXUF */ -#define NV_RAMFC_PB_PUT_HI (24*32+31):(24*32+0) /* RWXUF */ -#define NV_RAMFC_MEM_OP_B (25*32+31):(25*32+0) /* RWXUF */ -#define NV_RAMFC_RESERVED26 (26*32+31):(26*32+0) /* RWXUF */ -#define NV_RAMFC_RESERVED27 (27*32+31):(27*32+0) /* RWXUF */ -#define NV_RAMFC_RESERVED28 (28*32+31):(28*32+0) /* RWXUF */ -#define NV_RAMFC_GP_CRC (29*32+31):(29*32+0) /* RWXUF */ 
-#define NV_RAMFC_PB_HEADER (33*32+31):(33*32+0) /* RWXUF */ -#define NV_RAMFC_PB_COUNT (34*32+31):(34*32+0) /* RWXUF */ -#define NV_RAMFC_SUBDEVICE (37*32+31):(37*32+0) /* RWXUF */ -#define NV_RAMFC_PB_CRC (38*32+31):(38*32+0) /* RWXUF */ -#define NV_RAMFC_SEM_PAYLOAD_HI (39*32+31):(39*32+0) /* RWXUF */ -#define NV_RAMFC_MEM_OP_C (40*32+31):(40*32+0) /* RWXUF */ -#define NV_RAMFC_RESERVED20 (41*32+31):(41*32+0) /* RWXUF */ -#define NV_RAMFC_RESERVED21 (42*32+31):(42*32+0) /* RWXUF */ -#define NV_RAMFC_TARGET (43*32+31):(43*32+0) /* RWXUF */ -#define NV_RAMFC_METHOD_CRC (44*32+31):(44*32+0) /* RWXUF */ -#define NV_RAMFC_METHOD0 (48*32+31):(48*32+0) /* RWXUF */ -#define NV_RAMFC_DATA0 (49*32+31):(49*32+0) /* RWXUF */ -#define NV_RAMFC_METHOD1 (50*32+31):(50*32+0) /* RWXUF */ -#define NV_RAMFC_DATA1 (51*32+31):(51*32+0) /* RWXUF */ -#define NV_RAMFC_METHOD2 (52*32+31):(52*32+0) /* RWXUF */ -#define NV_RAMFC_DATA2 (53*32+31):(53*32+0) /* RWXUF */ -#define NV_RAMFC_METHOD3 (54*32+31):(54*32+0) /* RWXUF */ -#define NV_RAMFC_DATA3 (55*32+31):(55*32+0) /* RWXUF */ -#define NV_RAMFC_HCE_CTRL (57*32+31):(57*32+0) /* RWXUF */ -#define NV_RAMFC_CONFIG (61*32+31):(61*32+0) /* RWXUF */ -#define NV_RAMFC_SET_CHANNEL_INFO (63*32+31):(63*32+0) /* RWXUF */ - -#define NV_RAMFC_BASE_SHIFT 12 /* */ - - Size of the full range of RAMFC in bytes. -#define NV_RAMFC_SIZE_VAL 0x00000200 /* ----C */ - -4 - USER-DRIVER ACCESSIBLE RAM (RAMUSERD) -========================================= - - A user-level driver is allowed to access only a small portion of a GPU -context's state. The portion of a GPU context's state that a user-level driver -can access is stored in a block of memory called NV_RAMUSERD. NV_RAMUSERD is a -user-level driver's window into NV_RAMFC. The NV_RAMUSERD state for each GPU -context is stored in an aligned NV_RAMUSERD_CHAN_SIZE-byte block of memory. 
 - - To submit more methods, a user driver writes a PB segment to -memory, writes a GP entry that points to the PB segment, updates GP_PUT in -RAMUSERD, and writes the channel's handle to the -NV_USERMODE_NOTIFY_CHANNEL_PENDING register (see dev_usermode.ref). - - The RAMUSERD data structure is updated at regular intervals as controlled -by the NV_PFIFO_USERD_WRITEBACK setting (see dev_fifo.ref). For a particular -channel, RAMUSERD writeback can be disabled, and it is recommended that SW track -pushbuffer and channel progress via Host WFI_DIS semaphores rather than reading -the RAMUSERD data structure. - - When write-back is enabled, a user driver can check the GPU progress in -executing a channel's PB segments. The driver can use: - * GP_GET to monitor the index of the next GP entry the GPU will process - * PB_GET to monitor the address of the next PB entry the GPU will process - * TOP_LEVEL_GET (see NV_PPBDMA_TOP_LEVEL_GET) to monitor the address of the - next "top-level" (non-SUBROUTINE) PB entry the GPU will process - * REF to monitor the current "reference count" value (see NV_PPBDMA_REF) - - Each entry in RAMUSERD corresponds to a PRI-accessible PBDMA register in Host. -For a description of the behavior and contents of a RAMUSERD entry, please see -the description of the corresponding register in "manuals/dev_pbdma.ref". 
 - - RAMUSERD Entry PBDMA Register Access - ------------------------------- ----------------------------- ---------- - NV_RAMUSERD_GP_PUT NV_PPBDMA_GP_PUT(i) Read/Write - NV_RAMUSERD_GP_GET NV_PPBDMA_GP_GET(i) Read-only - NV_RAMUSERD_GET NV_PPBDMA_GET(i) Read-only - NV_RAMUSERD_GET_HI NV_PPBDMA_GET_HI(i) Read-only - NV_RAMUSERD_PUT NV_PPBDMA_PUT(i) Read-only - NV_RAMUSERD_PUT_HI NV_PPBDMA_PUT_HI(i) Read-only - NV_RAMUSERD_TOP_LEVEL_GET NV_PPBDMA_TOP_LEVEL_GET(i) Read-only - NV_RAMUSERD_TOP_LEVEL_GET_HI NV_PPBDMA_TOP_LEVEL_GET_HI(i) Read-only - NV_RAMUSERD_REF NV_PPBDMA_REF(i) Read-only - ------------------------------- ----------------------------- ---------- - - A user driver may write to NV_RAMUSERD_GP_PUT to kick off more work in a -channel. Although writes to the other, read-only entries can alter memory, -writes to those entries will not affect the operation of the GPU, and can be -overwritten by the GPU. - - When Host loads its part of a GPU context's state from RAMFC memory, it -may not immediately read RAMUSERD_GP_PUT. Host can use the GP_PUT value -directly from RAMFC while waiting for RAMUSERD_GP_PUT to synchronize. -Because reads of RAMUSERD_GP_PUT can be delayed, the value in NV_PPBDMA_GP_PUT -can be older than the value in NV_RAMUSERD_GP_PUT. - - When Host saves a GPU context's state to NV_RAMFC, it also writes to -NV_RAMUSERD the values of the entries other than GP_PUT. -Because Host does not continuously write the read-only RAMFC entries, the -read-only values in USERD memory can be older than the values in the Host PBDMA -unit. 
 - -#define NV_RAMUSERD /* ----G */ -#define NV_RAMUSERD_PUT (16*32+31):(16*32+0) /* RWXUF */ -#define NV_RAMUSERD_GET (17*32+31):(17*32+0) /* RWXUF */ -#define NV_RAMUSERD_REF (18*32+31):(18*32+0) /* RWXUF */ -#define NV_RAMUSERD_PUT_HI (19*32+31):(19*32+0) /* RWXUF */ -#define NV_RAMUSERD_TOP_LEVEL_GET (22*32+31):(22*32+0) /* RWXUF */ -#define NV_RAMUSERD_TOP_LEVEL_GET_HI (23*32+31):(23*32+0) /* RWXUF */ -#define NV_RAMUSERD_GET_HI (24*32+31):(24*32+0) /* RWXUF */ -#define NV_RAMUSERD_GP_GET (34*32+31):(34*32+0) /* RWXUF */ -#define NV_RAMUSERD_GP_PUT (35*32+31):(35*32+0) /* RWXUF */ -#define NV_RAMUSERD_BASE_SHIFT 9 /* */ -#define NV_RAMUSERD_CHAN_SIZE 512 /* */ - - - - -5 - RUN-LIST RAM (RAMRL) -======================== - - Software specifies the GPU contexts that hardware should "run" by writing a -list of entries (known as a "runlist") to a 4k-aligned area of memory (beginning -at NV_PFIFO_RUNLIST_BASE), and by notifying Host that a new list is available -(by writing to NV_PFIFO_RUNLIST). - Submission of a new runlist causes Host to expire the timeslice of all work -scheduled by the previous runlist, allowing it to schedule the channels present -in the new runlist once they are fetched. SW can check the status of the runlist -by polling NV_PFIFO_ENG_RUNLIST_PENDING. (see dev_fifo.ref NV_PFIFO_RUNLIST for -a full description of the runlist submit mechanism). - Runlists can be stored in system memory or video memory (as specified by -NV_PFIFO_RUNLIST_BASE_TARGET). If a runlist is stored in video memory, software -must execute a flush or read back the last entry written before submitting the -runlist to Host, to guarantee coherency. - The size of a runlist entry data structure is 16 bytes. Each entry -specifies either a channel entry or a TSG header; the type is determined by the -NV_RAMRL_ENTRY_TYPE field. - - -Runlist Channel Entry Type: - - A runlist entry of type NV_RAMRL_ENTRY_TYPE_CHAN specifies a channel to -run. 
All such entries must occur within the span of some TSG as specified by -the NV_RAMRL_ENTRY_TYPE_TSG described below. If a channel entry is encountered -outside a TSG, Host will raise the NV_PFIFO_INTR_SCHED_ERROR_CODE_BAD_TSG -interrupt. - - The fields available in a channel runlist entry are as follows (Fig 5.1): - - ENTRY_TYPE (T) : type of this entry: ENTRY_TYPE_CHAN - CHID (ID) : identifier of the channel to run (overlays ENTRY_ID) - RUNQUEUE_SELECTOR (Q) : selects which PBDMA should run this channel if - more than one PBDMA is supported by the runlist - - INST_PTR_LO : lower 20 bits of the 4k-aligned instance block pointer - INST_PTR_HI : upper 32 bits of the instance block pointer - INST_TARGET (TGI) : aperture of the instance block - - USERD_PTR_LO : upper 24 bits of the low 32 bits of the 512-byte-aligned USERD pointer - USERD_PTR_HI : upper 32 bits of the USERD pointer - USERD_TARGET (TGU) : aperture of the USERD data structure - - CHID is a channel identifier that uniquely specifies the channel described -by this runlist entry to the scheduling hardware and is reported in various -status registers. - RUNQUEUE_SELECTOR determines to which runqueue the channel belongs, and -thereby which PBDMA will run the channel. Increasing values select increasingly -numbered PBDMA IDs serving the runlist. If the selector value exceeds the -number of PBDMAs on the runlist, the hardware will silently reassign the channel -to run on the first PBDMA as though RUNQUEUE_SELECTOR had been set to 0. (In -current hardware, this is used by SCG on the graphics runlist only to determine -which FE pipe should service a given channel. A value of 0 targets the first FE -pipe, which can process all FE-driven engines: Graphics, Compute, Inline2Memory, -and TwoD. A value of 1 targets the second FE pipe, which can only process -Compute work. Note that GRCE work is allowed on either runqueue.) 
 - The INST fields specify the physical address of the channel's instance -block, the in-memory data structure that stores the context state. -The target aperture of the instance block is given by INST_TARGET, and the byte -offset within that aperture is calculated as - - (INST_PTR_HI << 32) | (INST_PTR_LO << NV_RAMRL_ENTRY_CHAN_INST_PTR_ALIGN_SHIFT) - -This address should match the one specified in the channel RAM's -NV_PCCSR_CHANNEL_INST register; see NV_RAMIN and NV_RAMFC for the format of the -instance block. The hardware ignores the RAMRL INST fields, but in future -chips the instance pointer may be removed from the channel RAM and the RAMRL -INST fields used instead, resulting in smaller hardware. - The USERD fields specify the physical address of the USERD memory region -used by software to submit additional work to the channel. The target aperture -of the USERD region is given by USERD_TARGET, and the byte offset within that -aperture is calculated as - - (USERD_PTR_HI << 32) | (USERD_PTR_LO << NV_RAMRL_ENTRY_CHAN_USERD_PTR_ALIGN_SHIFT) - - -SW uses the NV_RAMUSERD_CHAN_SIZE define to allocate and align a channel's -RAMUSERD data structure. See the documentation for NV_RAMUSERD for a -description of the use of USERD and its format. This address and its -alignment must match the one specified in the RAMFC's NV_RAMFC_USERD and -NV_RAMFC_USERD_HI fields, which are backed by NV_PPBDMA_USERD in dev_pbdma.ref. -The hardware ignores the RAMRL USERD fields, but in future chips the USERD -pointer may be read from these fields in the runlist entry instead of the RAMFC -to avoid the extra level of indirection in fetching the USERD data that -currently results in a dependent read. - - -Runlist TSG Entry Type: - - The other type of runlist entry is the Timeslice Group (TSG) header entry -(Fig 5.2). This type of entry is specified by NV_RAMRL_ENTRY_TYPE_TSG. 
A TSG -entry describes a collection of channels all of which share the same context and -are scheduled as a single unit by Host. All runlists support this type of entry. - - The fields available in a TSG header runlist entry are as follows (Fig 5.2): - - ENTRY_TYPE (T) : type of this entry: ENTRY_TYPE_TSG - TSGID : identifier of the Timeslice group (overlays ENTRY_ID) - TSG_LENGTH : number of channels that are part of this timeslice group - TIMESLICE_SCALE : scale factor for the TSG's timeslice - TIMESLICE_TIMEOUT : timeout amount for the TSG's timeslice - - A timeslice group entry consists of an integer identifier along with a -length which specifies the number of channels in the TSG. After a TSG header -runlist entry, the next TSG_LENGTH runlist entries are considered to be part of -the timeslice group. Note that the minimum length of a TSG is at least one entry. - All channels in a TSG share the same runlist timeslice which specifies how -long a single context runs on an engine or PBDMA before being swapped for a -different context. The timeslice period is set in the TSG header by specifying -TSG_TIMESLICE_TIMEOUT and TSG_TIMESLICE_SCALE. The TSG timeslice period is -calculated as follows: - - timeslice = (TSG_TIMESLICE_TIMEOUT << TSG_TIMESLICE_SCALE) * 1024 nanoseconds - - The timeslice period should normally not be set to zero. A timeslice of -zero will be treated as a timeslice period of one . The runlist -timeslice period begins after the context has been loaded on a PBDMA but is -paused while the channel has an outstanding context load to an engine. Time -spent switching a context into an engine is not part of the runlist timeslice. - - If Host reaches the end of the runlist or receives another entry of type -NV_RAMRL_ENTRY_TYPE_TSG before processing TSG_LENGTH additional runlist entries, -or if it encounters a TSG of length 0, a SCHED_ERROR interrupt will be generated -with ERROR_CODE_BAD_TSG. 
-
-
-Host Scheduling Memory Layout:
-
-Example of graphics runlist entry to GPU context mapping via channel id:
-
-
-                       .------Inst_ptr -------.
-                       |                      |
- Graphics Runlist      |    Channel-Map RAM   |         GPU Instance Block
- .------------ .       |  .----------------.  |        .-------------------.
- | TSG Hdr L=m |--.----'  |Ch0 Inst Blk Ptr|--'------->| Host State        |
- | RL Entry T1 |  |       |Ch1 Inst Blk Ptr|    .------| Memory State      |
- | RL Entry T2 |  |       |      ...       |    |      | Engine0 State Ptr |
- |     ...     |  |-chid->|ChI Inst Blk Ptr|    |      | Engine1 State Ptr |
- | RL Entry Tm |          |      ...       |    |      |        ...        |
- | TSG Hdr L=n |          |ChN Inst Blk Ptr|    |    .-| EngineN State Ptr |
- | RL Entry T1 |          `----------------'    |    | `-------------------'
- | RL Entry T2 |userd_ptr                       |    |
- |     ...     |    |      .--------------.     |    |   .--------------.
- | RL Entry Tn |    |      |    USERD     |     |    |   |  Engine Ctx  |
- |             |    '----->|              |<----'    '-->|   State N    |
- `-------------'           |              |              |              |
-                           `--------------'              `--------------'
-
-Runlist Diagram Description:
-    Here we have m+n channel-type (ENTRY_TYPE_CHAN) runlist entries grouped
-together within two TSGs.  The first entry in the runlist is a TSG header
-entry (ENTRY_TYPE_TSG) that describes the first TSG.  The TSG header specifies
-m as the length of the TSG.  The header also contains the timeslice
-information for the TSG (SCALE/TIMEOUT), as well as the TSG id specified in the
-TSGID field.
-    Because the length here is m, the runlist *must* contain m additional
-runlist entries of type ENTRY_TYPE_CHAN that are part of this TSG.
-Similarly, the next n+1 entries, a TSG header entry followed by n regular
-channel entries, correspond to the second TSG.
-
-#define NV_RAMRL_ENTRY                                             /* ----G */
-#define NV_RAMRL_ENTRY_RANGE                        0xF:0x00000000 /* RW--M */
-#define NV_RAMRL_ENTRY_SIZE                                     16 /*       */
-// Runlist base must be 4k-aligned.
-#define NV_RAMRL_ENTRY_BASE_SHIFT 12 /* */ - - -#define NV_RAMRL_ENTRY_TYPE (0+0*32):(0+0*32) /* RWXUF */ -#define NV_RAMRL_ENTRY_TYPE_CHAN 0x00000000 /* RW--V */ -#define NV_RAMRL_ENTRY_TYPE_TSG 0x00000001 /* RW--V */ - -#define NV_RAMRL_ENTRY_ID (11+2*32):(0+2*32) /* RWXUF */ -#define NV_RAMRL_ENTRY_ID_HW 11:0 /* RWXUF */ -#define NV_RAMRL_ENTRY_ID_MAX (4096-1) /* RW--V */ - - - - - -#define NV_RAMRL_ENTRY_CHAN_RUNQUEUE_SELECTOR (1+0*32):(1+0*32) /* RWXUF */ - -#define NV_RAMRL_ENTRY_CHAN_INST_TARGET (5+0*32):(4+0*32) /* RWXUF */ -#define NV_RAMRL_ENTRY_CHAN_INST_TARGET_VID_MEM 0x00000000 /* RW--V */ -#define NV_RAMRL_ENTRY_CHAN_INST_TARGET_SYS_MEM_COHERENT 0x00000002 /* RW--V */ -#define NV_RAMRL_ENTRY_CHAN_INST_TARGET_SYS_MEM_NONCOHERENT 0x00000003 /* RW--V */ - -#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET (7+0*32):(6+0*32) /* RWXUF */ -#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_VID_MEM 0x00000000 /* RW--V */ -#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_VID_MEM_NVLINK_COHERENT 0x00000001 /* RW--V */ -#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_SYS_MEM_COHERENT 0x00000002 /* RW--V */ -#define NV_RAMRL_ENTRY_CHAN_USERD_TARGET_SYS_MEM_NONCOHERENT 0x00000003 /* RW--V */ - -#define NV_RAMRL_ENTRY_CHAN_USERD_PTR_LO (31+0*32):(8+0*32) /* RWXUF */ -#define NV_RAMRL_ENTRY_CHAN_USERD_PTR_HI (31+1*32):(0+1*32) /* RWXUF */ - -#define NV_RAMRL_ENTRY_CHAN_CHID (11+2*32):(0+2*32) /* RWXUF */ - -#define NV_RAMRL_ENTRY_CHAN_INST_PTR_LO (31+2*32):(12+2*32) /* RWXUF */ -#define NV_RAMRL_ENTRY_CHAN_INST_PTR_HI (31+3*32):(0+3*32) /* RWXUF */ - - - -// Macros for shifting out low bits of INST_PTR and USERD_PTR. 
-#define NV_RAMRL_ENTRY_CHAN_INST_PTR_ALIGN_SHIFT 12 /* ----C */ -#define NV_RAMRL_ENTRY_CHAN_USERD_PTR_ALIGN_SHIFT 8 /* ----C */ - - - - - - - -#define NV_RAMRL_ENTRY_TSG_TIMESLICE_SCALE (19+0*32):(16+0*32) /* RWXUF */ -#define NV_RAMRL_ENTRY_TSG_TIMESLICE_SCALE_3 0x00000003 /* RWI-V */ -#define NV_RAMRL_ENTRY_TSG_TIMESLICE_TIMEOUT (31+0*32):(24+0*32) /* RWXUF */ -#define NV_RAMRL_ENTRY_TSG_TIMESLICE_TIMEOUT_128 0x00000080 /* RWI-V */ - - -#define NV_RAMRL_ENTRY_TSG_TIMESLICE_TIMEOUT_1US 0x00000000 /* */ - -#define NV_RAMRL_ENTRY_TSG_LENGTH (7+1*32):(0+1*32) /* RWXUF */ -#define NV_RAMRL_ENTRY_TSG_LENGTH_INIT 0x00000000 /* RW--V */ -#define NV_RAMRL_ENTRY_TSG_LENGTH_MIN 0x00000001 /* RW--V */ -#define NV_RAMRL_ENTRY_TSG_LENGTH_MAX 0x00000080 /* RW--V */ - -#define NV_RAMRL_ENTRY_TSG_TSGID (11+2*32):(0+2*32) /* RWXUF */ - - - -6 - Host Pushbuffer Format (FIFO_DMA) -======================================= - - "FIFO" refers to Host. "FIFO_DMA" means data that Host reads from memory: -the pushbuffer. Host autonomously reads pushbuffer data from memory and -generates method address/data pairs from the data. - - Pushbuffer terminology: - - - A channel is the logical sequence of instructions associated with a GPU - context. - - - The pushbuffer is a stream of data in memory containing the - specifications of the operations that a channel is to perform for a - particular client. Pushbuffer data consists of pushbuffer entries. - - - A pushbuffer entry (PB entry) is a 32-bit (doubleword) sized unit of - pushbuffer data. This is the smallest granularity at which Host consumes - pushbuffer data. A PB entry is either a PB instruction (which is either - a PB control entry or a PB method header), or a method data entry. - - - A pushbuffer segment (PB segment) is a contiguous block of memory - containing pushbuffer entries. The location and size of a pushbuffer - segment is defined by its respective GP entry in the GPFIFO. 
- - - A pushbuffer control entry (PB control entry) is a single PB entry of - type SET_SUBDEVICE_MASK, STORE_SUBDEVICE_MASK, USE_SUBDEVICE_MASK, - END_PB_SEGMENT, or a universal NOP (NV_FIFO_DMA_NOP). - - - A pushbuffer compressed method sequence is a sequence of pushbuffer - entries starting with a method header and a variable-length sequence of - method data entries (the length being defined by the method header). A - single PB compressed method sequence expands into one or more methods. - This may also be known as a "pushbuffer method" (PB method), but that - terminology is ambiguous and not preferred. - - - A pushbuffer method header (PB method header) is the first PB entry found - in a PB compressed method sequence. A PB method header is a PB - instruction performed on method data entries. - - - A pushbuffer instruction (PB instruction) is a PB entry that is not a PB - method data entry. A PB instruction is either a PB control entry or a PB - method header. - - - A method is an address/data pair representing an operation to perform. - - - A method data entry is the 32-bit operand for its corresponding method. - - - -#define NV_FIFO_PB_ENTRY_SIZE 4 /* */ - - - Some engines such as Graphics internally support a double-wide method FIFO; -these are known as "data-hi" methods. It is Host that performs the packing of -two methods into one double-wide entry. Host will only generate data-hi methods -if the following conditions are satisfied: - - 1. The two methods come from the same PB method (in other words they share - the same method header). - - 2. The method header specifies a non-incrementing method, an incrementing - method, or an increment-once method. - - 3. The paired methods either have the same method address, or the first - method has an even NV_FIFO_DMA_METHOD_ADDRESS field and the second - (data-hi) method is the increment of the first. 
(That is, the - left-shifted method address as listed in the class files must be - divisible by 8 for this condition to hold.) - - 4. The second method is available at the time of pushing the first one into - the engine's method FIFO. In other words, Host will not wait to pack - methods. Note that if the engine's method fifo is full, the - back-pressure will in itself create a "wait time". - -The first three conditions are under SW's control. Only the graphics engine -supports data-hi methods. - - -Types of PB Entries - - PB entries can be classified into three types: PB method headers, PB -control entries, and PB method data. Different types of PB entries have -different formats. Because PB compressed method sequences are of variable -length, it is impossible to determine the type of a PB entry without tracking -the pushbuffer from the beginning or from the location of a PB entry that is -known to not be a PB method data entry. - - A PB method data entry is always found in a method data sequence -immediately following a PB method header in the logical stream of PB entries. -The PB method header contains a NV_FIFO_DMA_METHOD_COUNT field, the value of -which is equal to the length of the method data sequence. Note a PB method -header does not necessarily come with PB method data entries (see details below -about immediate-data method headers and method headers for which COUNT is zero). -Also note the PB method data entries may be located in a PB segment separate -from their corresponding method header. The format of any given PB method data -entry is defined in the "NV_UDMA" section of dev_pbdma.ref. - - A PB entry that is either a PB method header or PB control entry is known -as a PB instruction. The type of a PB instruction is specified by the -NV_FIFO_DMA_SEC_OP field and the NV_FIFO_DMA_TERT_OP field. 
- - secondary tertiary - opcode opcode entry type - --------- -------- -------------------------------- - 000 01 SET_SUBDEVICE_MASK - 000 10 STORE_SUBDEVICE_MASK - 000 11 USE_SUBDEVICE_MASK - 001 xx incrementing method header - 011 xx non-incrementing method header - 100 xx immediate-data method header - 101 xx increment-once method header - 111 xx END_PB_SEGMENT - --------- -------- -------------------------------- - - Types of methods: - - - A Host method is a method whose address is defined in the NV_UDMA device - range. - - - A Host-only method is any Host method excluding SetObject (also known as - NV_UDMA_OBJECT). - - - An engine method is a method whose address is not defined within the - NV_UDMA device range. There are multiple engines designated by a - subchannel ID. Software methods are included in this category. - - - A software method (SW method) is a method which causes an interrupt for - the express purpose of being handled by software. For details see the - section on software methods below. - - For more information about types of methods see "HOST METHODS" and -"RESERVED METHOD ADDRESSES" in dev_pbdma.ref. - - The method address in a PB method header (stored in the -NV_FIFO_DMA_METHOD_ADDRESS field) is a dword-address, not a byte-address. In -other words the least significant two bits of the address are not stored because -the byte-address is dword-aligned (thus the least significant two bits are -always zero). - - The subchannel in a PB method header (stored in the -NV_FIFO_DMA_*_SUBCHANNEL field) determines the engine to which a method will be -sent if the method is SetObject or an engine method (otherwise, the SUBCHANNEL -field is ignored). SetObject enables SW to request HW to check the expectation -that a given subchannel serves the specified class ID; see the description of -"NV_UDMA_OBJECT" in dev_pbdma.ref. - - The mapping between subchannels and engines is fixed. A subchannel is -bound to a given class according to the runlist. 
Each engine method is applied -to an "object," which itself is an instance of an NV class as defined by the -master MFS class files. Each object belongs to an engine. For SetObject and -engine methods, the engine is determined entirely by the SUBCHANNEL field of -the method's header via a fixed mapping that depends on the runlist on which the -method arrives. - - Methods on subchannels 0-4 are handled by the primary engine served by the -runlist, except that subchannel 4 targets GRCOPY0 and GRCOPY1 on the graphics -runlist. For Graphics/Compute, SetObject associates subchannels 0, 1, 2, and 3 -with class identifiers for 3D, compute, I2M, and 2D respectively. On other -runlists, the subchannel is ignored, and Host does not send the subchannel ID to -the engine. It is recommended that SW only use subchannel 4 on the dedicated -copy engines for consistency with GRCOPY usage. - - Subchannels 5-7 are for software methods. Any methods on these subchannels -(including SetObject methods) are kicked back to software for handling via the -SW method dispatch mechanism using the NV_PPBDMA_INTR_*_DEVICE interrupt. SW -may choose to send a SetObject method to each engine subchannel before sending -any methods on that particular subchannel in order to support multiple software -classes. - - If a method stream subchannel-switches from targeting graphics/compute to a -copy engine or vice-versa, that is, to or from subchannel 4 on GR, Host will: - - 1. Wait until the first engine has completed all its methods, - - 2. Wait until that engine indicates that it is idle (WFI), and - - 3. Send a sysmem barrier flush and wait until it completes. - -Only then will Host send methods to the newly targeted engine. - - Note that this WFI will not occur for sending Host-only methods on the new -subchannel, since Host-only methods ignore the subchannel field. Additionally, -when switching from CE to graphics/compute, Host forces FE to perform a cache -invalidate. 
Other subchannel switch semantics may be provided by the engines -themselves, such as switching between subchannels 0-3 within FE. - - -#define NV_FIFO_DMA /* ----G */ -#define NV_FIFO_DMA_METHOD_ADDRESS_OLD 12:2 /* RWXUF */ -#define NV_FIFO_DMA_METHOD_ADDRESS 11:0 /* RWXUF */ - -#define NV_FIFO_DMA_SUBDEVICE_MASK 15:4 /* RWXUF */ - -#define NV_FIFO_DMA_METHOD_SUBCHANNEL 15:13 /* RWXUF */ - -#define NV_FIFO_DMA_TERT_OP 17:16 /* RWXUF */ -#define NV_FIFO_DMA_TERT_OP_GRP0_SET_SUB_DEV_MASK 0x00000001 /* RW--V */ -#define NV_FIFO_DMA_TERT_OP_GRP0_STORE_SUB_DEV_MASK 0x00000002 /* RW--V */ -#define NV_FIFO_DMA_TERT_OP_GRP0_USE_SUB_DEV_MASK 0x00000003 /* RW--V */ - -#define NV_FIFO_DMA_METHOD_COUNT_OLD 28:18 /* RWXUF */ -#define NV_FIFO_DMA_METHOD_COUNT 28:16 /* RWXUF */ -#define NV_FIFO_DMA_IMMD_DATA 28:16 /* RWXUF */ - -#define NV_FIFO_DMA_SEC_OP 31:29 /* RWXUF */ -#define NV_FIFO_DMA_SEC_OP_GRP0_USE_TERT 0x00000000 /* RW--V */ -#define NV_FIFO_DMA_SEC_OP_INC_METHOD 0x00000001 /* RW--V */ -#define NV_FIFO_DMA_SEC_OP_NON_INC_METHOD 0x00000003 /* RW--V */ -#define NV_FIFO_DMA_SEC_OP_IMMD_DATA_METHOD 0x00000004 /* RW--V */ -#define NV_FIFO_DMA_SEC_OP_ONE_INC 0x00000005 /* RW--V */ -#define NV_FIFO_DMA_SEC_OP_RESERVED6 0x00000006 /* RW--V */ -#define NV_FIFO_DMA_SEC_OP_END_PB_SEGMENT 0x00000007 /* RW--V */ - - -Incrementing PB Method Header Format - - An incrementing PB method header specifies that Host generate a sequence of -methods. The length of the sequence is defined by the method header. The -method data for each method in this sequence is found in a sequence of PB -entries immediately following the method header. - - The dword-address of the first method is specified by the method header, -and the dword-address of each subsequent method is equal to the dword-address of -the previous method plus one. Or in other words, the byte-address of each -subsequent method is equal to the byte-address of the previous method plus four. 
- -Example sequence of methods generated from an incrementing method header: - - addr data0 - addr+1 data1 - addr+2 data2 - addr+3 data3 - ... ... - - The NV_FIFO_DMA_INCR_COUNT field contains the number of methods in the -generated sequence. This is the same as the number of method data entries that -follow the method header. If the COUNT field is zero, the other fields are -ignored, and the PB method effectively becomes a no-op with no method data -entries following it. - - The NV_FIFO_DMA_INCR_SUBCHANNEL field contains the subchannel to use for -the methods generated from the method header. See the documentation above for -NV_FIFO_DMA_*_SUBCHANNEL. - - The NV_FIFO_DMA_INCR_ADDRESS field contains the method address for the -first method in the generated sequence. The dword-address of the method is -incremented by one each time a method is generated. A method address specifies -an operation to be performed. Note that because the ADDRESS is a dword-address -and not a byte-address, the least two significant bits of the method's -byte-address are not stored. - - The NV_FIFO_DMA_INCR_DATA fields contain the method data for the methods in -the generated sequence. The number of method data entries is defined by the -COUNT field. A method data entry contains an operand for its respective method. - - Bit 12 is reserved for the future expansion of either the subchannel or the -address fields. - - -#define NV_FIFO_DMA_INCR /* ----G */ -#define NV_FIFO_DMA_INCR_OPCODE (0*32+31):(0*32+29) /* RWXUF */ -#define NV_FIFO_DMA_INCR_OPCODE_VALUE 0x00000001 /* ----V */ -#define NV_FIFO_DMA_INCR_COUNT (0*32+28):(0*32+16) /* RWXUF */ -#define NV_FIFO_DMA_INCR_SUBCHANNEL (0*32+15):(0*32+13) /* RWXUF */ -#define NV_FIFO_DMA_INCR_ADDRESS (0*32+11):(0*32+0) /* RWXUF */ -#define NV_FIFO_DMA_INCR_DATA (1*32+31):(1*32+0) /* RWXUF */ - - -Non-Incrementing PB Method Header Format - - A non-incrementing PB method header specifies that Host generate a sequence -of methods. 
The length of the sequence is defined by the method header. The -method data for each method in this sequence is contained within the PB entries -immediately following the method header. - - Unlike with the incrementing PB method header, the sequence of methods -generated all have the same method address. The dword-address of every method -in this sequence is specified by the method header. Although the methods all -have the same address, the method data entries may be different. - -Example sequence of methods generated from a non-incrementing method header: - - addr data0 - addr data1 - addr data2 - addr data3 - ... ... - - The NV_FIFO_DMA_NONINCR_COUNT field contains the number of methods -in the generated sequence. This is the same as the number of method data -entries that follow the method header. If the COUNT field is zero, the other -fields are ignored, and the PB method effectively becomes a no-op with no method -data entries following it. - - The NV_FIFO_DMA_NONINCR_SUBCHANNEL field contains the subchannel to use for -the methods generated from the method header. See the documentation above for -NV_FIFO_DMA_*_SUBCHANNEL. - - The NV_FIFO_DMA_NONINCR_ADDRESS field contains the method address for every -method in the generated sequence. A method address specifies an operation to be -performed. Note that because the ADDRESS field is a dword-address and not a -byte-address, the least two significant bits of the method's byte-address are -not stored. - - The NV_FIFO_DMA_NONINCR_DATA fields contain the method data for the methods -in the generated sequence. The number of method data entries is defined by the -COUNT field. A method data entry contains an operand for its respective method. - - Bit 12 is reserved for the future expansion of either the subchannel or the -address fields. 
- - -#define NV_FIFO_DMA_NONINCR /* ----G */ -#define NV_FIFO_DMA_NONINCR_OPCODE (0*32+31):(0*32+29) /* RWXUF */ -#define NV_FIFO_DMA_NONINCR_OPCODE_VALUE 0x00000003 /* ----V */ -#define NV_FIFO_DMA_NONINCR_COUNT (0*32+28):(0*32+16) /* RWXUF */ -#define NV_FIFO_DMA_NONINCR_SUBCHANNEL (0*32+15):(0*32+13) /* RWXUF */ -#define NV_FIFO_DMA_NONINCR_ADDRESS (0*32+11):(0*32+0) /* RWXUF */ -#define NV_FIFO_DMA_NONINCR_DATA (1*32+31):(1*32+0) /* RWXUF */ - - -Increment-Once PB Method Header Format - - An increment-once PB method header specifies that Host generate a sequence -of methods. The length of the sequence is defined by the method header. The -method data for each method in this sequence is found in a sequence of PB -entries immediately following the method header. - - The dword-address of the first method is specified by the method header. -The address of the second and all following methods is equal to the -dword-address of the first method plus one. In other words, the byte-address of -the second and all following methods is equal to the byte-address of the first -method plus four. - -Example sequence of methods generated from an increment-once method header: - - addr data0 - addr+1 data1 - addr+1 data2 - addr+1 data3 - ... ... - - The NV_FIFO_DMA_ONEINCR_COUNT field contains the number of methods in the -generated sequence. This is the same as the number of method data entries that -follow the method header. If the COUNT field is zero, the other fields are -ignored, and the PB method effectively becomes a no-op method with no method -data entries following it. - - The NV_FIFO_DMA_ONEINCR_SUBCHANNEL field contains the subchannel to use for -the methods generated from the method header. See the documentation above for -NV_FIFO_DMA_*_SUBCHANNEL. - - The NV_FIFO_DMA_ONEINCR_ADDRESS field contains the method address for the -first method in the generated sequence. A method address specifies an operation -to be performed. 
Note that because the ADDRESS is a dword-address and not a -byte-address, the least two significant bits of the method's byte-address are -not stored. - - The NV_FIFO_DMA_ONEINCR_DATA fields contain the method data for the methods -in the generated sequence. The number of method data entries is defined by the -COUNT field. A method data entry contains an operand for its respective method. - - Bit 12 is reserved for the future expansion of either the subchannel or the -address fields. - - -#define NV_FIFO_DMA_ONEINCR /* ----G */ -#define NV_FIFO_DMA_ONEINCR_OPCODE (0*32+31):(0*32+29) /* RWXUF */ -#define NV_FIFO_DMA_ONEINCR_OPCODE_VALUE 0x00000005 /* ----V */ -#define NV_FIFO_DMA_ONEINCR_COUNT (0*32+28):(0*32+16) /* RWXUF */ -#define NV_FIFO_DMA_ONEINCR_SUBCHANNEL (0*32+15):(0*32+13) /* RWXUF */ -#define NV_FIFO_DMA_ONEINCR_ADDRESS (0*32+11):(0*32+0) /* RWXUF */ -#define NV_FIFO_DMA_ONEINCR_DATA (1*32+31):(1*32+0) /* RWXUF */ - - -No-Operation PB Instruction Formats - - The method header for a no-op PB method may be specified in multiple ways, -but the preferred way is to set the PB instruction to NV_FIFO_DMA_NOP. -In any case NV_FIFO_DMA_NOP is a universal NOP entry that bypasses any method -header format check, and is not considered a method header. - - -#define NV_FIFO_DMA_NOP 0x00000000 /* ----C */ - - -Immediate-Data PB Method Header Format - - If a method's operand fits within 13 bits, a PB method may be specified in -a single PB entry, using the immediate-data PB method header format. Exactly -one method is generated from this method header. - - The NV_FIFO_DMA_IMMD_SUBCHANNEL field contains the subchannel to use for -the method generated from the method header. See the documentation above for -NV_FIFO_DMA_*_SUBCHANNEL. - - The NV_FIFO_DMA_IMMD_ADDRESS field contains the method address for the -single generated method. A method address specifies an operation to be -performed. 
Note that because the ADDRESS is a dword-address and not a -byte-address, the least two significant bits of the method's byte-address are -not stored. - - The single NV_FIFO_DMA_IMMD_DATA field contains the method data for the -generated method. This method data contains an operand for the generated -method. - - -#define NV_FIFO_DMA_IMMD /* ----G */ -#define NV_FIFO_DMA_IMMD_ADDRESS 11:0 /* RWXUF */ -#define NV_FIFO_DMA_IMMD_SUBCHANNEL 15:13 /* RWXUF */ -#define NV_FIFO_DMA_IMMD_DATA 28:16 /* RWXUF */ -#define NV_FIFO_DMA_IMMD_OPCODE 31:29 /* RWXUF */ -#define NV_FIFO_DMA_IMMD_OPCODE_VALUE 0x00000004 /* ----V */ - - -Set Sub-Device Mask PB Control Entry Format - - The SET_SUBDEVICE_MASK (SSDM) PB control entry is used when multiple GPU -contexts are using the same pushbuffer (for example, for SLI or for stereo -rendering) and there is data in the push buffer that is for only a subset of the -GPU contexts. This instruction allows the pushbuffer to tell a specific GPU -context to use or ignore methods following the SET_SUBDEVICE_MASK. While the -logical-AND of NV_FIFO_DMA_SET_SUBDEVICE_MASK_VALUE and the GPU context's -NV_PPBDMA_SUBDEVICE_ID value is zero, methods are ignored. Pushbuffer control -entries (like SET_SUBDEVICE_MASK) are not ignored. - -******************************************************************************** -Warning: When using subdevice masking, one must take care to synchronize -properly with any later GP entries marked FETCH_CONDITIONAL. If GP fetching -gets too far ahead of PB processing, it is possible for a later conditional PB -segment to be discarded prior to reaching an SSDM command that sets -SUBDEVICE_STATUS to ACTIVE. This would cause Host to execute garbage data. One -way to avoid this would be to set the SYNC_WAIT flag on any FETCH_CONDITIONAL -segments following a subdevice reenable. 
-******************************************************************************** - - - -#define NV_FIFO_DMA_SET_SUBDEVICE_MASK /* ----G */ -#define NV_FIFO_DMA_SET_SUBDEVICE_MASK_VALUE 15:4 /* RWXUF */ -#define NV_FIFO_DMA_SET_SUBDEVICE_MASK_OPCODE 31:16 /* RWXUF */ -#define NV_FIFO_DMA_SET_SUBDEVICE_MASK_OPCODE_VALUE 0x00000001 /* ----V */ - - -Store Sub-Device Mask PB Control Entry Format - - The STORE_SUBDEVICE_MASK PB control entry is used to save a subdevice mask -value to be used later by a USE_SUBDEVICE_MASK PB instruction. - - -#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK /* ----G */ -#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK_VALUE 15:4 /* RWXUF */ -#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK_OPCODE 31:16 /* RWXUF */ -#define NV_FIFO_DMA_STORE_SUBDEVICE_MASK_OPCODE_VALUE 0x00000002 /* ----V */ - - -Use Sub-Device Mask PB Control Entry Format - - The USE_SUBDEVICE_MASK PB control entry is used to apply the subdevice mask -value saved by a STORE_SUBDEVICE_MASK PB instruction. The effect of the mask is -the same as for a SET_SUBDEVICE_MASK PB instruction. - - -#define NV_FIFO_DMA_USE_SUBDEVICE_MASK /* ----G */ -#define NV_FIFO_DMA_USE_SUBDEVICE_MASK_OPCODE 31:16 /* RWXUF */ -#define NV_FIFO_DMA_USE_SUBDEVICE_MASK_OPCODE_VALUE 0x00000003 /* ----V */ - - -End-PB-Segment PB Control Entry Format - - Engines may write PB segments themselves, but they cannot write GP entries. -Because they cannot write GP entries, they cannot alter the size of a PB -segment. If an engine is writing a PB segment, and if it does not need to fill -the entire PB segment it was allocated, instead of filling the remainder of the -PB segment with no-op PB instructions, it may write a single End-PB-Segment -control entry to indicate that the pushbuffer data contains no further valid -data. No further PB entries from that PB segment will be decoded or processed. -Host may have already issued requests to fetch the remainder of the PB segment -before an End-PB-Segment PB instruction is processed. 
Host may or may not fetch
-the remainder of the PB segment.  Also note that doing a PB CRC check on this
-segment via NV_PPBDMA_GP_ENTRY1_OPCODE_PB_CRC will be indeterminate.
-
-
-#define NV_FIFO_DMA_ENDSEG_OPCODE                             31:29 /* RWXUF */
-#define NV_FIFO_DMA_ENDSEG_OPCODE_VALUE                  0x00000007 /* ----V */