Diffstat (limited to 'manuals/volta/gv100/dev_pbdma.ref.txt')
-rw-r--r-- | manuals/volta/gv100/dev_pbdma.ref.txt | 4261 |
1 files changed, 4261 insertions, 0 deletions
diff --git a/manuals/volta/gv100/dev_pbdma.ref.txt b/manuals/volta/gv100/dev_pbdma.ref.txt new file mode 100644 index 0000000..bc5163a --- /dev/null +++ b/manuals/volta/gv100/dev_pbdma.ref.txt @@ -0,0 +1,4261 @@ +Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved. + +Permission is hereby granted, free of charge, to any person obtaining a +copy of this software and associated documentation files (the "Software"), +to deal in the Software without restriction, including without limitation +the rights to use, copy, modify, merge, publish, distribute, sublicense, +and/or sell copies of the Software, and to permit persons to whom the +Software is furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL +THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING +FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +DEALINGS IN THE SOFTWARE. +-------------------------------------------------------------------------------- + +1 - INTRODUCTION +================== + + A Host's PBDMA unit fetches pushbuffer data from memory, generates +commands, called "methods", from the fetched data, executes some of the +generated methods itself, and sends the remainder of the methods to engines. + This manual describes the Host PBDMA register space and all Host methods. +The NV_PPBDMA space defines registers that are contained within each of Host's +PBDMA units. Each PBDMA unit is allocated an 8KB address space for its +registers. + The NV_UDMA space defines the Host methods.
A method consists of an +address doubleword and a data doubleword. The address specifies the operation +to be performed. The data is an operand. The NV_UDMA address space contains +the addresses of the methods that are executed by a PBDMA unit. +GP_ENTRY0 and GP_ENTRY1 - GP-Entry Memory Format + + A pushbuffer contains the specifications of the operations that a GPU +context is to perform for a particular client. Pushbuffers are stored in +memory. A doubleword-sized (4-byte) unit of pushbuffer data is known as a +pushbuffer entry. GP entries indicate the location of the pushbuffer data in +memory. GP entries themselves are also stored in memory. + A GP entry specifies the location and size of a pushbuffer segment (a +contiguous block of PB entries) in memory. See "FIFO_DMA" in dev_ram.ref for +details about pushbuffer segments and the format of pushbuffer data. + + The NV_PPBDMA_GP_ENTRY0_GET and NV_PPBDMA_GP_ENTRY1_GET_HI fields of a GP +entry specify the 38-bit dword-address (which would make a 40-bit byte-address) +of the first pushbuffer entry of the GP entry's pushbuffer segment. Because +each pushbuffer entry (and by extension each pushbuffer segment) is doubleword +aligned (4-byte aligned), the least significant 2 bits of the 40-bit +byte-address are not stored. The byte-address of the first pushbuffer entry in +a GP entry's pushbuffer segment is +(GP_ENTRY1_GET_HI << 32) + (GP_ENTRY0_GET << 2). + The NV_PPBDMA_GP_ENTRY1_LENGTH field, when non-zero, indicates the number +of pushbuffer entries contained within the GP entry's pushbuffer segment. The +byte-address of the first pushbuffer entry beyond the pushbuffer segment is +(GP_ENTRY1_GET_HI << 32) + (GP_ENTRY0_GET << 2) + (GP_ENTRY1_LENGTH * 4). + If NV_PPBDMA_GP_ENTRY1_LENGTH is CONTROL (0), then the GP entry is a +"control" entry, meaning this GP entry will not cause any PB data to be fetched +or executed. 
In this case, the NV_PPBDMA_GP_ENTRY1_OPCODE field specifies an +operation to perform, and the NV_PPBDMA_GP_ENTRY0_OPERAND field contains the +operand. The available operations are as follows: + + * NV_PPBDMA_GP_ENTRY1_OPCODE_NOP: no operation will be performed, but note + that the SYNC field is still respected--see below. + + * NV_PPBDMA_GP_ENTRY1_OPCODE_GP_CRC: the ENTRY0_OPERAND field is compared + with the cyclic redundancy check value that was calculated over previous + GP entries (NV_PPBDMA_GP_CRC). After each comparison, the + NV_PPBDMA_GP_CRC is cleared, whether they match or differ. If they + differ, then Host initiates an interrupt (NV_PPBDMA_INTR_0_GPCRC). For + recovery, clearing the interrupt will cause the PBDMA to continue as if + the control entry was OPCODE_NOP. + + * NV_PPBDMA_GP_ENTRY1_OPCODE_PB_CRC: the ENTRY0_OPERAND is compared + with the CRC value that was calculated over the previous pushbuffer + segment (NV_PPBDMA_PB_CRC). The PB CRC resets to 0 with each pushbuffer + segment. If the two CRCs differ, Host will raise the + NV_PPBDMA_INTR_0_PBCRC interrupt. For recovery, clearing the interrupt + will continue as if the control entry was OPCODE_NOP. Note the PB_CRC is + indeterminate if an END_PB_SEGMENT PB control entry was used in the prior + segment or if SSDM disabled the device and the segment had conditional + fetching enabled. + + Host supports two privilege levels for channels: privileged and +non-privileged. The privilege level is determined by the +NV_PPBDMA_CONFIG_AUTH_LEVEL field set from the corresponding NV_RAMFC_CONFIG +dword in the RAMFC. Non-privileged channels cannot execute privileged methods, +but privileged channels can. Any attempt to run a privileged operation from a +non-privileged channel will result in PB raising NV_PPBDMA_INTR_0_METHOD. + + + The NV_PPBDMA_GP_ENTRY1_SYNC field specifies whether a pushbuffer may be +fetched before Host has finished processing the preceding PB segment. 
If this +field is SYNC_PROCEED, then Host does not wait for the preceding PB segment to +be processed. If this field is SYNC_WAIT, then Host waits until the preceding +PB segment has been processed by Host before beginning to fetch the current PB +segment. + Host's processing of a PB segment consists of parsing PB entries into PB +instructions, decoding those instructions into control entries or method +headers, generating methods from method headers, determining whether methods are +to be executed by Host or by an engine, executing Host methods, and sending +non-Host methods and SetObject methods to engines. + Note that in the case where the final PB entry of the preceding PB segment +is a method header representing a PB compressed method sequence of nonzero +length--that is, the compressed method sequence is split across PB segments with +all of its method data entries in the PB segment for which SYNC_WAIT is +set--then Host is considered to have finished processing the preceding PB +segment once that method header is read. However, splitting a PB compressed +method sequence for software methods is not supported because Host will issue +the DEVICE interrupt indicating the SW method as soon as it processes the +method header, which happens prior to fetching the method data entries for that +compressed method sequence. Thus SW cannot actually execute any of the methods +in the sequence because the method data is not yet available, leaving the PBDMA +wedged. + When SYNC_WAIT is set, Host does not wait for any engine methods generated +from the preceding PB segment to complete. Host does not automatically wait +until an engine is done processing all methods generated from that PB segment.
+If software desires that the engine finish processing all methods generated from +one PB segment before a second PB segment is fetched, then software may place +Host methods that wait until the engine is idle in the first PB segment (like +WFI, SET_REF, or SEM_EXECUTE with RELEASE_WFI_EN set). Alternatively, software +might put a semaphore acquire at the end of the first PB segment, and have an +engine release the semaphore. In both cases, SYNC_WAIT must be set on the +second PB segment. This field applies even if the NV_PPBDMA_GP_ENTRY1_LENGTH +field is zero; if SYNC_WAIT is specified in this case, no further GP entries +will be processed until the wait finishes. + + Some parts of a pushbuffer may not be executed depending on the value of +the NV_PPBDMA_SUBDEVICE_ID and SUBDEVICE_MASK. If an entire PB segment will not +be executed due to conditional execution, Host need not even bother fetching the +PB segment. + The NV_PPBDMA_GP_ENTRY0_FETCH field indicates whether the PB segment +specified by the GP entry should be fetched unconditionally or fetched +conditionally. If this field is FETCH_UNCONDITIONAL, then the PB segment is +fetched unconditionally. If this field is FETCH_CONDITIONAL, then the PB +segment is only fetched if the NV_PPBDMA_SUBDEVICE_STATUS field is +STATUS_ACTIVE. + +******************************************************************************** +Warning: When using subdevice masking, one must take care to synchronize +properly with any later GP entries marked FETCH_CONDITIONAL. If GP fetching +gets too far ahead of PB processing, it is possible for a later conditional PB +segment to be discarded prior to reaching an SSDM command that sets +SUBDEVICE_STATUS to ACTIVE. This would cause Host to execute garbage data. One +way to avoid this would be to set the SYNC_WAIT flag on any FETCH_CONDITIONAL +segments following a subdevice reenable. 
+******************************************************************************** + + If the PB segment is not fetched then it behaves as an OPCODE_NOP control +entry. If a PB segment contains a SET_SUBDEVICE_MASK PB instruction that Host +must see, then the GP entry for that PB segment must specify +FETCH_UNCONDITIONAL. + If the PB segment specifies FETCH_CONDITIONAL and the subdevice mask shows +STATUS_ACTIVE, but the PB segment contains a SET_SUBDEVICE_MASK PB instruction +that will disable the mask, the rest of the PB segment will be discarded. In +that case, an arbitrary number of entries past the SSDM may have already updated +the PB CRC, rendering the PB CRC indeterminate. + If Host must wait for a previous PB segment's Host processing to be +completed before examining NV_PPBDMA_SUBDEVICE_STATUS, then the GP entry should +also have its SYNC_WAIT field set. + A PB segment marked FETCH_CONDITIONAL must not have a PB compressed method +sequence that crosses a PB segment boundary (with its header in previous non- +conditional PB segment and its final valid data in a conditional PB segment)-- +doing so will cause a NV_PPBDMA_INTR_0_PBSEG interrupt. + + Software may monitor Host's progress through the pushbuffer by reading the +channel's NV_RAMUSERD_TOP_LEVEL_GET entry from USERD, which is backed by Host's +NV_PPBDMA_TOP_LEVEL_GET register. See "NV_PFIFO_USERD_WRITEBACK" in +dev_fifo.ref for information about how frequently this information is written +back into USERD. If a PB segment occurs multiple times within a pushbuffer +(like a commonly used subroutine), then progress through that segment may be +less useful for monitoring, because software will not know which occurrence of +the segment is being processed. + The NV_PPBDMA_GP_ENTRY1_LEVEL field specifies whether progress through the +GP entry's PB segment should be indicated in NV_RAMUSERD_TOP_LEVEL_GET.
If this +field is LEVEL_MAIN, then progress through the PB segment will be reported -- +NV_RAMUSERD_TOP_LEVEL_GET will equal NV_RAMUSERD_GET. If this field is +LEVEL_SUBROUTINE, then progress through this PB segment is not reported -- Host +will not alter NV_RAMUSERD_TOP_LEVEL_GET. If this field is LEVEL_SUBROUTINE, +reads of NV_RAMUSERD_TOP_LEVEL_GET will return the last value of NV_RAMUSERD_GET +from a PB segment at LEVEL_MAIN. + + If the GP entry's opcode is OPCODE_ILLEGAL or an invalid opcode, Host will +initiate an interrupt (NV_PPBDMA_INTR_0_GPENTRY). If a GP entry specifies a PB +segment that crosses the end of the virtual address space (0xFFFFFFFFFF), then +Host will initiate an interrupt (NV_PPBDMA_INTR_0_GPENTRY). Invalid GP entries +are treated like traps: they will set the interrupt and freeze the PBDMA, but +the invalid GP entry is discarded. Once the interrupt is cleared, the PBDMA +unit will simply continue with the next GP entry. + Note a corner case exists where the PB segment described by a GP entry is +at the end of the virtual address space, or in other words, the last PB entry in +the described PB segment is the last dword in the virtual address space. This +type of GP entry is not valid and will generate a GPENTRY interrupt. The +PBDMA's PUT pointer describes the address of the first dword beyond the PB +segment, thus making the last dword in the virtual address space unusable for +storing a pbentry. 
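As an illustration of the GP entry layout described above, the following Python sketch decodes a two-dword GP entry into its fields and computes the pushbuffer segment's start and end byte addresses. The bit positions are taken from the NV_PPBDMA_GP_ENTRY0 and NV_PPBDMA_GP_ENTRY1 field definitions in this manual (FETCH 0:0, GET 31:2, GET_HI 7:0, LEVEL 9:9, LENGTH 30:10, SYNC 31:31, OPCODE 7:0). The names `bits` and `decode_gp_entry` are hypothetical helpers invented for this sketch; they are not part of any NVIDIA software, and the sketch does not model Host's validity checks.

```python
def bits(value, hi, lo):
    """Extract the inclusive bit range hi:lo from a 32-bit register value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_gp_entry(entry0, entry1):
    """Decode GP_ENTRY0/GP_ENTRY1 (hypothetical helper, layout from the manual)."""
    length = bits(entry1, 30, 10)              # GP_ENTRY1_LENGTH, in PB entries
    decoded = {
        "sync_wait": bits(entry1, 31, 31) == 1,        # GP_ENTRY1_SYNC
        "level_subroutine": bits(entry1, 9, 9) == 1,   # GP_ENTRY1_LEVEL
        "length": length,
        "control": length == 0,                # LENGTH_CONTROL marks a control entry
    }
    if length == 0:
        # Control entry: bits 7:0 of ENTRY1 hold the opcode and all of
        # ENTRY0 is the operand; no pushbuffer data is fetched.
        decoded["opcode"] = bits(entry1, 7, 0)
        decoded["operand"] = entry0
    else:
        # Normal entry: byte address = (GET_HI << 32) + (GET << 2).
        start = (bits(entry1, 7, 0) << 32) + (bits(entry0, 31, 2) << 2)
        decoded["fetch_conditional"] = bits(entry0, 0, 0) == 1
        decoded["start"] = start
        decoded["end"] = start + length * 4    # first byte beyond the segment
    return decoded
```

Because LENGTH is zero for control entries, bits 7:0 of GP_ENTRY1 can be shared between GET_HI and OPCODE, which is why the sketch branches on `length` before interpreting them.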
+ + + +#define NV_PPBDMA_GP_ENTRY__SIZE 8 /* */ + +#define NV_PPBDMA_GP_ENTRY0 0x10000000 /* RW-4R */ + +#define NV_PPBDMA_GP_ENTRY0_OPERAND 31:0 /* RWXUF */ +#define NV_PPBDMA_GP_ENTRY0_FETCH 0:0 /* */ +#define NV_PPBDMA_GP_ENTRY0_FETCH_UNCONDITIONAL 0x00000000 /* */ +#define NV_PPBDMA_GP_ENTRY0_FETCH_CONDITIONAL 0x00000001 /* */ +#define NV_PPBDMA_GP_ENTRY0_GET 31:2 /* */ + +#define NV_PPBDMA_GP_ENTRY1 0x10000004 /* RW-4R */ + +#define NV_PPBDMA_GP_ENTRY1_GET_HI 7:0 /* RWXUF */ + + +#define NV_PPBDMA_GP_ENTRY1_LEVEL 9:9 /* RWXUF */ +#define NV_PPBDMA_GP_ENTRY1_LEVEL_MAIN 0x00000000 /* RW--V */ +#define NV_PPBDMA_GP_ENTRY1_LEVEL_SUBROUTINE 0x00000001 /* RW--V */ +#define NV_PPBDMA_GP_ENTRY1_LENGTH 30:10 /* RWXUF */ +#define NV_PPBDMA_GP_ENTRY1_LENGTH_CONTROL 0x00000000 /* RW--V */ +#define NV_PPBDMA_GP_ENTRY1_SYNC 31:31 /* RWXUF */ +#define NV_PPBDMA_GP_ENTRY1_SYNC_PROCEED 0x00000000 /* RW--V */ +#define NV_PPBDMA_GP_ENTRY1_SYNC_WAIT 0x00000001 /* RW--V */ +#define NV_PPBDMA_GP_ENTRY1_OPCODE 7:0 /* RWXUF */ +#define NV_PPBDMA_GP_ENTRY1_OPCODE_NOP 0x00000000 /* RW--V */ +#define NV_PPBDMA_GP_ENTRY1_OPCODE_ILLEGAL 0x00000001 /* RW--V */ +#define NV_PPBDMA_GP_ENTRY1_OPCODE_GP_CRC 0x00000002 /* RW--V */ +#define NV_PPBDMA_GP_ENTRY1_OPCODE_PB_CRC 0x00000003 /* RW--V */ + + + + + +Number of NOPs for self-modifying gpfifo + +This is a formula for SW to estimate the number of NOPs needed to pad the gpfifo +such that the modification of a gp entry by the engine or by the CPU can take +effect. Here, NV_PFIFO_LB_GPBUF_CONTROL_SIZE(eng) refers to the SIZE field in the +NV_PFIFO_LB_GPBUF_CONTROL(eng) register. (More information about the register is in dev_fifo.ref.) + +NUM_GP_NOPS(eng) = ((NV_PFIFO_LB_GPBUF_CONTROL_SIZE(eng)+1) * NV_PFIFO_LB_ENTRY_SIZE)/ NV_PPBDMA_GP_ENTRY__SIZE + + + + + +GP_BASE - Base and Limit of the Circular Buffer of GP Entries + + GP entries are stored in a buffer in memory.
The NV_PPBDMA_GP_BASE_OFFSET +and NV_PPBDMA_GP_BASE_HI_OFFSET fields specify the 37-bit address in 8-byte +granularity of the start of a circular buffer that contains GP entries (GPFIFO). +This address is a virtual (not a physical) address. GP entries are always +GP_ENTRY__SIZE-byte aligned, so the least significant three bits of the byte +address are not stored. The byte address of the GPFIFO base pointer is thus: + + gpfifo_base_ptr = GP_BASE + (GP_BASE_HI_OFFSET << 32) + + The number of GP entries in the circular buffer is always a power of 2. +The NV_PPBDMA_GP_BASE_HI_LIMIT2 field specifies the number of bits used to count +the memory allocated to the GP FIFO. The LIMIT2 value specified in these +registers is Log base 2 of the number of entries in the GP FIFO. For example, +if the number of entries is 2^16--indicating a memory area of +(2^16)*GP_ENTRY__SIZE bytes--then the value written in LIMIT2 is 16. + The circular buffer containing GP entries cannot cross the maximum address. +If OFFSET + (1<<LIMIT2)*GP_ENTRY__SIZE - 1 > 0xFFFFFFFFFF, then Host will +initiate a CPU interrupt (NV_PPBDMA_INTR_0_GPFIFO). + The NV_PPBDMA_GP_PUT, NV_PPBDMA_GP_GET, and NV_PPBDMA_GP_FETCH registers +(and their associated NV_RAMFC and NV_RAMUSERD entries) are relative to the +value of this register. + These registers are part of a GPU context's state. On a switch, the values +of these registers are saved to, and restored from, the NV_RAMFC_GP_BASE and +NV_RAMFC_GP_BASE_HI entries in the RAMFC part of the GPU context's GPU-instance +block. + Typically, software initializes the information in NV_RAMFC_GP_BASE and +NV_RAMFC_GP_BASE_HI when the GPU context's GPU-instance block is first created. +These registers are available to software only for debug. Software should use +them only if the GPU context is assigned to a PBDMA unit and that PBDMA unit is +stalled. 
While a GPU context's Host context is not contained within a PBDMA +unit, software should use the RAMFC entries to access this information. + A pair of these registers exists for each of Host's PBDMA units. These +registers run on Host's internal bus clock. + + +#define NV_PPBDMA_GP_BASE(i) (0x00040048+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_GP_BASE__SIZE_1 14 /* */ + +#define NV_PPBDMA_GP_BASE_OFFSET 31:3 /* RW-UF */ +#define NV_PPBDMA_GP_BASE_OFFSET_ZERO 0x00000000 /* RW--V */ +#define NV_PPBDMA_GP_BASE_RSVD 2:0 /* RW-UF */ +#define NV_PPBDMA_GP_BASE_RSVD_ZERO 0x00000000 /* RW--V */ + +#define NV_PPBDMA_GP_BASE_HI(i) (0x0004004c+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_GP_BASE_HI__SIZE_1 14 /* */ + +#define NV_PPBDMA_GP_BASE_HI_OFFSET 7:0 /* RW-UF */ +#define NV_PPBDMA_GP_BASE_HI_OFFSET_ZERO 0x00000000 /* RW--V */ +#define NV_PPBDMA_GP_BASE_HI_LIMIT2 20:16 /* RW-UF */ +#define NV_PPBDMA_GP_BASE_HI_LIMIT2_ZERO 0x00000000 /* RW--V */ +#define NV_PPBDMA_GP_BASE_HI_RSVDA 15:8 /* RW-UF */ +#define NV_PPBDMA_GP_BASE_HI_RSVDA_ZERO 0x00000000 /* RW--V */ +#define NV_PPBDMA_GP_BASE_HI_RSVDB 31:21 /* RW-UF */ +#define NV_PPBDMA_GP_BASE_HI_RSVDB_ZERO 0x00000000 /* RW--V */ + + +GP_FETCH - Pointer to the next GP-Entry to be Fetched + + Host does not fetch all GP entries with a single request to the memory +subsystem. Host fetches GP entries in batches. The NV_PPBDMA_GP_FETCH register +indicates the index of the next GP entry to be fetched by Host. The actual 40-bit +virtual address of the specified GP entry is computed as follows: + fetch address = GP_FETCH_ENTRY * NV_PPBDMA_GP_ENTRY__SIZE + GP_BASE + If NV_PPBDMA_GP_PUT==NV_PPBDMA_GP_FETCH, then requests to fetch the entire +GP circular buffer have been issued, and Host cannot make more requests until +NV_PPBDMA_GP_PUT is changed. Host may finish fetching GP entries long before it +has finished processing the PB segments specified by those entries.
+Software should not use NV_PPBDMA_GP_FETCH (it should use NV_PPBDMA_GP_GET), to +determine whether the GP circular buffer is full. NV_PPBDMA_GP_FETCH represents +the current extent of prefetching of GP entries; prefetched entries may be +discarded and refetched later. + This register is part of a GPU context's state. On a switch, the value of +this register is saved to, and restored from, the NV_RAMFC_GP_FETCH entry of +the RAMFC part of the GPU context's GPU-instance block. + A PBDMA unit maintains this register. Typically, software does not need to +access this register. This register is available to software only for debug. +Because Host may fetch GP entries long before it is ready to process the +entries, and because Host may discard GP entries that it has fetched, software +should not use NV_PPBDMA_GP_FETCH to monitor Host's progress (software should +use NV_PPBDMA_GP_GET for monitoring). Software should use this register only if +the GPU context is assigned to a PBDMA unit and that PBDMA unit is stalled. +While a GPU context's Host context is not contained within a PBDMA unit, +software should use NV_RAMFC_GP_FETCH to access this information. + If after a PRI write, or after this register has been restored from RAMFC +memory, the value equals or exceeds the size of the circular buffer that stores +GP entries (1<<NV_PPBDMA_GP_BASE_HI_LIMIT2), Host will initiate an interrupt +(NV_PPBDMA_INTR_*_GPPTR), and stall. + One of these registers exists for each of Host's PBDMA units. This +register runs on Host's internal bus clock. This register was introduced in +Fermi. + + +#define NV_PPBDMA_GP_FETCH(i) (0x00040050+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_GP_FETCH__SIZE_1 14 /* */ + +#define NV_PPBDMA_GP_FETCH_ENTRY 31:0 /* RW-UF */ +#define NV_PPBDMA_GP_FETCH_ENTRY_ZERO 0x00000000 /* RW--V */ + + + +GP_GET - Pointer to the next GP-Entry to be Processed + + After a GP entry is fetched, it needs to be processed. 
Typically, a GP +entry is processed by fetching the segment of pushbuffer data specified by that +GP entry, parsing the pushbuffer data into PB instructions, decoding +instructions into PB control entries or method headers, and generating methods +from method headers and their corresponding method data entries. + The NV_PPBDMA_GP_GET register contains the index of the GP entry for the +next PB segment to begin being processed. Once the next GP entry has +begun processing, that GP entry is committed and will not be refetched, and +NV_PPBDMA_GP_GET is incremented to indicate that the memory location is no +longer referenced. + NV_PPBDMA_GP_GET is not an address, but rather an index into the GP FIFO, +offset from the beginning of the GP circular buffer in memory (defined by +NV_PPBDMA_GP_BASE). The actual 40-bit address is computed as follows: + GP_GET address = GP_GET_ENTRY * NV_PPBDMA_GP_ENTRY__SIZE + GP_BASE + If it is desired that user-level software be prevented from writing GP +entries, GP entries may be +stored in privileged pages of memory. Since NV_PPBDMA_GP_GET is an index, not +an address, user-level software (which may be able to alter NV_PPBDMA_GP_GET) +cannot move GP_GET outside of the memory area defined by NV_PPBDMA_GP_BASE. + While the circular buffer containing GP entries is full, the CPU cannot +write any more GP entries. There is no extra state bit to distinguish between a +full GP buffer and an empty GP buffer. If NV_PPBDMA_GP_PUT equals +NV_PPBDMA_GP_GET-1, then the buffer is full. If NV_PPBDMA_GP_PUT equals +NV_PPBDMA_GP_GET, then the GP circular buffer is empty, and there are no more GP +entries for Host to process. Because of these definitions of full and empty, +the GP circular buffer must always have at least one entry that is empty. + This register is part of a GPU context's state. On a switch, the value of +this register is saved to, and restored from, the NV_RAMFC_GP_GET entry of +the RAMFC part of the GPU context's GPU-instance block.
Host stores GP entries +that have been fetched but have not been processed in Host's Latency Buffer. + Typically, software initializes this information using NV_RAMFC_GP_GET +when the GPU context is first created. Hardware maintains the value of this +register. Software usually accesses this information using NV_RAMUSERD_GP_GET. +This register is available to software only for debug--software should use the +register directly only if the GPU context is assigned to a PBDMA unit and that +PBDMA unit is stalled. While a GPU context is not assigned to a PBDMA unit and +not bound to a channel, software should use NV_RAMFC_GP_GET to access this +information. + If after a PRI write, or after this register has been restored from RAMFC +memory, the value equals or exceeds the size of the circular buffer that stores +GP entries (1<<NV_PPBDMA_GP_BASE_HI_LIMIT2), Host will initiate an interrupt +(NV_PPBDMA_INTR_*_GPPTR), and stall. + One of these registers exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. + + +#define NV_PPBDMA_GP_GET(i) (0x00040014+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_GP_GET__SIZE_1 14 /* */ + +#define NV_PPBDMA_GP_GET_ENTRY 31:0 /* RW-UF */ +#define NV_PPBDMA_GP_GET_ENTRY_ZERO 0x00000000 /* RW--V */ + + +GP_PUT - Pointer to the next GP-Entry to be Written + + Typically, the CPU writes GP entries to a circular buffer, and Host reads +them from that buffer. Host should not read entries before they have been +written. + The NV_PPBDMA_GP_PUT register contains the index of the next GP entry +that the CPU will write to memory. NV_PPBDMA_GP_PUT points past the last entry +that has been written. NV_PPBDMA_GP_PUT is an offset from the beginning of the +GP circular buffer in memory (NV_PPBDMA_GP_BASE). The actual 40-bit address is +computed as follows: + GP_PUT address = GP_PUT_ENTRY * NV_PPBDMA_GP_ENTRY__SIZE + GP_BASE + If NV_PPBDMA_GP_PUT==NV_PPBDMA_GP_GET-1, then the buffer is full. 
While +the buffer is full, the CPU can write no more GP entries. If NV_PPBDMA_GP_PUT +equals NV_PPBDMA_GP_GET, then the buffer is empty. While the buffer is empty, +Host can process no more GP entries. Because of these definitions of full and +empty, the GP circular buffer must always have at least one empty entry. + This register is part of a GPU context's state. On a switch, the value of +this register is saved to, and restored from the NV_RAMFC_GP_PUT entry of +the RAMFC part of the GPU context's GPU-instance block. + Typically, software alters GP_PUT by writing to NV_RAMUSERD_GP_PUT. This +register is not immediately synchronized with NV_RAMUSERD_GP_PUT--there will be a +delay in that synchronization until internal reads of the pushbuffer are +guaranteed to be ordered behind the write (soft-flush). This +register is available to software only for debug. Software should use this +register only if the GPU context is assigned to a PBDMA unit and that PBDMA unit +is stalled. While a GPU context is not assigned to a PBDMA unit and is not +bound to a channel, software should use NV_RAMFC_GP_PUT to access this +information. + If after a PRI write, or after this register has been restored from RAMFC +memory, the value equals or exceeds the size of the circular buffer that stores +GP entries (1<<NV_PPBDMA_GP_BASE_HI_LIMIT2), Host will initiate an interrupt +(NV_PPBDMA_INTR_*_GPPTR), and stall. + One of these registers exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. + + +#define NV_PPBDMA_GP_PUT(i) (0x00040000+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_GP_PUT__SIZE_1 14 /* */ + +#define NV_PPBDMA_GP_PUT_ENTRY 31:0 /* RW-UF */ +#define NV_PPBDMA_GP_PUT_ENTRY_ZERO 0x00000000 /* RW--V */ + + + +PB_FETCH - Pointer to the next PB Data to be Fetched + + As directed by GP entries, Host fetches pushbuffer data for a channel, +processes the data, and sends methods generated from the data to engines. 
Each +GP entry specifies a range of addresses from which Host is to fetch pushbuffer +data. + Typically, PB segments are too large for Host to fetch the entire +segment at one time. The NV_PPBDMA_PB_FETCH_ADDR and NV_PPBDMA_PB_FETCH_HI_ADDR +fields contain the next address from which Host will fetch pushbuffer data. +PB compressed method sequences have variable sizes. Until PB data is parsed, +Host does not know where one PB compressed method sequence ends and another +begins. PB_FETCH may point to the middle of a compressed method sequence. +Before Host begins fetching PB data for a new GP entry, it sets this field to +the value from the GP entry's NV_PPBDMA_GP_ENTRY0_GET and +NV_PPBDMA_GP_ENTRY1_GET_HI entries so that Host will start fetching from the new +PB segment. + The NV_PPBDMA_PB_FETCH_HI_LENGTH field contains the number of PB entries in +the PB segment for which no fetch request has been issued. Before Host begins +fetching PB data for a new GP entry, it sets this field to the value from the GP +entry's NV_PPBDMA_GP_ENTRY1_LENGTH field. + The NV_PPBDMA_PB_FETCH_CONDITIONAL field indicates whether the PB +segment specified by the GP entry should be fetched unconditionally, or should +be fetched only if the NV_PPBDMA_SUBDEVICE_STATUS field is STATUS_ACTIVE. +Before Host begins fetching PB data for a new GP entry, it sets this field to +the value from the GP entry's NV_PPBDMA_GP_ENTRY0_FETCH field. + The NV_PPBDMA_PB_FETCH_HI_SYNC field specifies whether a pushbuffer may be +fetched before Host has finished processing the preceding PB segment. +Before Host begins fetching PB data for a new GP entry, it sets this field to +the value from the GP entry's NV_PPBDMA_GP_ENTRY1_SYNC field. + The NV_PPBDMA_PB_FETCH_HI_LEVEL field specifies whether progress through +the GP entry's PB segment should be indicated in +NV_RAMUSERD_TOP_LEVEL_GET. If LEVEL is SUBROUTINE, progress is not reflected in +TOP_LEVEL_GET.
Before Host begins fetching PB data for a new GP entry, it sets +this field to the value from the GP entry's NV_PPBDMA_GP_ENTRY1_LEVEL field. + These registers are part of a GPU context's state. On a switch, the +register values are saved to and restored from the NV_RAMFC_PB_FETCH and +NV_RAMFC_PB_FETCH_HI entries of the RAMFC part of the GPU context's +GPU-instance block. + Hardware maintains these registers. Typically, software does not access +them directly; they are available to software only for debug. Because Host +may fetch pushbuffer data long before it is ready to process the data, and +because Host may discard pushbuffer data that it has fetched, software should +not use PB_FETCH to monitor Host's progress. Software should use +these registers only if the GPU context is assigned to a PBDMA unit and that +PBDMA unit is stalled. While a GPU context's Host context is not contained +within a PBDMA unit, software should use NV_RAMFC_PB_FETCH and +NV_RAMFC_PB_FETCH_HI to access this information. + A pair of these registers exists for each of Host's PBDMA units. These +registers run on Host's internal domain clock. 
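The field-by-field copy from a GP entry into PB_FETCH and PB_FETCH_HI described above can be sketched in Python as follows. The source bit positions come from the GP_ENTRY0/GP_ENTRY1 definitions and the destination positions from the NV_PPBDMA_PB_FETCH and NV_PPBDMA_PB_FETCH_HI definitions in this manual (CONDITIONAL 0:0, ADDR 31:2, HI_ADDR 7:0, HI_LEVEL 9:9, HI_SYNC 10:10, HI_LENGTH 31:11). The function names `field` and `load_pb_fetch` are hypothetical, and the sketch only models the initial values for a non-control GP entry, not how hardware advances them during fetching.

```python
def field(value, hi, lo):
    """Extract the inclusive bit range hi:lo from a 32-bit value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def load_pb_fetch(entry0, entry1):
    """Return (PB_FETCH, PB_FETCH_HI) initial values for a new GP entry.

    Hypothetical model of the copy described in the text; assumes a
    non-control GP entry (LENGTH != 0).
    """
    pb_fetch = (
        field(entry0, 0, 0)                # CONDITIONAL 0:0 <- GP_ENTRY0_FETCH
        | (field(entry0, 31, 2) << 2)      # ADDR 31:2       <- GP_ENTRY0_GET
    )
    pb_fetch_hi = (
        field(entry1, 7, 0)                # HI_ADDR 7:0     <- GP_ENTRY1_GET_HI
        | (field(entry1, 9, 9) << 9)       # HI_LEVEL 9:9    <- GP_ENTRY1_LEVEL
        | (field(entry1, 31, 31) << 10)    # HI_SYNC 10:10   <- GP_ENTRY1_SYNC
        | (field(entry1, 30, 10) << 11)    # HI_LENGTH 31:11 <- GP_ENTRY1_LENGTH
    )
    return pb_fetch, pb_fetch_hi
```

Note that SYNC and LENGTH sit at different bit positions in the two layouts (31:31 and 30:10 in GP_ENTRY1, 10:10 and 31:11 in PB_FETCH_HI), so the GP entry's second dword cannot simply be copied into PB_FETCH_HI.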
+ + +#define NV_PPBDMA_PB_FETCH(i) (0x00040054+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_PB_FETCH__SIZE_1 14 /* */ + +#define NV_PPBDMA_PB_FETCH_CONDITIONAL 0:0 /* RW-UF */ +#define NV_PPBDMA_PB_FETCH_CONDITIONAL_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_FETCH_CONDITIONAL_TRUE 0x00000001 /* RW--V */ + +#define NV_PPBDMA_PB_FETCH_ADDR 31:2 /* RW-UF */ +#define NV_PPBDMA_PB_FETCH_ADDR_ZERO 0x00000000 /* RW--V */ + +#define NV_PPBDMA_PB_FETCH_HI(i) (0x00040058+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_PB_FETCH_HI__SIZE_1 14 /* */ + +#define NV_PPBDMA_PB_FETCH_HI_ADDR 7:0 /* RW-UF */ +#define NV_PPBDMA_PB_FETCH_HI_ADDR_ZERO 0x00000000 /* RW--V */ + + +#define NV_PPBDMA_PB_FETCH_HI_LEVEL 9:9 /* RW-UF */ +#define NV_PPBDMA_PB_FETCH_HI_LEVEL_MAIN 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_FETCH_HI_LEVEL_SUBROUTINE 0x00000001 /* RW--V */ + +#define NV_PPBDMA_PB_FETCH_HI_SYNC 10:10 /* RW-UF */ +#define NV_PPBDMA_PB_FETCH_HI_SYNC_PROCEED 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_FETCH_HI_SYNC_WAIT 0x00000001 /* RW--V */ + +#define NV_PPBDMA_PB_FETCH_HI_LENGTH 31:11 /* RW-UF */ +#define NV_PPBDMA_PB_FETCH_HI_LENGTH_ZERO 0x00000000 /* RW--V */ + + +GET - Pointer to the next PB Data to be Processed + + The NV_PPBDMA_GET and NV_PPBDMA_GET_HI registers contain the virtual +address of the next pushbuffer data to be processed, called the "GET" pointer. +GET may point to the middle of a PB compressed method sequence. + Pushbuffer data that has been fetched but has not been processed is stored +in Host's Latency Buffer. When a channel's context is restored from memory to +Host, if that channel's Latency Buffer data has been preserved, then Host will +continue fetching pushbuffer data from PB_FETCH (which is stored in the +NV_PPBDMA_PB_FETCH and NV_PPBDMA_PB_FETCH_HI registers described above). If +that Latency Buffer data has been lost, then Host will continue fetching +pushbuffer data from the GET address. 
Typically, Latency Buffer data is +preserved if there are more engines than Host has PBDMA units for serving +engines. + These registers are part of a GPU context's state. On a switch, the +register values are saved to, and restored from, the NV_RAMFC_PB_GET and +NV_RAMFC_PB_GET_HI entries of the RAMFC part of the GPU context's GPU-instance +block. + Hardware maintains the values of these registers. Typically, software +accesses this information using NV_RAMUSERD_GET and NV_RAMUSERD_GET_HI. These +registers are available to software only for debug. Software should use them +only if the GPU context is assigned to a PBDMA unit. While a GPU context is not +assigned to a PBDMA unit and is not bound to a channel, software should use +NV_RAMFC_PB_GET and NV_RAMFC_PB_GET_HI to access this information instead. + If after a PRI write, or after this register has been restored from RAMFC +memory, the value exceeds the value of NV_PPBDMA_PUT, Host will initiate an +interrupt (NV_PPBDMA_INTR_0_PBPTR), and stall. + A pair of these registers exists for each of Host's PBDMA units. These +registers run on Host's internal domain clock. + + +#define NV_PPBDMA_GET(i) (0x00040018+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_GET__SIZE_1 14 /* */ + +#define NV_PPBDMA_GET_OFFSET 31:2 /* RW-UF */ +#define NV_PPBDMA_GET_OFFSET_ZERO 0x00000000 /* RW--V */ + +#define NV_PPBDMA_GET_HI(i) (0x0004001c+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_GET_HI__SIZE_1 14 /* */ + +#define NV_PPBDMA_GET_HI_OFFSET 7:0 /* RW-UF */ +#define NV_PPBDMA_GET_HI_OFFSET_ZERO 0x00000000 /* RW--V */ + + +PUT - Pointer to the End of the PB Segment + + Each GP entry specifies a range of addresses from which Host is to fetch +pushbuffer data. This range of addresses defines a PB segment. The +NV_PPBDMA_PUT and NV_PPBDMA_PUT_HI registers contain the PUT field, which +specifies the address of the first memory location after the end of the +PB segment currently being processed. 
Host will stop fetching the +PB segment when it reaches this address. + These registers are part of a GPU context's state. On a switch, the values of +these registers are saved to and restored from the NV_RAMFC_PB_PUT and +NV_RAMFC_PB_PUT_HI entries of the RAMFC part of the GPU context's GPU-instance +block. + Hardware maintains these registers. Typically, software may access this +information through NV_RAMUSERD_PUT and NV_RAMUSERD_PUT_HI. Software should +generally not access these registers directly; they are available to software +only for debug. Software should use them only if the GPU context is assigned +to a PBDMA unit. While a GPU context is not assigned to a PBDMA unit and is not +bound to a channel, software should use NV_RAMFC_PB_PUT and NV_RAMFC_PB_PUT_HI +to access this information. + A pair of these registers exists for each of Host's PBDMA units. These +registers run on Host's internal domain clock. + + +#define NV_PPBDMA_PUT(i) (0x0004005c+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_PUT__SIZE_1 14 /* */ + +#define NV_PPBDMA_PUT_OFFSET 31:2 /* RW-UF */ +#define NV_PPBDMA_PUT_OFFSET_ZERO 0x00000000 /* RW--V */ +#define NV_PPBDMA_PUT_RSVD 1:0 /* R-IUF */ +#define NV_PPBDMA_PUT_RSVD_ZERO 0x00000000 /* R-I-V */ + +#define NV_PPBDMA_PUT_HI(i) (0x00040060+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_PUT_HI__SIZE_1 14 /* */ + +#define NV_PPBDMA_PUT_HI_OFFSET 7:0 /* RW-UF */ +#define NV_PPBDMA_PUT_HI_OFFSET_ZERO 0x00000000 /* RW--V */ + + +TOP_LEVEL_GET - Pointer to next top-level (non-subroutine) PB Data to be Processed + + Software may use Host's GET pointers to monitor Host's progress fetching +and processing the pushbuffer. However, pushbuffers may contain segments that +are used at many different places within the pushbuffer (for example, a commonly +called subroutine). If a segment is used in many different places, it may be +less helpful to know that Host is in the middle of such a lower-level segment.
+Host contains a mechanism (NV_PPBDMA_GP_ENTRY1_LEVEL_SUBROUTINE) to allow +software to specify that some segments be ignored for GET pointer monitoring. +TOP_LEVEL_GET reflects GET for the last address in a segment that is not ignored +for monitoring. + The NV_PPBDMA_TOP_LEVEL_GET and NV_PPBDMA_TOP_LEVEL_GET_HI registers hold +the last value obtained from a GP_ENTRY for NV_PPBDMA_GET and NV_PPBDMA_GET_HI +respectively that had the NV_PPBDMA_GP_ENTRY1_LEVEL set to LEVEL_MAIN. If Host +has not yet encountered a GP entry with LEVEL_MAIN, then the +TOP_LEVEL_GET_HI_VALID field is FALSE. VALID becomes TRUE only after the first +method has been fetched from the LEVEL_MAIN segment, and becomes FALSE again +when the channel is switched out. + These registers are part of a GPU context's state. On a switch, the values of +these registers are saved to, and restored from, the NV_RAMFC_PB_TOP_LEVEL_GET and +NV_RAMFC_PB_TOP_LEVEL_GET_HI entries of the RAMFC part of the GPU context's +GPU-instance block. + Hardware maintains these registers. Typically, software accesses this +information by reading NV_RAMUSERD_TOP_LEVEL_GET first and then +NV_RAMUSERD_TOP_LEVEL_GET_HI. The TOP_LEVEL_GET registers are available to +software only for debug. Software should use these registers directly only if +the GPU context is assigned to a PBDMA unit. While a GPU context is not +assigned to a PBDMA unit and not bound to a channel, software should use +NV_RAMFC_PB_TOP_LEVEL_GET and NV_RAMFC_PB_TOP_LEVEL_GET_HI to access this +information. + A pair of these registers exists for each of Host's PBDMA units. These +registers run on Host's internal domain clock.
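As a rough sketch of how debug software might interpret the TOP_LEVEL_GET pair, the C fragment below combines the two OFFSET fields into one 40-bit address and honors the VALID bit. The bit positions (OFFSET 31:2 in the low word, OFFSET 7:0 and VALID 31:31 in the high word) and the 8192-byte per-unit stride come from the register definitions for NV_PPBDMA_TOP_LEVEL_GET and NV_PPBDMA_TOP_LEVEL_GET_HI. The helper names are hypothetical, and treating OFFSET 31:2 as bits 31:2 of a dword-aligned byte address is an assumption, not something this manual states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register offset of NV_PPBDMA_TOP_LEVEL_GET(i); each PBDMA unit has an
 * 8KB register space. */
static inline uint32_t top_level_get_reg(unsigned unit)
{
    assert(unit < 14u);                 /* NV_PPBDMA_TOP_LEVEL_GET__SIZE_1 */
    return 0x00040020u + unit * 8192u;
}

/* Hypothetical helper: decode raw register values already read by the
 * caller. Returns false when TOP_LEVEL_GET_HI_VALID is FALSE, in which
 * case *addr is left untouched. */
static inline bool top_level_get_decode(uint32_t lo, uint32_t hi,
                                        uint64_t *addr)
{
    if (((hi >> 31) & 1u) == 0u)
        return false;
    /* ASSUMPTION: OFFSET 31:2 is kept in place as a byte address;
     * the RSVD bits 1:0 are masked off. */
    *addr = ((uint64_t)(hi & 0xffu) << 32) | (lo & ~UINT32_C(3));
    return true;
}
```

A caller would read the low word, then the high word, and ignore the result whenever the helper returns false.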
+ + + +#define NV_PPBDMA_TOP_LEVEL_GET(i) (0x00040020+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_TOP_LEVEL_GET__SIZE_1 14 /* */ + +#define NV_PPBDMA_TOP_LEVEL_GET_OFFSET 31:2 /* RW-UF */ +#define NV_PPBDMA_TOP_LEVEL_GET_OFFSET_ZERO 0x00000000 /* RW--V */ +#define NV_PPBDMA_TOP_LEVEL_GET_RSVD 1:0 /* R-IUF */ +#define NV_PPBDMA_TOP_LEVEL_GET_RSVD_ZERO 0x00000000 /* R-I-V */ + +#define NV_PPBDMA_TOP_LEVEL_GET_HI(i) (0x00040024+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_TOP_LEVEL_GET_HI__SIZE_1 14 /* */ + +#define NV_PPBDMA_TOP_LEVEL_GET_HI_OFFSET 7:0 /* RW-UF */ +#define NV_PPBDMA_TOP_LEVEL_GET_HI_OFFSET_ZERO 0x00000000 /* RW--V */ +#define NV_PPBDMA_TOP_LEVEL_GET_HI_VALID 31:31 /* RWIUF */ +#define NV_PPBDMA_TOP_LEVEL_GET_HI_VALID_FALSE 0x00000000 /* RWI-V */ +#define NV_PPBDMA_TOP_LEVEL_GET_HI_VALID_TRUE 0x00000001 /* RW--V */ + + +GP_CRC - CRC Value over GP Entries + + The NV_PPBDMA_GP_CRC register contains a cyclic redundancy check value that +was calculated from GP entries. It may be used for debug to determine whether +GP entries have been properly fetched and whether the data returned is expected. + The IEEE 802.3 CRC-32 polynomial is used to calculate CRC values. + This register is part of a GPU context's state. On a switch, the value of +this register is saved to, and restored from, the NV_RAMFC_GP_CRC entry of +the RAMFC part of the GPU context's GPU-instance block. + Hardware maintains the value of this register. Software may use special GP +entries (NV_PPBDMA_GP_ENTRY1_OPCODE_GP_CRC) to check and clear this CRC value. +This register is available to software only for debug. Software should use this +register only if the GPU context is assigned to a PBDMA unit and that PBDMA unit +is stalled. While a GPU context's Host context is not contained within a PBDMA +unit, software should use NV_RAMFC_GP_CRC to access this information. + One of these registers exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. 
This register was introduced in +Fermi. + + +#define NV_PPBDMA_GP_CRC(i) (0x00040074+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_GP_CRC__SIZE_1 14 /* */ + +#define NV_PPBDMA_GP_CRC_VALUE 31:0 /* RW-UF */ +#define NV_PPBDMA_GP_CRC_VALUE_ZERO 0x00000000 /* RW--V */ + + + +PB_HEADER - The PB Instruction Currently Being Processed + + The NV_PPBDMA_PB_HEADER register contains information about the PB +instruction (either a PB method header or a PB control entry) currently being +processed. It also contains information about the PB segment from which the PB +instruction was fetched. Not all of the PB instruction's information is stored +in this register. + + Note the information stored in PB_HEADER register is a dynamic +representation of the instruction being processed. It does not contain an exact +copy of the original PB entry in which the instruction was found. For instance +if the instruction is a PB incrementing method header, the VALUE field of the +NV_PPBDMA_PB_COUNT register stores the number of method data entries left to be +consumed, and thus is decremented for each method generated. + + The NV_PPBDMA_PB_HEADER_TYPE field indicates the specific type of method +header or control entry currently being processed. The TYPE may be an +incrementing method header (TYPE_INC), a non-incrementing method header +(TYPE_NON_INC), an increment-once method header (TYPE_INC_ONCE), an +immediate-data method header (TYPE_IMMD), a SET_SUBDEVICE_MASK control entry +(TYPE_SSDM), a STORE_SUBDEVICE_MASK control entry (TYPE_STORE_SDM), a +USE_SUBDEVICE_MASK control entry (TYPE_USE_SDM), or an end-of-pushbuffer-segment +control entry (TYPE_END_SEG). See "FIFO_DMA" in dev_ram.ref for details about +these types of PB instructions. Note when PB_HEADER_TYPE is TYPE_INC_ONCE, this +field will be updated to TYPE_NON_INC after the first method in the compressed +sequence has been generated. + The NV_PPBDMA_PB_HEADER_METHOD field contains the current method address. 
+While processing an incrementing method header and its method data entries, this +field will increment after each method is generated. + The NV_PPBDMA_PB_HEADER_SUBCHANNEL field identifies the subchannel to which +methods generated from the current instruction are targeting (if applicable). +Note that the mapping from subchannels to engines is fixed for each runlist +type. + The NV_PPBDMA_PB_HEADER_LEVEL field indicates whether the current PB +instruction is within a PB segment that is being used for progress monitoring. +If this field is LEVEL_MAIN, then progress through the current PB segment is +available in NV_PPBDMA_TOP_LEVEL_GET. If this field is LEVEL_SUBROUTINE, the +progress through the current PB segment does not affect TOP_LEVEL_GET. The +value of this field comes from the GP entry that specified the PB segment. + The NV_PPBDMA_PB_HEADER_FINAL field indicates that the PB entry in which +the current PB instruction was found is the final PB entry of a PB segment. +This field is used by hardware for tracking PB segment boundaries. + The NV_PPBDMA_PB_HEADER_FIRST field indicates whether this PB instruction +is the first PB instruction of a new PB segment. This field is used by hardware +for tracking. + The NV_PPBDMA_PB_HEADER_CONDITIONAL field indicates whether this PB +instruction is from a conditionally fetched PB segment. If this PB instruction +changes the subdevice mask to not match, then the remainder of this PB segment +is not processed. + + This register is part of a channel's state. On a switch, the value of this +register is saved to and restored from the NV_RAMFC_PB_HEADER field of the RAMFC +part of the channel's instance block. + Software typically does not access this register directly, unless this is +being done while debugging. Software can directly access this register without +the risk of race conditions when the channel is loaded on a PBDMA unit and that +PBDMA unit is stalled. 
While a channel is not loaded on a PBDMA unit, software +can read from the NV_RAMFC_PB_HEADER instance block field to access this +information. + One of this type of register exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. + + +#define NV_PPBDMA_PB_HEADER(i) (0x00040084+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_PB_HEADER__SIZE_1 14 /* */ + +#define NV_PPBDMA_PB_HEADER_METHOD_OR_SDMASK 15:2 /* RW-UF */ +#define NV_PPBDMA_PB_HEADER_METHOD 13:2 /* */ +#define NV_PPBDMA_PB_HEADER_METHOD_ZERO 0x00000000 /* */ +#define NV_PPBDMA_PB_HEADER_SDMASK 15:4 /* */ +#define NV_PPBDMA_PB_HEADER_SUBCHANNEL 18:16 /* RW-UF */ +#define NV_PPBDMA_PB_HEADER_SUBCHANNEL_ZERO 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_HEADER_LEVEL 20:20 /* RW-VF */ +#define NV_PPBDMA_PB_HEADER_LEVEL_MAIN 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_HEADER_LEVEL_SUBROUTINE 0x00000001 /* RW--V */ +#define NV_PPBDMA_PB_HEADER_FIRST 22:22 /* RW-VF */ +#define NV_PPBDMA_PB_HEADER_FIRST_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_HEADER_FIRST_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_PB_HEADER_CONDITIONAL 23:23 /* RW-VF */ +#define NV_PPBDMA_PB_HEADER_CONDITIONAL_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_HEADER_CONDITIONAL_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_PB_HEADER_FINAL 24:24 /* RW-VF */ +#define NV_PPBDMA_PB_HEADER_FINAL_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_HEADER_FINAL_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_PB_HEADER_TYPE 31:29 /* RW-UF */ +#define NV_PPBDMA_PB_HEADER_TYPE_SSDM 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_HEADER_TYPE_INC 0x00000001 /* RW--V */ +#define NV_PPBDMA_PB_HEADER_TYPE_STORE_SDM 0x00000002 /* RW--V */ +#define NV_PPBDMA_PB_HEADER_TYPE_NON_INC 0x00000003 /* RW--V */ +#define NV_PPBDMA_PB_HEADER_TYPE_IMMD 0x00000004 /* RW--V */ +#define NV_PPBDMA_PB_HEADER_TYPE_INC_ONCE 0x00000005 /* RW--V */ +#define NV_PPBDMA_PB_HEADER_TYPE_USE_SDM 0x00000006 /* RW--V */ +#define 
NV_PPBDMA_PB_HEADER_TYPE_END_SEG 0x00000007 /* RW--V */ + + + +PB_COUNT - PB Entry Processor Remaining Count + + Multiple method address/data pairs may be generated from a single PB method +header. The number of methods generated from a PB method header is indicated by +the header's count field. A single PB entry may require many cycles to process. +A channel may be switched out while Host is in the middle of processing a PB +compressed method sequence. The NV_PPBDMA_PB_COUNT register along with +NV_PPBDMA_PB_HEADER contains information about the PB method header currently +being processed. + The VALUE field of the NV_PPBDMA_PB_COUNT register contains the number of +method data entries remaining to be processed in the current compressed method +sequence. When PB_COUNT_VALUE is 0, there are no more remaining method data +entries to process, and the next PB entry in the pushbuffer data stream is +interpreted as the next PB instruction. When PB_COUNT_VALUE is nonzero, the +next PB entry in the PB data stream is interpreted as method data for use in +generating the next method address/data pair. After each method data entry is +processed, PB_COUNT_VALUE is decremented. + A PBDMA unit may contain up to three PB entries that have not yet begun +being parsed into PB instructions or method data. This raw pushbuffer data is +stored in NV_PPBDMA_PB_DATA*. The NV_PPBDMA_PB_COUNT_DATAVAL* fields indicate +whether or not the NV_PPBDMA_PB_DATA* registers contain valid PB entries. Each +PB entry can be from separate PB segments, and therefore may have different +GP-entry attributes. The attributes for each PB entry are stored in the +remaining fields (LEVEL*, CONDITIONAL*, and FINAL*) in this register; see +the above documentation for the associated NV_PPBDMA_PB_HEADER fields. 
+ If the PB instruction being processed by Host's PB instruction processor is +an immediate-data method header, then instead of a count value, PB_COUNT_VALUE +contains a value to be used as the data part of a method address/data pair. + See "FIFO_DMA" in dev_ram.ref for details about compressed method +sequences and method headers. + + When the RAMFC in the instance block of a new channel is initialized, the +PB_COUNT_VALUE field should be cleared to allow the first PB entry to be decoded +as a PB instruction rather than as method data. + This register is part of a channel's state. On a switch, the value of this +register is saved to and restored from the NV_RAMFC_PB_COUNT field of the RAMFC +part of the channel's instance block. + Software typically does not access this register directly, unless this is +being done while debugging. Software can directly access this register without +the risk of race conditions when the channel is loaded on a PBDMA unit and that +PBDMA unit is stalled. While a channel is not loaded on a PBDMA unit, software +can read from the NV_RAMFC_PB_COUNT instance block field to access this +information. + One of this type of register exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock.
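The interpretation rule for PB_COUNT_VALUE described above can be sketched as a small software model. This is only an illustration: the header layout used here (a 13-bit count in bits 28:16) is a toy assumption, since the real layout is defined under "FIFO_DMA" in dev_ram.ref, and the immediate-data case, where PB_COUNT_VALUE carries data rather than a count, is deliberately left out. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: toy header layout for illustration only. */
#define TOY_HDR_COUNT(h)  (((h) >> 16) & 0x1fffu)  /* 13 bits, like VALUE 12:0 */

struct toy_pb_state {
    uint32_t count_value;   /* models NV_PPBDMA_PB_COUNT_VALUE */
    size_t   instructions;  /* entries interpreted as PB instructions */
    size_t   method_data;   /* entries interpreted as method data */
};

/* Consume one PB entry according to the rule in the text. */
static void toy_pb_consume(struct toy_pb_state *s, uint32_t entry)
{
    assert(s != NULL);
    if (s->count_value == 0u) {
        /* Nothing outstanding: this entry is the next PB instruction. */
        s->count_value = TOY_HDR_COUNT(entry);
        s->instructions++;
    } else {
        /* Data outstanding: this entry is method data; one fewer remains. */
        s->count_value--;
        s->method_data++;
    }
}
```

Starting the model with count_value cleared mirrors the RAMFC initialization requirement: the first entry is then decoded as an instruction rather than as method data.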
+ + +#define NV_PPBDMA_PB_COUNT(i) (0x00040088+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_PB_COUNT__SIZE_1 14 /* */ + +#define NV_PPBDMA_PB_COUNT_VALUE 12:0 /* RW-UF */ +#define NV_PPBDMA_PB_COUNT_VALUE_ZERO 0x00000000 /* RW--V */ + +#define NV_PPBDMA_PB_COUNT_DATAVAL0 16:16 /* RW-UF */ +#define NV_PPBDMA_PB_COUNT_DATAVAL0_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_DATAVAL0_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_LEVEL0 18:18 /* RW-VF */ +#define NV_PPBDMA_PB_COUNT_LEVEL0_MAIN 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_LEVEL0_SUBROUTINE 0x00000001 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_CONDITIONAL0 14:14 /* RW-VF */ +#define NV_PPBDMA_PB_COUNT_CONDITIONAL0_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_CONDITIONAL0_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_FINAL0 15:15 /* RW-VF */ +#define NV_PPBDMA_PB_COUNT_FINAL0_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_FINAL0_TRUE 0x00000001 /* RW--V */ + +#define NV_PPBDMA_PB_COUNT_DATAVAL1 20:20 /* RW-UF */ +#define NV_PPBDMA_PB_COUNT_DATAVAL1_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_DATAVAL1_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_LEVEL1 22:22 /* RW-VF */ +#define NV_PPBDMA_PB_COUNT_LEVEL1_MAIN 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_LEVEL1_SUBROUTINE 0x00000001 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_CONDITIONAL1 28:28 /* RW-VF */ +#define NV_PPBDMA_PB_COUNT_CONDITIONAL1_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_CONDITIONAL1_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_FINAL1 29:29 /* RW-VF */ +#define NV_PPBDMA_PB_COUNT_FINAL1_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_FINAL1_TRUE 0x00000001 /* RW--V */ + +#define NV_PPBDMA_PB_COUNT_DATAVAL2 24:24 /* RW-UF */ +#define NV_PPBDMA_PB_COUNT_DATAVAL2_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_DATAVAL2_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_LEVEL2 26:26 /* RW-VF */ +#define 
NV_PPBDMA_PB_COUNT_LEVEL2_MAIN 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_LEVEL2_SUBROUTINE 0x00000001 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_CONDITIONAL2 30:30 /* RW-VF */ +#define NV_PPBDMA_PB_COUNT_CONDITIONAL2_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_CONDITIONAL2_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_FINAL2 31:31 /* RW-VF */ +#define NV_PPBDMA_PB_COUNT_FINAL2_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_PB_COUNT_FINAL2_TRUE 0x00000001 /* RW--V */ + +PB_CRC - CRC Value over PB Entries + + The NV_PPBDMA_PB_CRC register contains a cyclic redundancy check value +calculated from PB entries. It may be used for debug to determine whether PB +entries have been properly fetched and whether the data returned is expected. +The NV_PPBDMA_PB_CRC register is cleared at the beginning of each new PB +segment. Note the CRC is indeterminate if an END_PB_SEGMENT instruction was +used in the prior segment (or if the subdevice is disabled via SSDM and the +segment was marked for conditional fetching) because Host may have already +calculated the CRC for an arbitrary number of PB entries before processing the +END_PB_SEGMENT or SSDM control entry. + The IEEE 802.3 CRC-32 polynomial is used to calculate CRC values. + This register is part of a GPU context's state. On a switch, the value of +this register is saved to and restored from the NV_RAMFC_PB_CRC entry of the +RAMFC part of the GPU context's GPU-instance block. + This register is maintained by hardware. Software may use special GP +entries (NV_PPBDMA_GP_ENTRY1_OPCODE_PB_CRC) to check (and clear) the CRC value +for the previous PB segment. Typically, software does not access this +register--it is available to software only for debug. Software should use it +only if the GPU context is assigned to a PBDMA unit and that PBDMA unit is +stalled. While a GPU context's Host state is not contained within a PBDMA unit, +software should use NV_RAMFC_PB_CRC to access this information. 
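For comparing against a value read from NV_RAMFC_PB_CRC, a software CRC over the same PB entries is a natural starting point. The sketch below is the conventional bitwise CRC-32 built on the IEEE 802.3 polynomial. The seed, bit ordering, final inversion, and byte order in which 32-bit PB entries are fed to the CRC are assumptions borrowed from the usual Ethernet/zlib convention; this manual does not state what the hardware uses, so a mismatch against hardware may simply mean those conventions differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conventional reflected CRC-32: IEEE 802.3 polynomial 0x04C11DB7 in its
 * bit-reversed form 0xEDB88320.
 * ASSUMPTION: seed 0xFFFFFFFF and final inversion, as in Ethernet/zlib. */
static uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++) {
            /* Shift out one bit; XOR in the polynomial if that bit was 1. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```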
+ One of these registers exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. This register was introduced in +Fermi. + + + +#define NV_PPBDMA_PB_CRC(i) (0x00040098+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_PB_CRC__SIZE_1 14 /* */ + +#define NV_PPBDMA_PB_CRC_VALUE 31:0 /* RW-UF */ +#define NV_PPBDMA_PB_CRC_VALUE_ZERO 0x00000000 /* RW--V */ + + +SUBDEVICE - Subdevice Identifier and Status Register + + The NV_PPBDMA_SUBDEVICE register is used to differentiate between GPU +contexts using the same pushbuffer. For example, two different GPUs in an SLI +configuration might use the same pushbuffer, or two different GPU contexts doing +stereo rendering might use the same pushbuffer. Using this register and +SET_SUBDEVICE_MASK PB instructions, software can specify that a set of methods +be sent to the engine only for a subset of the channels sharing the pushbuffer. + The SET_SUBDEVICE_MASK instruction (see dev_ram.ref) compares its mask +operand with the value in this register, if SUBDEVICE_CHANNEL_DMA is set to +ENABLE. If the logical-AND of the current SUBDEVICE_ID and the mask is +non-zero, and if SUBDEVICE_CHANNEL_DMA is ENABLE, SUBDEVICE_STATUS is set to +ACTIVE, and Host will send methods to the engine. If the current SUBDEVICE_ID +is not in the mask, SUBDEVICE_STATUS will be set to INACTIVE, and Host will not +send any methods to the engine. + The NV_PPBDMA_SUBDEVICE_STATUS field indicates whether methods are being +filtered. If this field is INACTIVE, later methods are not being generated, +decoded, executed by Host, or sent to an engine. If this field is ACTIVE, +methods are being processed normally. + The NV_PPBDMA_SUBDEVICE_CHANNEL_DMA field controls whether filtering +methods according to the SUBDEVICE_ID is enabled. If this field is DISABLE, +then SUBDEVICE_STATUS will always be set to ACTIVE, and all methods will be sent +to the engine.
If a SET_SUBDEVICE_MASK or USE_SUBDEVICE_MASK instruction is sent +while this field is DISABLE, Host will generate an interrupt +(NV_PPBDMA_INTR_0_PBENTRY). + The NV_PPBDMA_SUBDEVICE_STORED_MASK field contains a subdevice mask value +to be used later by a USE_SUBDEVICE_MASK instruction. This field is loaded by a +STORE_SUBDEVICE_MASK instruction. See dev_ram.ref for details. + This register is part of a GPU context's state. Each channel has its own +NV_PPBDMA_SUBDEVICE register value. On a switch, the NV_PPBDMA_SUBDEVICE value +is saved to and restored from the NV_RAMFC_SUBDEVICE entry of the RAMFC part of +the GPU context's GPU-instance block. + Typically, software initializes this information in NV_RAMFC_SUBDEVICE when +the GPU context is first created. This register is available to software only +for debug. Software should use this register only if the GPU context is +assigned to a PBDMA unit, and if that PBDMA unit is stalled. While a GPU +context's Host state is not contained within a PBDMA unit, software should use +NV_RAMFC_SUBDEVICE to access this information. + One of these registers exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. It was introduced with the NV36 +Channel DMA class.
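The SET_SUBDEVICE_MASK behavior described above can be modeled in a few lines of C. The field positions (ID 11:0, STATUS 28:28, CHANNEL_DMA 29:29) are taken from the NV_PPBDMA_SUBDEVICE definitions; the macro and function names are hypothetical, and the model covers only the status update and the PBENTRY interrupt condition, not how Host actually sequences them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SUBDEV_ID(r)           ((r) & 0xfffu)        /* ID 11:0 */
#define SUBDEV_STATUS_ACTIVE   (UINT32_C(1) << 28)   /* STATUS 28:28 */
#define SUBDEV_CHANNEL_DMA_EN  (UINT32_C(1) << 29)   /* CHANNEL_DMA 29:29 */

/* Hypothetical model of SET_SUBDEVICE_MASK: returns the updated register
 * value and reports whether the PBENTRY interrupt condition applies. */
static uint32_t subdev_set_mask(uint32_t reg, uint32_t mask,
                                bool *pbentry_intr)
{
    assert(pbentry_intr != NULL);

    if ((reg & SUBDEV_CHANNEL_DMA_EN) == 0u) {
        /* Filtering disabled: STATUS stays ACTIVE and the instruction
         * itself is flagged as an error. */
        *pbentry_intr = true;
        return reg | SUBDEV_STATUS_ACTIVE;
    }

    *pbentry_intr = false;
    if ((SUBDEV_ID(reg) & mask & 0xfffu) != 0u)
        return reg | SUBDEV_STATUS_ACTIVE;    /* ID is in the mask */
    return reg & ~SUBDEV_STATUS_ACTIVE;       /* ID is not in the mask */
}
```

With ID 0x002, a mask of 0x003 leaves the channel ACTIVE while a mask of 0x004 makes it INACTIVE, so only channels whose ID bit appears in the mask keep sending methods.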
+ + +#define NV_PPBDMA_SUBDEVICE(i) (0x00040094+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_SUBDEVICE__SIZE_1 14 /* */ + +#define NV_PPBDMA_SUBDEVICE_ID 11:0 /* RW-UF */ +#define NV_PPBDMA_SUBDEVICE_ID_ENABLE 0x00000FFF /* RW--V */ +#define NV_PPBDMA_SUBDEVICE_STORED_MASK 27:16 /* RW-UF */ +#define NV_PPBDMA_SUBDEVICE_STORED_MASK_ENABLE 0x00000FFF /* RW--V */ +#define NV_PPBDMA_SUBDEVICE_STATUS 28:28 /* RW-UF */ +#define NV_PPBDMA_SUBDEVICE_STATUS_INACTIVE 0x00000000 /* RW--V */ +#define NV_PPBDMA_SUBDEVICE_STATUS_ACTIVE 0x00000001 /* RW--V */ +#define NV_PPBDMA_SUBDEVICE_CHANNEL_DMA 29:29 /* RW-UF */ +#define NV_PPBDMA_SUBDEVICE_CHANNEL_DMA_DISABLE 0x00000000 /* RW--V */ +#define NV_PPBDMA_SUBDEVICE_CHANNEL_DMA_ENABLE 0x00000001 /* RW--V */ + + +METHODn - Method FIFO Address Registers + + The NV_PPBDMA_METHOD registers contain the method header information for +the PBDMA unit's tiny Method FIFO (called "Cache1" in the Tesla architecture). +The format of these registers does not match the method headers as present in +the pushbuffer, but they contain the necessary information for Host to process +each method. Method addresses generated from PB method headers and their +associated method data entries are stored in the Method FIFO until Host is ready +to process them. Method addresses indicate an operation to be performed by Host +or by an engine. The corresponding data for these methods are stored in +NV_PPBDMA_DATA registers. Compressed method sequences (method headers and their +associated method data entries) are expanded into these registers such that each +method data entry corresponds to a method address/data pair in the method FIFO. + The size of the method FIFO is given by the METHOD_FIFO_SIZE define; this +size is hard-coded and will remain constant for any given architecture. + The NV_PPBDMA_METHOD0 register contains the first method to be executed. +METHOD1 contains the second method to be executed, and so forth. 
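The METHODn and DATAn registers are interleaved in each PBDMA unit's register space, so their offsets can be computed rather than looked up. The sketch below assumes a four-entry FIFO (the METHOD0 through METHOD3 definitions in this file suggest that, but the METHOD_FIFO_SIZE value itself is not shown here) and derives the 8-byte stride between entries from those definitions. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PBDMA_UNITS        14u   /* NV_PPBDMA_METHOD0__SIZE_1 */
#define TOY_METHOD_FIFO_SIZE    4u   /* ASSUMPTION: METHOD0..METHOD3 */

/* Offset of NV_PPBDMA_METHODn(unit): METHOD0 is at 0x000400c0, and each
 * later entry follows 8 bytes after the previous one. */
static inline uint32_t pbdma_method_reg(unsigned unit, unsigned n)
{
    assert(unit < TOY_PBDMA_UNITS && n < TOY_METHOD_FIFO_SIZE);
    return 0x000400c0u + n * 8u + unit * 8192u;
}

/* Offset of NV_PPBDMA_DATAn(unit): always 4 bytes after METHODn. */
static inline uint32_t pbdma_data_reg(unsigned unit, unsigned n)
{
    return pbdma_method_reg(unit, n) + 4u;
}
```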
+ + The NV_PPBDMA_METHOD_SUBCH field contains the subchannel to which the +method is targeted. Subchannels are associated with engines according to a +fixed mapping and with class identifiers via the NV_UDMA_OBJECT method. + + + The NV_PPBDMA_METHOD_FIRST field indicates whether the header for this +method is the first method header of a PB segment (as specified by a GP entry). + The NV_PPBDMA_METHOD_VALID field indicates whether this queue entry is +valid. If this field is VALID_FALSE, then the entry is empty. + For some engines, Host may send two method address/data pairs in a cycle if +the addresses of the two methods are the same or if the address of the second +method is the address of the first method incremented. The +NV_PPBDMA_METHOD_DUAL field indicates that a method may be paired with the +following entry. If the engine that a method targets cannot support dual +methods, or if the method address indicates a Host-executed method, then methods +may be sent one at a time even if the first method is marked DUAL. When +generating methods from PB method headers and their associated method data +entries, Host sets this field deterministically (independently of the rate at +which the PBDMA unit receives PB data from memory). If the +NV_PPBDMA_METHOD_DUAL field is DUAL_TRUE, then the NV_PPBDMA_METHOD_INCR field +indicates whether the method address of the second is equal to the address of +the first incremented. In the case of an incrementing method, DUAL_TRUE and +INCR_TRUE will only be set if the method address is even. + + This register is part of a GPU context's state. On a switch, the values of +these registers are saved to, and restored from, the NV_RAMFC_METHOD* fields +of the RAMFC part of the GPU context's GPU-instance block. + Hardware maintains this information. Software should use this register +only if the GPU context is assigned to a PBDMA unit and that PBDMA unit is +stalled. 
While a GPU context's Host state is not contained within a PBDMA unit, +software should use NV_RAMFC_METHOD* to access this information. + When a PBDMA unit is stalled due to a software method, software may use +these registers to determine the method address/data pairs that software is to +execute. After executing a software method, to indicate to hardware that the +method has been executed, software should set the METHOD_VALID field to FALSE +before clearing the NV_PPBDMA_INTR_*_DEVICE register field. + One of these registers exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. + + +#define NV_PPBDMA_METHOD0(i) (0x000400c0+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_METHOD0__SIZE_1 14 /* */ +#define NV_PPBDMA_METHOD0_INCR 0:0 /* RW-UF */ +#define NV_PPBDMA_METHOD0_INCR_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD0_INCR_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_METHOD0_ADDR 13:2 /* RW-UF */ +#define NV_PPBDMA_METHOD0_ADDR_NULL 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD0_SUBCH 18:16 /* RW-UF */ +#define NV_PPBDMA_METHOD0_SUBCH_ZERO 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD0_FIRST 22:22 /* RW-UF */ +#define NV_PPBDMA_METHOD0_FIRST_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD0_FIRST_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_METHOD0_DUAL 23:23 /* RW-UF */ +#define NV_PPBDMA_METHOD0_DUAL_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD0_DUAL_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_METHOD0_VALID 31:31 /* RW-UF */ +#define NV_PPBDMA_METHOD0_VALID_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD0_VALID_TRUE 0x00000001 /* RW--V */ + +#define NV_PPBDMA_METHOD1(i) (0x000400c8+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_METHOD1__SIZE_1 14 /* */ +#define NV_PPBDMA_METHOD1_INCR 0:0 /* RW-UF */ +#define NV_PPBDMA_METHOD1_INCR_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD1_INCR_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_METHOD1_ADDR 13:2 /* RW-UF */ +#define
NV_PPBDMA_METHOD1_ADDR_NULL 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD1_SUBCH 18:16 /* RW-UF */ +#define NV_PPBDMA_METHOD1_SUBCH_ZERO 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD1_FIRST 22:22 /* RW-UF */ +#define NV_PPBDMA_METHOD1_FIRST_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD1_FIRST_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_METHOD1_DUAL 23:23 /* RW-UF */ +#define NV_PPBDMA_METHOD1_DUAL_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD1_DUAL_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_METHOD1_VALID 31:31 /* RW-UF */ +#define NV_PPBDMA_METHOD1_VALID_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD1_VALID_TRUE 0x00000001 /* RW--V */ + +#define NV_PPBDMA_METHOD2(i) (0x000400d0+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_METHOD2__SIZE_1 14 /* */ +#define NV_PPBDMA_METHOD2_INCR 0:0 /* RW-UF */ +#define NV_PPBDMA_METHOD2_INCR_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD2_INCR_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_METHOD2_ADDR 13:2 /* RW-UF */ +#define NV_PPBDMA_METHOD2_ADDR_NULL 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD2_SUBCH 18:16 /* RW-UF */ +#define NV_PPBDMA_METHOD2_SUBCH_ZERO 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD2_FIRST 22:22 /* RW-UF */ +#define NV_PPBDMA_METHOD2_FIRST_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD2_FIRST_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_METHOD2_DUAL 23:23 /* RW-UF */ +#define NV_PPBDMA_METHOD2_DUAL_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD2_DUAL_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_METHOD2_VALID 31:31 /* RW-UF */ +#define NV_PPBDMA_METHOD2_VALID_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD2_VALID_TRUE 0x00000001 /* RW--V */ + +#define NV_PPBDMA_METHOD3(i) (0x000400d8+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_METHOD3__SIZE_1 14 /* */ +#define NV_PPBDMA_METHOD3_INCR 0:0 /* RW-UF */ +#define NV_PPBDMA_METHOD3_INCR_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD3_INCR_TRUE 0x00000001 /* RW--V */ +#define 
NV_PPBDMA_METHOD3_ADDR 13:2 /* RW-UF */ +#define NV_PPBDMA_METHOD3_ADDR_NULL 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD3_SUBCH 18:16 /* RW-UF */ +#define NV_PPBDMA_METHOD3_SUBCH_ZERO 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD3_FIRST 22:22 /* RW-UF */ +#define NV_PPBDMA_METHOD3_FIRST_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD3_FIRST_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_METHOD3_DUAL 23:23 /* RW-UF */ +#define NV_PPBDMA_METHOD3_DUAL_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD3_DUAL_TRUE 0x00000001 /* RW--V */ +#define NV_PPBDMA_METHOD3_VALID 31:31 /* RW-UF */ +#define NV_PPBDMA_METHOD3_VALID_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_METHOD3_VALID_TRUE 0x00000001 /* RW--V */ + +DATAn - Method FIFO Data Registers + + The NV_PPBDMA_DATA registers contain the data part of a PBDMA unit's tiny +method FIFO (Cache1). Method data from the pushbuffer is stored in Host's +method FIFO until the PBDMA unit is ready to process it. + NV_PPBDMA_DATA0 contains the data for the first method. DATA1 contains +data for the second method, and so forth. + These registers are part of a GPU context's state. On a switch, the values of +these registers are saved to, and restored from, the NV_RAMFC_DATA* +fields of the RAMFC part of the GPU context's GPU-instance block. + Hardware maintains this information. Software should use these registers +only if the GPU context is assigned to a PBDMA unit and that PBDMA unit is +stalled. While a GPU context's Host state is not contained within a PBDMA unit, +software should use NV_RAMFC_DATA* to access this information. + When a PBDMA unit is stalled due to a software method, software may use +these registers to determine the data part of the method address/data pairs that +software is to execute. When handling a software method, software need only set +the method's NV_PPBDMA_METHOD_VALID bit to VALID_FALSE. It need not move or +alter the contents of these registers.
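To make the software-method flow concrete, the sketch below decodes one METHODn/DATAn pair and produces the METHODn value software would write back to mark the entry as executed. Field positions come from the NV_PPBDMA_METHODn definitions (INCR 0:0, ADDR 13:2, SUBCH 18:16, FIRST 22:22, DUAL 23:23, VALID 31:31). The struct and function names are hypothetical, the ADDR field is returned raw without assuming how it maps to a byte address, and the register reads and writes themselves are left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct pbdma_method {
    bool     valid;       /* VALID 31:31 */
    bool     dual;        /* DUAL 23:23 */
    bool     first;       /* FIRST 22:22 */
    bool     incr;        /* INCR 0:0 */
    unsigned subch;       /* SUBCH 18:16 */
    uint32_t addr_field;  /* ADDR 13:2, raw field value */
    uint32_t data;        /* matching NV_PPBDMA_DATAn value */
};

/* Decode raw METHODn and DATAn register values read by the caller. */
static struct pbdma_method pbdma_method_decode(uint32_t method, uint32_t data)
{
    struct pbdma_method m;

    m.valid      = ((method >> 31) & 1u) != 0u;
    m.dual       = ((method >> 23) & 1u) != 0u;
    m.first      = ((method >> 22) & 1u) != 0u;
    m.incr       = (method & 1u) != 0u;
    m.subch      = (unsigned)((method >> 16) & 0x7u);
    m.addr_field = (method >> 2) & 0xfffu;
    m.data       = data;
    return m;
}

/* Value to write back to METHODn once software has executed the method:
 * only VALID changes; DATAn is left alone. */
static uint32_t pbdma_method_clear_valid(uint32_t method)
{
    return method & ~(UINT32_C(1) << 31);
}
```

Per the text, the write that clears VALID should happen before the NV_PPBDMA_INTR_*_DEVICE field is cleared.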
+ One of these registers exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. + + + +#define NV_PPBDMA_DATA0(i) (0x000400c4+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_DATA0__SIZE_1 14 /* */ +#define NV_PPBDMA_DATA0_VALUE 31:0 /* RW-UF */ +#define NV_PPBDMA_DATA0_VALUE_ZERO 0x00000000 /* RW--V */ + +#define NV_PPBDMA_DATA1(i) (0x000400cc+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_DATA1__SIZE_1 14 /* */ +#define NV_PPBDMA_DATA1_VALUE 31:0 /* RW-UF */ +#define NV_PPBDMA_DATA1_VALUE_ZERO 0x00000000 /* RW--V */ + +#define NV_PPBDMA_DATA2(i) (0x000400d4+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_DATA2__SIZE_1 14 /* */ +#define NV_PPBDMA_DATA2_VALUE 31:0 /* RW-UF */ +#define NV_PPBDMA_DATA2_VALUE_ZERO 0x00000000 /* RW--V */ + +#define NV_PPBDMA_DATA3(i) (0x000400dc+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_DATA3__SIZE_1 14 /* */ +#define NV_PPBDMA_DATA3_VALUE 31:0 /* RW-UF */ +#define NV_PPBDMA_DATA3_VALUE_ZERO 0x00000000 /* RW--V */ + +TARGET [register] - Target Engine + + The NV_PPBDMA_TARGET_ENGINE field contains the last non-software engine +that received data from the Method Processor. This register is used to +determine if an inter-engine subchannel switch has happened. Methods executed +by Host (not by an engine), regardless of their subchannel, do not affect the +value of this field. + The imaginary software engine is treated specially. A method is directed +at the software engine by setting its NV_FIFO_DMA_*_SUBCHANNEL field to one of +the SW subchannels 5-7; see dev_ram.ref. When such a method is encountered, the +PBDMA unit freezes and raises the NV_PPBDMA_INTR_0_DEVICE interrupt. CPU +software handles the method, marks the method as having been executed by setting +NV_PPBDMA_METHOD0_VALID to FALSE, and clears the interrupt to allow the PBDMA to +continue processing subsequent methods. When initializing a channel, SW should +set the ENGINE field in NV_RAMFC_TARGET to match the engine that the channel +will serve. 
If the ENGINE is not a valid engine for the runqueue, Host will +force the field to the lowest numbered engine served by the runqueue. If the +ENGINE still does not match the first encountered engine method on the channel, +Host will WFI on the engine specified by the TARGET entry in RAMFC before +submitting the first method to the engine targeted by SUBCHANNEL field of the +method. + + The NV_PPBDMA_TARGET_ENG_CTX_VALID field indicates whether a valid non-CE +engine context exists for the channel loaded on the PBDMA. The field is +populated by the value in the corresponding field of the NV_RAMFC_TARGET entry +and is not modified by HW. When initializing a channel in a TSG for which a +valid engine context exists, SW should set the channel's NV_RAMFC_TARGET +ENG_CTX_VALID field to TRUE. If a valid engine context does not exist at +channel creation time, the field should be set to FALSE. When a valid engine +context is created for the TSG, the RAMFC field must be set TRUE for all +channels in the TSG. Prior to a TSG's engine context being deleted, the TSG's +channels must be disabled or unbound and the TSG preempted, followed by setting +the channels' RAMFC ENG_CTX_VALID fields to FALSE. The RAMFC field for a +channel should only be updated when the channel is disabled and idle. + The NV_PPBDMA_TARGET_CE_CTX_VALID field indicates whether a valid copy +engine method buffer exists for the channel loaded on the PBDMA. The field is +populated by the value in the corresponding field of the NV_RAMFC_TARGET entry +and is not modified by HW. When initializing a channel, SW should set the +channel's NV_RAMFC_TARGET CE_CTX_VALID field to TRUE if the copy engine method +buffer for the channel's TSG runqueue has already been created; see +NV_RAMIN_ENG_METHOD_BUFFER_ADDR_* in dev_ram.ref. If a valid method buffer has +not been created, the field should be set to FALSE. 
When a method buffer is +created for the TSG runqueue, this RAMFC field must be set to TRUE for all +channels in the TSG that target the runqueue. Prior to deallocating the method +buffer for a TSG runqueue, all channels in the TSG that map to the runqueue must +be disabled or unbound and the TSG preempted, followed by setting the channel's +RAMFC CE_CTX_VALID fields to FALSE. The RAMFC field for a channel should only +be updated when the channel is disabled and idle. + If Host receives an engine method for an engine that has the corresponding +NV_PPBDMA_TARGET_*_CTX_VALID field set to FALSE, Host will raise the stalling +PBDMA interrupt NV_PPBDMA_INTR_1_CTXNOTVALID. + + Host sets NV_PPBDMA_TARGET_SHOULD_SEND_HOST_TSG_EVENT whenever the PBDMA +sends any method to the graphics engine. When set, Host must eventually send a +HOST_TSG_EVENT at a TSG event point: the channel runs out of work, a TSG yield +is reached, or a semaphore acquire fails. Therefore, as a performance +optimization, Host will initiate a context load immediately following the RAMFC +load in preparation for sending the HOST_TSG_EVENT. SHOULD_SEND_HOST_TSG_EVENT +is cleared once Host issues a HOST_TSG_EVENT method or when Host does a +subchannel switch to the PBDMA's grcopy. Note if the clear occurs due to the +latter case, the initial context load may have been needless. + Host sets NV_PPBDMA_TARGET_NEEDS_HOST_TSG_EVENT to TRUE when in a TSG, the +TARGET_ENGINE is NV_ENGINE_GRAPHICS, and the PBDMA needs to send the target +engine a HOST_TSG_EVENT internal method. When TRUE on channel load, Host will +send the HOST_TSG_EVENT prior to sending any other engine methods. This is +somewhat like having another entry in the NV_PPBDMA_METHODn/DATAn Host method +fifo that comes before the 0th entry. However, Host may process other Host +methods concurrently with attempting to send the HOST_TSG_EVENT. 
Note that when +this field is TRUE, the PBDMA will initiate a context load immediately after the +RAMFC is loaded unless a context load is already in progress because of +CTX_RELOAD or the other PBDMA sharing the runlist. On channel creation, +software should initialize this field to FALSE in the corresponding +NV_RAMFC_TARGET entry. This bit is required for Pascal SCG functional +correctness--when Host cannot send a HOST_TSG_EVENT due to backpressure on the +method interface to FE, it must remember the fact that it still needs to send a +HOST_TSG_EVENT if the PBDMA channel switches out. Dropping a HOST_TSG_EVENT can +result in a hang in FE if the current pipe is in compute mode and the other pipe +has methods to send. + The HOST_TSG_EVENT_REASON field indicates the reason for which a +HOST_TSG_EVENT internal method must be sent when NEEDS_HOST_TSG_EVENT is TRUE. +These defines match those of the NV_PMETHOD_HOST_TSG_EVENT_REASON field of the +internal method; see internal_methods.ref. + + This register is part of a GPU context's state. During a channel switch, +the value of this register is saved to and restored from the NV_RAMFC_TARGET +entry of the GPU context's GPU-instance block. + This information is maintained by Hardware. Typically, software does not +access this register. This register is available for debug purposes. Software +should use this register only if the GPU context is assigned to a PBDMA unit and +that PBDMA unit is stalled. While a GPU context's Host state is not contained +within a PBDMA unit, software should use NV_RAMFC_TARGET to access this +information. + One of these registers exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. 
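
For debug, a raw NV_PPBDMA_TARGET value (or the NV_RAMFC_TARGET entry) can be split into the fields described above. The sketch below is illustrative only: the helper and struct names are hypothetical, and the bit ranges are copied from the NV_PPBDMA_TARGET_* defines that follow this description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: extract bits hi:lo from a 32-bit register value. */
static inline uint32_t field(uint32_t reg, unsigned hi, unsigned lo)
{
    assert(hi < 32 && lo <= hi);
    unsigned width = hi - lo + 1;
    uint32_t mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);
    return (reg >> lo) & mask;
}

struct target_state {
    uint32_t engine;              /* ENGINE                     4:0   */
    bool eng_ctx_valid;           /* ENG_CTX_VALID              16:16 */
    bool ce_ctx_valid;            /* CE_CTX_VALID               17:17 */
    uint32_t tsg_event_reason;    /* HOST_TSG_EVENT_REASON      25:24 */
    bool should_send_tsg_event;   /* SHOULD_SEND_HOST_TSG_EVENT 29:29 */
    bool needs_tsg_event;         /* NEEDS_HOST_TSG_EVENT       31:31 */
};

static inline struct target_state decode_target(uint32_t reg)
{
    struct target_state t;
    t.engine                = field(reg, 4, 0);
    t.eng_ctx_valid         = field(reg, 16, 16) != 0;
    t.ce_ctx_valid          = field(reg, 17, 17) != 0;
    t.tsg_event_reason      = field(reg, 25, 24);
    t.should_send_tsg_event = field(reg, 29, 29) != 0;
    t.needs_tsg_event       = field(reg, 31, 31) != 0;
    return t;
}
```

As the text notes, the live register is only meaningful while the GPU context is assigned to a stalled PBDMA unit; otherwise the same decoding would be applied to NV_RAMFC_TARGET.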
+ + + +#define NV_PPBDMA_TARGET(i) (0x000400ac+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_TARGET__SIZE_1 14 /* */ + +#define NV_PPBDMA_TARGET_ENGINE 4:0 /* RW-UF */ +#define NV_PPBDMA_TARGET_ENGINE_SW 31 /* RW--V */ + +#define NV_PPBDMA_TARGET_ENG_CTX_VALID 16:16 /* RW-UF */ +#define NV_PPBDMA_TARGET_ENG_CTX_VALID_TRUE 1 /* RW--V */ +#define NV_PPBDMA_TARGET_ENG_CTX_VALID_FALSE 0 /* RW--V */ + +#define NV_PPBDMA_TARGET_CE_CTX_VALID 17:17 /* RW-UF */ +#define NV_PPBDMA_TARGET_CE_CTX_VALID_TRUE 1 /* RW--V */ +#define NV_PPBDMA_TARGET_CE_CTX_VALID_FALSE 0 /* RW--V */ + +#define NV_PPBDMA_TARGET_HOST_TSG_EVENT_REASON 25:24 /* RW-UF */ +#define NV_PPBDMA_TARGET_HOST_TSG_EVENT_REASON_PBDMA_IDLE 0x0 /* RW--V */ +#define NV_PPBDMA_TARGET_HOST_TSG_EVENT_REASON_SEMAPHORE_ACQUIRE_FAILURE 0x1 /* RW--V */ +#define NV_PPBDMA_TARGET_HOST_TSG_EVENT_REASON_TSG_YIELD 0x2 /* RW--V */ +#define NV_PPBDMA_TARGET_HOST_TSG_EVENT_REASON_HOST_SUBCHANNEL_SWITCH 0x3 /* RW--V */ + +#define NV_PPBDMA_TARGET_SHOULD_SEND_HOST_TSG_EVENT 29:29 /* RW-UF */ +#define NV_PPBDMA_TARGET_SHOULD_SEND_HOST_TSG_EVENT_TRUE 1 /* RW--V */ +#define NV_PPBDMA_TARGET_SHOULD_SEND_HOST_TSG_EVENT_FALSE 0 /* RW--V */ + +#define NV_PPBDMA_TARGET_NEEDS_HOST_TSG_EVENT 31:31 /* RW-UF */ +#define NV_PPBDMA_TARGET_NEEDS_HOST_TSG_EVENT_TRUE 1 /* RW--V */ +#define NV_PPBDMA_TARGET_NEEDS_HOST_TSG_EVENT_FALSE 0 /* RW--V */ + + +METHOD_CRC - Method CRC Value + + The NV_PPBDMA_METHOD_CRC register contains a cyclic redundancy check value +calculated from the methods sent to Host's Crossbar. This therefore excludes +software methods and Host-only methods. It may be used for debug +to determine whether the correct methods have been sent. A method CRC can +detect errors in the fetching of GP data, the fetching of PB data, and the +generation of methods from PB data. 
If Host fetched GP data incorrectly, +fetched PB data incorrectly, or generated methods from PB data incorrectly, it is +unlikely that the CRC value calculated by Host would match the CRC value +calculated by software. + The IEEE 802.3 CRC-32 polynomial (x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + +x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1) is used to calculate CRC values. Methods +can be sent to Host's Crossbar as single methods or dual methods. The CRC is +calculated as if dual methods were always sent as two single methods. Each +method consists of a subchannel identifier (3 bits), a method address (12 +bits), and method data (32 bits). For the CRC calculation, a method is +organized into a 6-byte value. Bytes are added to the CRC from the least +significant byte to the most significant byte. + + + This register is part of a GPU context's state. On a switch, the value of +this register is saved to, and restored from, the NV_RAMFC_METHOD_CRC field of +the RAMFC part of the GPU context's GPU-instance block. + This information is maintained by hardware. Software may use special +methods (NV_UDMA_CRC_CHECK) to check and clear the CRC value. Typically, +software does not access this register directly. This register is available to +software only for debug. Software should use this register only if the GPU +context is assigned to a PBDMA unit and that PBDMA unit is stalled. While a GPU +context's Host state is not contained within a PBDMA unit, software should use +NV_RAMFC_METHOD_CRC to access this information. + One of these registers exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. This register was introduced in +Fermi. 
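
A software-side CRC over a stream of methods might look like the sketch below. The text specifies only the polynomial, the 6-byte method size, and the least-significant-byte-first order. Everything else here is an ASSUMPTION made for illustration and may not match the hardware: the position of the subchannel, address, and data bits inside the 6-byte value, the reflected bit ordering, and whatever initial value the caller chooses. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 update using the IEEE 802.3 polynomial in its common
 * reflected form (0xEDB88320). ASSUMPTION: the hardware uses this bit order. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Pack one method into 6 bytes, least significant byte first.
 * ASSUMPTION: data in bits 31:0, address in 43:32, subchannel in 46:44. */
static void pack_method(uint8_t out[6], uint32_t subch, uint32_t addr,
                        uint32_t data)
{
    assert(subch < 8 && addr < 4096);
    uint64_t v = ((uint64_t)subch << 44) | ((uint64_t)addr << 32) | data;
    for (int i = 0; i < 6; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Fold one single method into a running CRC. A dual method would be
 * folded in as two calls, matching "as if sent as two single methods". */
static uint32_t method_crc_add(uint32_t crc, uint32_t subch, uint32_t addr,
                               uint32_t data)
{
    uint8_t bytes[6];
    pack_method(bytes, subch, addr, data);
    return crc32_update(crc, bytes, sizeof bytes);
}
```

The update routine itself can be checked against the standard CRC-32 test vector (initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF, input "123456789", result 0xCBF43926); whether Host applies the same initial value and final XOR is not stated here.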
+ + + +#define NV_PPBDMA_METHOD_CRC(i) (0x000400b0+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_METHOD_CRC__SIZE_1 14 /* */ + +#define NV_PPBDMA_METHOD_CRC_VALUE 31:0 /* RW-UF */ +#define NV_PPBDMA_METHOD_CRC_VALUE_ZERO 0x00000000 /* RW--V */ + +REF - Reference Count + + Software may use Reference Counts to monitor Host's progress processing a +pushbuffer. The pushbuffer specifies that the Reference Count be written. For +synchronization, software might wait until the Reference Count +has a particular value before proceeding. + The NV_PPBDMA_REF register holds a 32-bit Reference Count value that can be +written with the NV_UDMA_SET_REF method. The value written to the register is +from the NV_UDMA_SET_REF method's parameter. The value is not written to this +register until the target engine reports that it is idle and the memory +subsystem has been flushed. Waiting for the engine to become idle and the +memory subsystem to be flushed ensures that all previous instructions in the +current channel context have completed execution. + This register is part of a GPU context's state. On a switch, the value of +this register is saved to, and restored from, the NV_RAMFC_REF entry of the +RAMFC part of the GPU context's GPU-instance block. + Typically, while a GPU context is bound to a channel, software uses +NV_RAMUSERD_REF to access this information. Typically, software does not access +this register directly. This register is available to software only for debug. +Software should use this register only if the GPU context is assigned to a PBDMA +unit and that PBDMA unit is stalled. While a GPU context is not assigned to a +PBDMA unit and is not bound to a channel, software should use NV_RAMFC_REF to +access this information. + One of these registers exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. 
+ + + +#define NV_PPBDMA_REF(i) (0x00040028+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_REF__SIZE_1 14 /* */ + +#define NV_PPBDMA_REF_CNT 31:0 /* RW-UF */ +#define NV_PPBDMA_REF_CNT_ZERO 0x00000000 /* RW--V */ + + + +RUNTIME - Active run time on Host + + The NV_PPBDMA_RUNTIME register contains the amount of time a GPU context +has been actively running within Host. This is not the amount of time that the +GPU context has been actively running on an engine. The amount of time is +measured in 1024 ns ticks from the PTIMER. Software may set this value to 0 and +can later read the value to see whether the GPU context ran. + This register is part of a GPU context's state. On a switch, the value of +this register is saved to, and restored from, the NV_RAMFC_RUNTIME entry of the +RAMFC part of the GPU context's GPU-instance block. + This information is maintained by hardware. Software may read this +register at any time. Software should write this register only if the GPU +context is assigned to a PBDMA unit and that PBDMA unit is stalled. While a GPU +context's Host state is not contained within a PBDMA unit, software should use +NV_RAMFC_RUNTIME to access this information. + One of these registers exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. + + +#define NV_PPBDMA_RUNTIME(i) (0x0004002c+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_RUNTIME__SIZE_1 14 /* */ + +#define NV_PPBDMA_RUNTIME_VALUE 31:0 /* RW-UF */ +#define NV_PPBDMA_RUNTIME_VALUE_ZERO 0x00000000 /* RW--V */ + + +SEM_ADDR_LO [register] - Semaphore Address Low Backing Register + + Semaphores are synchronization primitives located in memory; see the +documentation above the NV_UDMA_SEM_ADDR_LO method description for a brief +overview. + The NV_PPBDMA_SEM_ADDR_LO register specifies the least significant bits of +a semaphore's virtual memory address. This register is written to via the +NV_UDMA_SEM_ADDR_LO method. 
See the method documentation of +NV_UDMA_SEM_ADDR_LO for information regarding usage and behavior. + This register is part of a channel's state. On a switch, the value of +this register is saved to, and restored from, the NV_RAMFC_SEM_ADDR_LO field of +the RAMFC part of the channel's instance block. + Software typically does not access this register directly, unless this is +being done while debugging. Software can directly access this register without +the risk of race conditions when the channel is loaded on a PBDMA unit and that +PBDMA unit is stalled. While a channel is not loaded on a PBDMA unit, software +can read from the NV_RAMFC_SEM_ADDR_LO instance block field to access this +information. + One of this type of register exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. + + +#define NV_PPBDMA_SEM_ADDR_LO(i) (0x0004003c+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_SEM_ADDR_LO__SIZE_1 14 /* */ + +#define NV_PPBDMA_SEM_ADDR_LO_ADDR 31:2 /* RW-UF */ +#define NV_PPBDMA_SEM_ADDR_LO_ADDR_ZERO 0x00000000 /* RW--V */ + + +SEM_ADDR_HI [register] - Semaphore Address High Backing Register + + The NV_PPBDMA_SEM_ADDR_HI register contains the most significant 8 bits of +a semaphore's 40-bit virtual memory address. This register is written to via +the NV_UDMA_SEM_ADDR_HI method. See the method documentation of +NV_UDMA_SEM_ADDR_HI for information regarding usage and behavior. + This register is part of a channel's state. On a switch, the value of +this register is saved to, and restored from, the NV_RAMFC_SEM_ADDR_HI field of +the RAMFC part of the channel's instance block. + Software typically does not access this register directly, unless this is +being done while debugging. Software can directly access this register without +the risk of race conditions when the channel is loaded on a PBDMA unit and that +PBDMA unit is stalled. 
While a channel is not loaded on a PBDMA unit, software +can read from the NV_RAMFC_SEM_ADDR_HI instance block field to access this +information. + One of this type of register exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. + + +#define NV_PPBDMA_SEM_ADDR_HI(i) (0x00040038+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_SEM_ADDR_HI__SIZE_1 14 /* */ + +#define NV_PPBDMA_SEM_ADDR_HI_ADDR 7:0 /* RW-UF */ +#define NV_PPBDMA_SEM_ADDR_HI_ADDR_ZERO 0x00000000 /* RW--V */ + + +SEM_PAYLOAD_LO [register] - Semaphore Payload Low Backing Register + + The NV_PPBDMA_SEM_PAYLOAD_LO register contains the lowest 32 bits of the +semaphore payload. The payload is used to either write to the semaphore or +provide an operand for a semaphore operation. This register is written to via +the NV_UDMA_SEM_PAYLOAD_LO method. See the method documentation of +NV_UDMA_SEM_PAYLOAD_LO for information regarding usage and behavior. + This register is part of a channel's state. On a switch, the value of +this register is saved to, and restored from, the NV_RAMFC_SEM_PAYLOAD_LO field +of the RAMFC part of the channel's instance block. + Software typically does not access this register directly, unless this is +being done while debugging. Software can directly access this register without +the risk of race conditions when the channel is loaded on a PBDMA unit and that +PBDMA unit is stalled. While a channel is not loaded on a PBDMA unit, software +can read from the NV_RAMFC_SEM_PAYLOAD_LO instance block field to access this +information. + One of this type of register exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. 
+ + +#define NV_PPBDMA_SEM_PAYLOAD_LO(i) (0x00040040+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_SEM_PAYLOAD_LO__SIZE_1 14 /* */ + +#define NV_PPBDMA_SEM_PAYLOAD_LO_DATA 31:0 /* RW-VF */ +#define NV_PPBDMA_SEM_PAYLOAD_LO_DATA_ZERO 0x00000000 /* RW--V */ + + +SEM_PAYLOAD_HI [register] - Semaphore Payload High Backing Register + + The NV_PPBDMA_SEM_PAYLOAD_HI register contains the highest 32 bits of the +semaphore payload. The payload is used to either write to the semaphore or +provide an operand for a semaphore operation. This register is written to via +the NV_UDMA_SEM_PAYLOAD_HI method. See the method documentation of +NV_UDMA_SEM_PAYLOAD_HI for information regarding usage and behavior. + This register is part of a channel's state. On a switch, the value of +this register is saved to, and restored from, the NV_RAMFC_SEM_PAYLOAD_HI field of +the RAMFC part of the channel's instance block. + Software typically does not access this register directly, unless this is +being done while debugging. Software can directly access this register without +the risk of race conditions when the channel is loaded on a PBDMA unit and that +PBDMA unit is stalled. While a channel is not loaded on a PBDMA unit, software +can read from the NV_RAMFC_SEM_PAYLOAD_HI instance block field to access this +information. + One of this type of register exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. + + +#define NV_PPBDMA_SEM_PAYLOAD_HI(i) (0x0004009c+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_SEM_PAYLOAD_HI__SIZE_1 14 /* */ + +#define NV_PPBDMA_SEM_PAYLOAD_HI_DATA 31:0 /* RW-VF */ +#define NV_PPBDMA_SEM_PAYLOAD_HI_DATA_ZERO 0x00000000 /* RW--V */ + + +SEM_EXECUTE [register] - Semaphore Operation Backing Register + + The NV_PPBDMA_SEM_EXECUTE register contains a type of semaphore operation +to be performed and additional parameters for that operation. This register is +written to via the NV_UDMA_SEM_EXECUTE method. 
+ A semaphore operation is launched by executing the NV_UDMA_SEM_EXECUTE +method. This semaphore operation uses the semaphore address from the +NV_PPBDMA_SEM_ADDR_LO and NV_PPBDMA_SEM_ADDR_HI registers, and uses the +payload value from the NV_PPBDMA_SEM_PAYLOAD_LO and NV_PPBDMA_SEM_PAYLOAD_HI +registers. However, after the semaphore operation has completed, these +registers may be updated individually by other semaphore methods; that is, they +do not retain an accurate view of the most recently executed semaphore +operation. See the method documentation of NV_UDMA_SEM_EXECUTE for information +regarding usage and behavior. + During execution of the semaphore operation, the ACQUIRE_FAIL field of the +NV_PPBDMA_SEM_EXECUTE register indicates whether or not an attempt to acquire a +semaphore has failed or faulted. This field is used by Host to determine +whether the NV_PPBDMA_ACQUIRE_DEADLINE register should be updated. If the +value of this field is FALSE, this means an acquire has not yet been attempted, +and Host will set ACQUIRE_DEADLINE to a new value. If this field is TRUE, this +means an acquire has been attempted and has failed, and Host will not modify +ACQUIRE_DEADLINE. + The ACQUIRE_FAIL field also indicates whether, during the execution of an +NV_UDMA_CLEAR_FAULTED method, an attempt to clear a _FAULTED bit of a channel's +NV_PCCSR_CHANNEL register has failed or not. If this field is FALSE, this might +mean a CLEAR_FAULTED has not yet been attempted, and Host will set +ACQUIRE_DEADLINE to a new value. If the CLEAR_FAULTED method fails, the field is +set to TRUE. By reading the NV_PPBDMA_METHOD0 register, SW can determine the +method for which the field is in use. Host will set this field to FALSE when the +CLEAR_FAULTED method succeeds or its timeout is triggered. 
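
A semaphore operation draws its address and payload from the four backing registers described above. As a debugging aid, their raw values can be combined as sketched below. The function names are hypothetical. The sketch assumes that SEM_ADDR_LO_ADDR (31:2) holds address bits 31:2 in place, so the address is 4-byte aligned, and that PAYLOAD_HI is ignored when PAYLOAD_SIZE is 32BIT; neither detail is spelled out in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build the 40-bit semaphore virtual address from the two backing
 * registers: SEM_ADDR_HI_ADDR is bits 7:0, SEM_ADDR_LO_ADDR is bits 31:2. */
static inline uint64_t sem_address(uint32_t addr_hi, uint32_t addr_lo)
{
    uint64_t addr = ((uint64_t)(addr_hi & 0xffu) << 32) |
                    (uint64_t)(addr_lo & 0xfffffffcu);
    assert((addr >> 40) == 0);   /* result always fits in 40 bits */
    return addr;
}

/* Build the payload. ASSUMPTION: only the low word is used for 32-bit payloads. */
static inline uint64_t sem_payload(uint32_t payload_hi, uint32_t payload_lo,
                                   bool is_64bit)
{
    if (is_64bit)
        return ((uint64_t)payload_hi << 32) | (uint64_t)payload_lo;
    return (uint64_t)payload_lo;
}
```

Because the backing registers can be rewritten individually after an operation completes, values combined this way describe the next operation to be launched, not necessarily the last one executed.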
+ Note that during execution of a semaphore operation, the value of the +NV_PPBDMA_SEM_EXECUTE register is the same as the value of NV_PPBDMA_DATA0, +with the exception of the NV_PPBDMA_SEM_EXECUTE_ACQUIRE_FAIL field. If +software modifies NV_PPBDMA_DATA0 during execution of a NV_UDMA_SEM_EXECUTE +method, it must be careful to update the NV_PPBDMA_SEM_EXECUTE register to be +consistent with the DATA0 register. + This register is part of a channel's state. When the channel is switched +out, the value of this register is saved to, and restored from, the +NV_RAMFC_SEM_EXECUTE field of the RAMFC part of the channel's instance block. + Software typically does not access this register directly, unless this is +being done while debugging. Software can directly access this register without +the risk of race conditions when the channel is loaded on a PBDMA unit and that +PBDMA unit is stalled. While a channel is not loaded on a PBDMA unit, software +can read from the NV_RAMFC_SEM_EXECUTE instance block field to access this +information. + One of this type of register exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. 
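
The consistency rule between NV_PPBDMA_SEM_EXECUTE and NV_PPBDMA_DATA0 can be expressed as a mask that excludes ACQUIRE_FAIL (bit 19 in the defines below). The following is a minimal sketch with hypothetical names; it performs no register access and only computes the values software would compare or write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEM_EXECUTE_ACQUIRE_FAIL (1u << 19)   /* ACQUIRE_FAIL 19:19 */

/* True when the two values agree on every bit except ACQUIRE_FAIL. */
static inline bool sem_execute_matches_data0(uint32_t sem_execute,
                                             uint32_t data0)
{
    return ((sem_execute ^ data0) & ~SEM_EXECUTE_ACQUIRE_FAIL) == 0;
}

/* SEM_EXECUTE value that is consistent with a modified DATA0 while
 * preserving the ACQUIRE_FAIL state that Host maintains. */
static inline uint32_t sem_execute_from_data0(uint32_t data0,
                                              uint32_t old_sem_execute)
{
    uint32_t value = (data0 & ~SEM_EXECUTE_ACQUIRE_FAIL) |
                     (old_sem_execute & SEM_EXECUTE_ACQUIRE_FAIL);
    assert(sem_execute_matches_data0(value, data0));
    return value;
}
```

Software that edits DATA0 during an NV_UDMA_SEM_EXECUTE method would write the result of sem_execute_from_data0() to SEM_EXECUTE while the PBDMA unit is still stalled.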
+ + +#define NV_PPBDMA_SEM_EXECUTE(i) (0x00040044+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_SEM_EXECUTE__SIZE_1 14 /* */ + +#define NV_PPBDMA_SEM_EXECUTE_OPERATION 2:0 /* RWXVF */ +#define NV_PPBDMA_SEM_EXECUTE_OPERATION_ACQUIRE 0x00000000 /* -W--V */ +#define NV_PPBDMA_SEM_EXECUTE_OPERATION_RELEASE 0x00000001 /* -W--V */ +#define NV_PPBDMA_SEM_EXECUTE_OPERATION_ACQ_STRICT_GEQ 0x00000002 /* -W--V */ +#define NV_PPBDMA_SEM_EXECUTE_OPERATION_ACQ_CIRC_GEQ 0x00000003 /* -W--V */ +#define NV_PPBDMA_SEM_EXECUTE_OPERATION_ACQ_AND 0x00000004 /* -W--V */ +#define NV_PPBDMA_SEM_EXECUTE_OPERATION_ACQ_NOR 0x00000005 /* -W--V */ +#define NV_PPBDMA_SEM_EXECUTE_OPERATION_REDUCTION 0x00000006 /* -W--V */ + +#define NV_PPBDMA_SEM_EXECUTE_ACQUIRE_SWITCH_TSG 12:12 /* RW-VF */ +#define NV_PPBDMA_SEM_EXECUTE_ACQUIRE_SWITCH_TSG_DIS 0x00000000 /* RW--V */ +#define NV_PPBDMA_SEM_EXECUTE_ACQUIRE_SWITCH_TSG_EN 0x00000001 /* RW--V */ + +#define NV_PPBDMA_SEM_EXECUTE_ACQUIRE_FAIL 19:19 /* RWXVF */ +#define NV_PPBDMA_SEM_EXECUTE_ACQUIRE_FAIL_FALSE 0x00000000 /* RW--V */ +#define NV_PPBDMA_SEM_EXECUTE_ACQUIRE_FAIL_TRUE 0x00000001 /* RW--V */ + +#define NV_PPBDMA_SEM_EXECUTE_RELEASE_WFI 20:20 /* RW-VF */ +#define NV_PPBDMA_SEM_EXECUTE_RELEASE_WFI_DIS 0x00000000 /* RW--V */ +#define NV_PPBDMA_SEM_EXECUTE_RELEASE_WFI_EN 0x00000001 /* RW--V */ + +#define NV_PPBDMA_SEM_EXECUTE_PAYLOAD_SIZE 24:24 /* RWXVF */ +#define NV_PPBDMA_SEM_EXECUTE_PAYLOAD_SIZE_32BIT 0x00000000 /* RW--V */ +#define NV_PPBDMA_SEM_EXECUTE_PAYLOAD_SIZE_64BIT 0x00000001 /* RW--V */ + +#define NV_PPBDMA_SEM_EXECUTE_RELEASE_TIMESTAMP 25:25 /* RW-VF */ +#define NV_PPBDMA_SEM_EXECUTE_RELEASE_TIMESTAMP_DIS 0x00000000 /* RW--V */ +#define NV_PPBDMA_SEM_EXECUTE_RELEASE_TIMESTAMP_EN 0x00000001 /* RW--V */ + +#define NV_PPBDMA_SEM_EXECUTE_REDUCTION 30:27 /* RWXVF */ +#define NV_PPBDMA_SEM_EXECUTE_REDUCTION_IMIN 0x00000000 /* RW--V */ +#define NV_PPBDMA_SEM_EXECUTE_REDUCTION_IMAX 0x00000001 /* RW--V */ +#define 
NV_PPBDMA_SEM_EXECUTE_REDUCTION_IXOR 0x00000002 /* RW--V */ +#define NV_PPBDMA_SEM_EXECUTE_REDUCTION_IAND 0x00000003 /* RW--V */ +#define NV_PPBDMA_SEM_EXECUTE_REDUCTION_IOR 0x00000004 /* RW--V */ +#define NV_PPBDMA_SEM_EXECUTE_REDUCTION_IADD 0x00000005 /* RW--V */ +#define NV_PPBDMA_SEM_EXECUTE_REDUCTION_INC 0x00000006 /* RW--V */ +#define NV_PPBDMA_SEM_EXECUTE_REDUCTION_DEC 0x00000007 /* RW--V */ + +#define NV_PPBDMA_SEM_EXECUTE_REDUCTION_FORMAT 31:31 /* RW-VF */ +#define NV_PPBDMA_SEM_EXECUTE_REDUCTION_FORMAT_SIGNED 0x00000000 /* RW--V */ +#define NV_PPBDMA_SEM_EXECUTE_REDUCTION_FORMAT_UNSIGNED 0x00000001 /* RW--V */ + + +ACQUIRE_DEADLINE - Deadline for Semaphore Acquire and Clear Faulted Timeouts + + The NV_PPBDMA_ACQUIRE_DEADLINE register contains timeout information used +by the NV_UDMA_SEM_EXECUTE and NV_UDMA_CLEAR_FAULTED methods. + + During execution of a semaphore acquire operation, the timeout period from +NV_PPBDMA_ACQUIRE_TIMEOUT is added to the current time from PTIMER to compute +the time at which the acquire will time out. This timeout time is stored in +NV_PPBDMA_ACQUIRE_DEADLINE_TIMESTAMP. + Whenever an acquire is retried, the current time from the PTIMER is +compared with the value in this register. The comparison is circular. If an +acquire attempt fails to match, and if the current time is not between the start +time (STARTTIME = ACQUIRE_DEADLINE - ACQUIRE_TIMEOUT) and ACQUIRE_DEADLINE in +the circle of 32-bit unsigned integers, then the deadline was missed, and Host +will raise the NV_PPBDMA_INTR_0_ACQUIRE interrupt. + + During execution of a CLEAR_FAULTED method, if the targeted channel has not +reported FAULTED and NV_PFIFO_CLEAR_FAULTED_TIMEOUT_DETECTION is ENABLED, the +value in NV_PFIFO_CLEAR_FAULTED_TIMEOUT_PERIOD is added to the current time from +PTIMER to compute the time at which the CLEAR_FAULTED will time out. This +timeout time is stored in NV_PPBDMA_ACQUIRE_DEADLINE_TIMESTAMP. 
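
The circular deadline comparison described above for semaphore acquires can be modelled with unsigned 32-bit arithmetic, which wraps exactly like the "circle of 32-bit unsigned integers". This is a sketch under assumptions: the function names are hypothetical, all three values are taken to be in the same PTIMER-derived units, and the window from STARTTIME to ACQUIRE_DEADLINE is treated as inclusive at both ends, which the text does not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Deadline = current time + timeout period, wrapping at 32 bits. */
static inline uint32_t acquire_deadline(uint32_t now, uint32_t timeout)
{
    return (uint32_t)(now + timeout);
}

/* The deadline is missed when 'now' lies outside the circular window
 * [deadline - timeout, deadline]. */
static inline bool acquire_deadline_missed(uint32_t now, uint32_t deadline,
                                           uint32_t timeout)
{
    assert(timeout <= 0x7FFF8000u);              /* documented maximum period */
    uint32_t start = (uint32_t)(deadline - timeout);
    return (uint32_t)(now - start) > timeout;
}
```

Measuring the distance from the start time, rather than comparing now and deadline directly, keeps the test correct when the deadline has wrapped past zero.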
+ The CLEAR_FAULTED method will be retried approximately every microsecond +while its containing channel is loaded and active on the PBDMA. When +CLEAR_FAULTED is retried and its targeted FAULTED bit is still FALSE, the +current time from PTIMER is compared against the ACQUIRE_DEADLINE_TIMESTAMP. If +the 32 least-significant microseconds of the PTIMER time exceeds the TIMESTAMP +in a circular 32-bit comparison, the deadline was missed, and Host will raise +the NV_PPBDMA_INTR_0_CLEAR_FAULTED_ERROR interrupt. + + This register is part of a channel's state. On a switch, the value of +this register is saved to, and restored from, the NV_RAMFC_ACQUIRE_DEADLINE +field of the RAMFC part of the channel's instance block. + The value of this register is maintained by hardware. Software typically +does not access this register directly, unless this is being done while +debugging. Software can directly access this register without the risk of race +conditions when the channel is loaded on a PBDMA unit and that PBDMA unit is +stalled. While a channel is not loaded on a PBDMA unit, software can read from +the NV_RAMFC_ACQUIRE_DEADLINE instance block field to access this information. + One of this type of register exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. + + + +#define NV_PPBDMA_ACQUIRE_DEADLINE(i) (0x00040034+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_ACQUIRE_DEADLINE__SIZE_1 14 /* */ + +#define NV_PPBDMA_ACQUIRE_DEADLINE_TIMESTAMP 31:0 /* RW-UF */ +#define NV_PPBDMA_ACQUIRE_DEADLINE_TIMESTAMP_ZERO 0x00000000 /* RW--V */ + + +ACQUIRE - Acquire Periods + + The NV_UDMA_SEM_EXECUTE method may specify a semaphore acquire operation, +which involves not continuing channel execution until a given semaphore has a +particular value. 
If a semaphore acquire fails (polling the semaphore reveals +it does not have the desired value), the PBDMA unit may either switch out to a +different channel, or keep trying to acquire the semaphore; see the +documentation for the NV_UDMA_SEM_EXECUTE_ACQUIRE_SWITCH_TSG field. If the +channel does not switch out and continues trying to acquire the semaphore, then +the NV_PPBDMA_ACQUIRE_RETRY register controls how long to wait between attempts +to acquire the semaphore. + The NV_PPBDMA_ACQUIRE_RETRY_MAN and RETRY_EXP fields specify the minimum +number of internal-domain cycles that Host will wait before retrying a failed +Semaphore Acquire operation. The wait period is MAN*2^EXP nvclk cycles. +Increasing the period between acquire attempts will reduce the memory throughput +consumed, but may increase the time between when the semaphore is released and +when it is acquired. + The NV_PPBDMA_ACQUIRE_TIMEOUT_MAN and TIMEOUT_EXP fields specify the +maximum number of 1024ns periods that an acquire attempt can fail before an +acquire timeout interrupt is initiated. The acquire timeout period is +1024*MAN*2^EXP ns. TIMEOUT_EN specifies whether acquire timeouts are enabled. +The timeout period is limited to a maximum of 0x7FFF8000 so that +NV_PPBDMA_ACQUIRE_DEADLINE can fit into a single 32-bit register. + This register is part of a channel's state. On a switch, the value of +this register is saved to, and restored from, the NV_RAMFC_ACQUIRE field of the +RAMFC part of the channel's instance block. + Typically, this register is initialized in NV_RAMFC_ACQUIRE when the +channel is first created. Software typically does not access this register +directly, unless this is being done while debugging. Software can directly +access this register without the risk of race conditions when the channel is +loaded on a PBDMA unit and that PBDMA unit is stalled. 
While a channel is not +loaded on a PBDMA unit, software can read from the NV_RAMFC_ACQUIRE instance +block field to access this information. + One of this type of register exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. + + + +#define NV_PPBDMA_ACQUIRE(i) (0x00040030+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_ACQUIRE__SIZE_1 14 /* */ + +#define NV_PPBDMA_ACQUIRE_RETRY_MAN 6:0 /* RW-UF */ +#define NV_PPBDMA_ACQUIRE_RETRY_MAN_2 0x00000002 /* RW--V */ +#define NV_PPBDMA_ACQUIRE_RETRY_EXP 10:7 /* RW-UF */ +#define NV_PPBDMA_ACQUIRE_RETRY_EXP_2 0x00000002 /* RW--V */ + +#define NV_PPBDMA_ACQUIRE_TIMEOUT_EXP 14:11 /* RW-UF */ +#define NV_PPBDMA_ACQUIRE_TIMEOUT_EXP_MAX 0x0000000F /* RW--V */ +#define NV_PPBDMA_ACQUIRE_TIMEOUT_MAN 30:15 /* RW-UF */ +#define NV_PPBDMA_ACQUIRE_TIMEOUT_MAN_MAX 0x0000FFFF /* RW--V */ +#define NV_PPBDMA_ACQUIRE_TIMEOUT_EN 31:31 /* RW-UF */ +#define NV_PPBDMA_ACQUIRE_TIMEOUT_EN_DISABLE 0x00000000 /* RW--V */ +#define NV_PPBDMA_ACQUIRE_TIMEOUT_EN_ENABLE 0x00000001 /* RW--V */ + + +STATUS - PBDMA Unit Status Register + + The NV_PPBDMA_STATUS register contains the status of a PBDMA unit (Pusher, +Cache1, and Puller). + The NV_PPBDMA_STATUS_GPF field contains the status of the PBDMA unit's +GP-Entry fetching. If this field is GPF_EMPTY, then GP_GET equals GP_PUT, so +there are no more GP entries to be fetched. If this field is GPF_SUSPENDED, +then GP-Entry fetching has been suspended (either by Host's Scheduler or by a +stalling interrupt condition). +If this field is GPF_BLOCKED, then GP-Entry fetching is blocked from issuing new +GP-entry fetch requests because Host's Latency Buffer will not accept them +(either there is no space in Host's Latency Buffer to store the return data, or +Host's FB-request Arbiter is not accepting requests from the Latency Buffer). +Otherwise, this field is GPF_BUSY. + The NV_PPBDMA_STATUS_GPP field contains the status of the PBDMA unit's +GP-Entry processing. 
If this field is GPP_EMPTY, then the PBDMA unit has no +GP-Entry to process. If this field is GPP_SUSPENDED, then GP-Entry processing +has been suspended (either by Host's Scheduler or by a stalling interrupt +condition). If this field is GPP_BLOCKED, then GP-Entry processing is +blocked from issuing new pushbuffer read requests because the Latency +Buffer will not accept them. Otherwise, this field is GPP_BUSY. + The NV_PPBDMA_STATUS_PBP field contains the status of the PBDMA unit's +pushbuffer data processing. If this field is PBP_EMPTY, then the PBDMA unit has +no pushbuffer data to process. If this field is PBP_SUSPENDED, then +pushbuffer processing has been suspended (either by Host's +Scheduler or by a stalling interrupt condition). If this field is +PBP_BLOCKED, then pushbuffer processing is blocked because Host's +method FIFO is full. Otherwise, this field is PBP_BUSY. + The NV_PPBDMA_STATUS_MP field contains the status of a PBDMA unit's method +processing. If this field is MP_EMPTY, then Host's method FIFO is empty. If +this field is MP_SUSPENDED, then method processing has been suspended (either by +Host's Scheduler, by a NV_UDMA_YIELD method, or by an inter-engine +subchannel switch). If this field is MP_BLOCKED, then method +processing is blocked from making progress because of a +semaphore acquire or an FB flush, because Host's Crossbar is not accepting methods, or +because Host's Semaphore Processor or Run-List Processor is not accepting a +request or notification. Otherwise, this field is MP_BUSY. + + + The NV_PPBDMA_STATUS_PBDMA field contains the state of the PBDMA unit as a +whole. If this field is PBDMA_EMPTY, then all of the PBDMA unit's sub-blocks +are reporting that they are empty. If this field is PBDMA_SUSPENDED, then all +of the PBDMA unit's +sub-blocks are reporting that they are suspended. If this field is +PBDMA_BLOCKED, then all of the PBDMA unit's sub-blocks are reporting that they +are blocked from making progress. 
Otherwise, this field is PBDMA_BUSY. + One of these registers exists for each of Host's PBDMA units. This +register is not context switched. This register runs on Host's internal domain +clock. This register is new for Fermi. + While NV_PPBDMA_CHANNEL_VALID is FALSE, no channel is present in +the PBDMA, so, like other non-configuration NV_PPBDMA registers, while +NV_PPBDMA_CHANNEL_VALID is FALSE, this register should be ignored. + + +#define NV_PPBDMA_STATUS(i) (0x00040100+(i)*8192) /* R--4A */ +#define NV_PPBDMA_STATUS__SIZE_1 14 /* */ + +#define NV_PPBDMA_STATUS_GPF 3:0 /* R-IUF */ +#define NV_PPBDMA_STATUS_GPF_EMPTY 0x00000000 /* R-I-V */ +#define NV_PPBDMA_STATUS_GPF_SUSPENDED 0x00000001 /* R---V */ +#define NV_PPBDMA_STATUS_GPF_BLOCKED 0x00000002 /* R---V */ +#define NV_PPBDMA_STATUS_GPF_BUSY 0x00000008 /* R---V */ +#define NV_PPBDMA_STATUS_GPP 7:4 /* R-IUF */ +#define NV_PPBDMA_STATUS_GPP_EMPTY 0x00000000 /* R-I-V */ +#define NV_PPBDMA_STATUS_GPP_SUSPENDED 0x00000001 /* R---V */ +#define NV_PPBDMA_STATUS_GPP_BLOCKED 0x00000002 /* R---V */ +#define NV_PPBDMA_STATUS_GPP_BUSY 0x00000008 /* R---V */ +#define NV_PPBDMA_STATUS_PBP 11:8 /* R-IUF */ +#define NV_PPBDMA_STATUS_PBP_EMPTY 0x00000000 /* R-I-V */ +#define NV_PPBDMA_STATUS_PBP_SUSPENDED 0x00000001 /* R---V */ +#define NV_PPBDMA_STATUS_PBP_BLOCKED 0x00000002 /* R---V */ +#define NV_PPBDMA_STATUS_PBP_BUSY 0x00000008 /* R---V */ +#define NV_PPBDMA_STATUS_MP 15:12 /* R-IUF */ +#define NV_PPBDMA_STATUS_MP_EMPTY 0x00000000 /* R-I-V */ +#define NV_PPBDMA_STATUS_MP_SUSPENDED 0x00000001 /* R---V */ +#define NV_PPBDMA_STATUS_MP_BLOCKED 0x00000002 /* R---V */ +#define NV_PPBDMA_STATUS_MP_BUSY 0x00000008 /* R---V */ +#define NV_PPBDMA_STATUS_PBDMA 31:28 /* R-IUF */ +#define NV_PPBDMA_STATUS_PBDMA_EMPTY 0x00000000 /* R-I-V */ +#define NV_PPBDMA_STATUS_PBDMA_SUSPENDED 0x00000001 /* R---V */ +#define NV_PPBDMA_STATUS_PBDMA_BLOCKED 0x00000002 /* R---V */ +#define NV_PPBDMA_STATUS_PBDMA_BUSY 0x00000008 /* R---V */ + + + 
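The STATUS field layout above can be decoded with plain shifts and masks. The following is a minimal sketch, not driver code: the function, struct, and enum names are hypothetical, and only the register offset, the unit count, the bit ranges, and the four state encodings are taken from the defines above. Per the description, a decoded value is only meaningful while NV_PPBDMA_CHANNEL_VALID is TRUE.

```c
#include <assert.h>
#include <stdint.h>

/* State encodings shared by the GPF, GPP, PBP, MP and PBDMA fields. */
enum pbdma_state {
    PBDMA_ST_EMPTY     = 0x0,
    PBDMA_ST_SUSPENDED = 0x1,
    PBDMA_ST_BLOCKED   = 0x2,
    PBDMA_ST_BUSY      = 0x8
};

/* Hypothetical container for the five decoded fields. */
struct pbdma_status {
    uint32_t gpf;    /* bits  3:0  GP-Entry fetching   */
    uint32_t gpp;    /* bits  7:4  GP-Entry processing */
    uint32_t pbp;    /* bits 11:8  pushbuffer data     */
    uint32_t mp;     /* bits 15:12 method processing   */
    uint32_t pbdma;  /* bits 31:28 unit as a whole     */
};

/* Byte offset of NV_PPBDMA_STATUS(i); there are 14 units (__SIZE_1). */
uint32_t pbdma_status_offset(uint32_t unit)
{
    assert(unit < 14u);
    return 0x00040100u + unit * 8192u;
}

/* Split a raw 32-bit STATUS value into its 4-bit fields. */
struct pbdma_status pbdma_status_decode(uint32_t reg)
{
    struct pbdma_status s;
    s.gpf   = (reg >> 0)  & 0xFu;
    s.gpp   = (reg >> 4)  & 0xFu;
    s.pbp   = (reg >> 8)  & 0xFu;
    s.mp    = (reg >> 12) & 0xFu;
    s.pbdma = (reg >> 28) & 0xFu;
    return s;
}

/* Map a field value to its name; other encodings are not defined above. */
const char *pbdma_state_name(uint32_t state)
{
    switch (state) {
    case PBDMA_ST_EMPTY:     return "EMPTY";
    case PBDMA_ST_SUSPENDED: return "SUSPENDED";
    case PBDMA_ST_BLOCKED:   return "BLOCKED";
    case PBDMA_ST_BUSY:      return "BUSY";
    default:                 return "UNKNOWN";
    }
}
```

How the raw value is read from the device (for example through a mapped register aperture) is outside this sketch and depends on the driver environment.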
+CHANNEL - Channel Identifier + + The NV_PPBDMA_CHANNEL register contains the channel number that is +currently assigned to a PBDMA unit. If VALID_FALSE, then this PBDMA unit +does not contain any valid state. After loading state from RAMFC, VALID +is set to TRUE. After saving the state to RAMFC, or during the load of RAMFC, +VALID is set to FALSE. + This information is maintained by Hardware. This register is available for +debug purposes. + One of these registers exists for each of Host's PBDMA units. This +register is not context switched. This register runs on the internal-domain +clock. + + +#define NV_PPBDMA_CHANNEL(i) (0x00040120+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_CHANNEL__SIZE_1 14 /* */ + +#define NV_PPBDMA_CHANNEL_CHID 11:0 /* */ +#define NV_PPBDMA_CHANNEL_CHID_HW 11:0 /* RWXUF */ +#define NV_PPBDMA_CHANNEL_VALID 13:13 /* RWIVF */ +#define NV_PPBDMA_CHANNEL_VALID_FALSE 0x00000000 /* RWI-V */ +#define NV_PPBDMA_CHANNEL_VALID_TRUE 0x00000001 /* RW--V */ + + + +GP_SHADOW_0 and GP_SHADOW_1 - Last Received GP-Entry Header + + The NV_PPBDMA_GP_SHADOW_* registers contain the last GP entry that was +received by the PBDMA unit. This is the data at NV_PPBDMA_GP_GET-8. If the +PBDMA unit is indicating an invalid GP entry (NV_PPBDMA_INTR_0_GPENTRY), then +this register will contain that entry. + One of these registers exists for each of Host's PBDMA units. This +register is not context switched. This register runs on the internal-domain +clock. 
+ + +#define NV_PPBDMA_GP_SHADOW_0(i) (0x00040110+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_GP_SHADOW_0__SIZE_1 14 /* */ + +#define NV_PPBDMA_GP_SHADOW_0_VALUE 31:0 /* RWXUF */ + +#define NV_PPBDMA_GP_SHADOW_1(i) (0x00040114+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_GP_SHADOW_1__SIZE_1 14 /* */ + +#define NV_PPBDMA_GP_SHADOW_1_VALUE 31:0 /* RWXUF */ + + +HDR_SHADOW - Last Fetched Pushbuffer-Entry Header + + The NV_PPBDMA_HDR_SHADOW register contains the raw PB instruction +corresponding to the information in NV_PPBDMA_PB_HEADER. If the PBDMA unit is +indicating an invalid PB entry (NV_PPBDMA_INTR_0_PBENTRY), then this register +will contain the raw data for that entry. + One of these registers exists for each of Host's PBDMA units. This +register is not context switched. This register runs on the internal-domain +clock. + + +#define NV_PPBDMA_HDR_SHADOW(i) (0x00040118+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_HDR_SHADOW__SIZE_1 14 /* */ + +#define NV_PPBDMA_HDR_SHADOW_VALUE 31:0 /* RWXUF */ + + + +MEM_OP_* [registers] - Memory-Operation Operand Backing Registers + + The NV_PPBDMA_MEM_OP_* registers contain bits 95:0 of the operands +to a memory management operation. Memory management operations are +triggered by NV_UDMA_MEM_OP_D methods; see NV_UDMA_MEM_OP* below for the method +documentation. + These registers are part of a GPU context's state. On a switch, the values of +these registers are saved to, and restored from, the NV_RAMFC_MEM_OP_A, +NV_RAMFC_MEM_OP_B, and NV_RAMFC_MEM_OP_C fields of the RAMFC part of the GPU +context's GPU-instance block. + Software uses NV_UDMA_MEM_OP_* methods to alter this information. +Typically, software does not access these registers directly. These registers are +available to software only for debug. Software should use these registers only if +the GPU context is assigned to a PBDMA unit and that PBDMA unit is stalled. 
+While a GPU context's Host state is not contained within a PBDMA unit, software +should use NV_RAMFC_MEM_OP_C to access this information. + One of these registers exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. These registers were added +and/or moved for Pascal (MEM_OP_A used to exist at offsets 400a0 + i*8192). + + + +#define NV_PPBDMA_MEM_OP_A(i) (0x00040004+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_MEM_OP_A__SIZE_1 14 /* */ +#define NV_PPBDMA_MEM_OP_A_DATA 31:0 /* RW-UF */ +#define NV_PPBDMA_MEM_OP_B(i) (0x00040064+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_MEM_OP_B__SIZE_1 14 /* */ +#define NV_PPBDMA_MEM_OP_B_DATA 31:0 /* RW-UF */ +#define NV_PPBDMA_MEM_OP_C(i) (0x000400a0+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_MEM_OP_C__SIZE_1 14 /* */ +#define NV_PPBDMA_MEM_OP_C_DATA 31:0 /* RW-UF */ + + +SIGNATURE - RAMFC Signature Register + + This register contains a value that specifies which Host class ID software +expects the hardware to support, and indicates if the RAMFC might be valid. It +is intended for debug and as a runtime check that RM is exposing the proper Host +class ID for the chip. + When the RAMFC part of a GPU context's instance block is restored into +Host, if the HW field does not contain the class ID specified by +HW_HOST_CLASS_ID or the value HW_VALID, then Host will freeze and initiate an +NV_PPBDMA_INTR_*_SIGNATURE interrupt. Host's class ID can be queried at runtime +from NV_PFIFO_CFG2_HOST_CLASS_ID; see dev_fifo.ref. Note the Host class is also +known as "channel_gpfifo". HW_VALID (0xface) is meant to be used by RM to ease +transitions between Host classes for new architectures. The HW field does not +provide a direct check for Host methods sent by a given user mode driver; +attempting to send methods from a mismatching Host class may or may not work +depending on the method. + The SW field is for use by software. Host is not affected by the value. + This register is part of a GPU context's state. 
On a switch, the value of +this register is saved to, and restored from, the NV_RAMFC_SIGNATURE field of +the RAMFC part of the GPU context's GPU-instance block. + One of these registers exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. This register was added for +Fermi. + + + +#define NV_PPBDMA_SIGNATURE(i) (0x00040010+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_SIGNATURE__SIZE_1 14 /* */ +#define NV_PPBDMA_SIGNATURE_HW 15:0 /* RW-UF */ +#define NV_PPBDMA_SIGNATURE_HW_VALID 0x0000face /* RW--V */ +#define NV_PPBDMA_SIGNATURE_HW_HOST_CLASS_ID 50031 /* RW--V */ +#define NV_PPBDMA_SIGNATURE_SW 31:16 /* RW-UF */ +#define NV_PPBDMA_SIGNATURE_SW_ZERO 0x00000000 /* RW--V */ + + +USERD - Address of User-Driver Accessible State + + A user driver is permitted access to some, but not all, of a GPU context's +state (for example, GP_PUT). NV_PPBDMA_USERD contains the physical address of a +block of memory that contains the state the user-driver may access. This block +is NV_RAMUSERD_CHAN_SIZE-byte aligned. Please see the NV_RAMUSERD section of +"dev_ram.ref" for a description of the user-driver accessible state. + TARGET - The aperture of the physical address space in which USERD resides. + ADDR - The low bits of the block-aligned (right shifted) USERD address. +This field corresponds to the low 32 bits of the byte address with the low bits +corresponding to its block alignment masked off. + HI_ADDR - The high bits of the USERD address. This field specifies bits +32+ of the USERD byte-aligned address. + This register is part of a GPU context's state. On a switch, the value of +this register is saved to, and restored from, the NV_RAMFC_USERD and +NV_RAMFC_USERD_HI fields of the RAMFC part of the GPU context's GPU-instance +block. + One of these registers exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. This register was added for +Fermi. 
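The relationship between the USERD fields and the byte address can be illustrated with a short sketch. It assumes, based on the register defines for USERD (ADDR at bits 31:9, TARGET at bits 1:0) and USERD_HI (HI_ADDR at bits 7:0), that ADDR already sits at its byte-address position so that masking off the low nine bits yields the low 32 bits of the address. The function names are hypothetical.

```c
#include <stdint.h>

/*
 * Rebuild the USERD byte address from raw NV_PPBDMA_USERD and
 * NV_PPBDMA_USERD_HI values.
 * Assumption: ADDR (bits 31:9) holds address bits 31:9 in place, so the
 * block is 512-byte aligned and no extra shift is needed.
 */
uint64_t userd_byte_address(uint32_t userd, uint32_t userd_hi)
{
    uint64_t lo = userd & 0xFFFFFE00u;  /* ADDR, bits 31:9    */
    uint64_t hi = userd_hi & 0xFFu;     /* HI_ADDR, bits 7:0  */
    return (hi << 32) | lo;
}

/* Extract the TARGET aperture (bits 1:0), e.g. 0 = VID_MEM. */
uint32_t userd_target(uint32_t userd)
{
    return userd & 0x3u;
}
```

With eight HI_ADDR bits above the low 32 bits, the resulting address is 40 bits wide.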
+ + +#define NV_PPBDMA_USERD(i) (0x00040008+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_USERD__SIZE_1 14 /* */ +#define NV_PPBDMA_USERD_TARGET 1:0 /* RW-UF */ +#define NV_PPBDMA_USERD_TARGET_VID_MEM 0x00000000 /* RW--V */ +#define NV_PPBDMA_USERD_TARGET_VID_MEM_NVLINK_COHERENT 0x00000001 /* RW--V */ +#define NV_PPBDMA_USERD_TARGET_SYS_MEM_COHERENT 0x00000002 /* RW--V */ +#define NV_PPBDMA_USERD_TARGET_SYS_MEM_NONCOHERENT 0x00000003 /* RW--V */ +#define NV_PPBDMA_USERD_ADDR 31:9 /* RW-UF */ +#define NV_PPBDMA_USERD_ADDR_ZERO 0x00000000 /* RW--V */ + +#define NV_PPBDMA_USERD_HI(i) (0x0004000c+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_USERD_HI__SIZE_1 14 /* */ +#define NV_PPBDMA_USERD_HI_ADDR 7:0 /* RW-UF */ +#define NV_PPBDMA_USERD_HI_ADDR_ZERO 0x00000000 /* RW--V */ + + +CONFIG - Miscellaneous Configuration Register + + The CONFIG register is used to configure miscellaneous functions of a PBDMA on a +per-channel basis. Software can configure these bits via the corresponding +NV_RAMFC_CONFIG dword in each channel's RAMFC. + The L2_EVICT field controls the l2_class field for memory requests from a PBDMA unit. + The CE_SPLIT field controls Host taking large copies and splitting them into smaller +copies to allow fast Copy Engine (CE) switching. If the field value is ENABLE, Host will analyze +each copy command to determine if the copy should be split into smaller copies, and may +modify the commands sent to the CE. + + +If the field value is DISABLE, Host will not modify the copy commands sent to the CE. +If the field is written from ENABLE to DISABLE while Host is in the middle of splitting a copy, +Host will continue splitting the current copy until the whole copy has been split. Future +copies, however, will not be split while the field remains set to DISABLE. + The THROTTLE_MODE field controls how much work Host sends to the CE. 
The +goal is to send enough work to keep the Copy Engine busy while Host switches +away to another channel to check on a semaphore, while at the same time +maintaining the CE preemption latency below 10 microseconds. When the field is +set to THROTTLE, Host will limit the number of copies it sends to the CE. This +is legacy behavior and is needed on PCIE GEN3 systems. Setting the field to +NO_THROTTLE will prevent Host from limiting the amount of work that Host sends +to the CE. NVLINK2 and PCIE GEN4_LITE systems should have the field set to +NO_THROTTLE. +Note: Because this is a static setting, if a system slowdown occurs and the link +is downgraded, preemption latency may exceed 10 microseconds. + The AUTH_LEVEL field specifies the authorization level of the channel. +When AUTH_LEVEL is NON_PRIVILEGED, the channel will not be able to execute +privileged operations via Host methods on its pushbuffer. Any attempt to do so +will result in the NV_PPBDMA_INTR_*_METHOD interrupt being raised. When +AUTH_LEVEL is PRIVILEGED, the channel will be able to execute all methods. + The USERD_WRITEBACK field controls whether USERD will be written back to +memory. Regardless of the setting here, USERD is always written back to memory +when the channel switches off of the PBDMA. When USERD_WRITEBACK is ENABLE, +USERD will also be written back to memory whenever the PBDMA falls idle or the +writeback timer configured via NV_PFIFO_USERD_WRITEBACK_TIMER expires. When the +field value is DISABLE, the writeback only occurs on channel save. Note GP_PUT +does not get written back to memory because it is written by software; +otherwise, GP_PUT updates could be lost on writeback. + This register is part of a GPU context's state. On a switch, the value of +this register is saved to, and restored from, the NV_RAMFC_CONFIG field of the RAMFC +part of the GPU context's GPU-instance block. + One of these registers exists for each of Host's PBDMA units. 
This +register runs on Host's internal domain clock. This register was added for +Fermi. + +#define NV_PPBDMA_CONFIG(i) (0x000400f4+(i)*8192) /* R--4A */ +#define NV_PPBDMA_CONFIG__SIZE_1 14 /* */ + + +#define NV_PPBDMA_CONFIG_L2_EVICT 1:0 /* R--VF */ +#define NV_PPBDMA_CONFIG_L2_EVICT_FIRST 0x00000000 /* R---V */ +#define NV_PPBDMA_CONFIG_L2_EVICT_NORMAL 0x00000001 /* R---V */ + + +#define NV_PPBDMA_CONFIG_CE_SPLIT 4:4 /* R--VF */ +#define NV_PPBDMA_CONFIG_CE_SPLIT_ENABLE 0x00000000 /* R---V */ +#define NV_PPBDMA_CONFIG_CE_SPLIT_DISABLE 0x00000001 /* R---V */ +#define NV_PPBDMA_CONFIG_CE_THROTTLE_MODE 5:5 /* R--VF */ +#define NV_PPBDMA_CONFIG_CE_THROTTLE_MODE_THROTTLE 0x00000000 /* R---V */ +#define NV_PPBDMA_CONFIG_CE_THROTTLE_MODE_NO_THROTTLE 0x00000001 /* R---V */ +#define NV_PPBDMA_CONFIG_AUTH_LEVEL 8:8 /* R--VF */ +#define NV_PPBDMA_CONFIG_AUTH_LEVEL_NON_PRIVILEGED 0x00000000 /* R---V */ +#define NV_PPBDMA_CONFIG_AUTH_LEVEL_PRIVILEGED 0x00000001 /* R---V */ +#define NV_PPBDMA_CONFIG_USERD_WRITEBACK 12:12 /* R--VF */ +#define NV_PPBDMA_CONFIG_USERD_WRITEBACK_DISABLE 0x00000000 /* R---V */ +#define NV_PPBDMA_CONFIG_USERD_WRITEBACK_ENABLE 0x00000001 /* R---V */ + + + After a channel switch, the first method Host will send to the graphics or +copy engine is a NV_PMETHOD_SET_CHANNEL_INFO method. The lower 16 bits of the +payload of this method (defined in internal_methods.ref) will consist of the +lower 16 bit value from this register. The upper 16 bits of the payload will +be populated by Host with the channel ID. + The lower 16 bits of the value of this method is expected to be set in +RAMFC by writing 32 bits to the offset specified as NV_RAMFC_SET_CHANNEL_INFO +at channel allocation. When generating the method, Host will ignore the upper +16 bits of the register value and populate the upper 16 bits of the method +payload with the channel ID. The register value should only change if the +channel is preempted and not loaded on a PBDMA. 
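As a small illustration of the payload rule described above, the sketch below combines the lower 16 bits of the register value with the channel ID in the upper 16 bits. The function name is hypothetical, and the code only models the bit packing; the actual method generation is done by Host hardware.

```c
#include <stdint.h>

/*
 * Model of the NV_PMETHOD_SET_CHANNEL_INFO payload:
 *   bits 15:0  come from the lower 16 bits of the register value,
 *   bits 31:16 are populated with the channel ID.
 * The upper 16 bits of the register value are ignored.
 */
uint32_t set_channel_info_payload(uint32_t reg_value, uint32_t chid)
{
    uint32_t low  = reg_value & 0x0000FFFFu;
    uint32_t high = (chid & 0x0000FFFFu) << 16;
    return high | low;
}
```

For example, a register value of 0xDEAD0301 on channel 0x123 yields a payload of 0x01230301: the 0xDEAD in the upper half of the register does not reach the engine.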
+ The VEID field is used to specify the Virtual Engine ID (VEID) for the +channel. A VEID is a collection of independent compute or graphics state which +shares execution resources and a context image. Each channel in a TSG can be +for a different VEID; any channels sharing a VEID will share WFI behavior. + The RESERVED field is reserved for Host and any value written in these +upper 16 bits by SW is ignored by Host when generating the internal method +NV_PMETHOD_SET_CHANNEL_INFO. + The SET_CHANNEL_INFO data should be set in RAMFC via the +NV_RAMFC_SET_CHANNEL_INFO entry rather than through this register. + This register is part of a GPU context's state. On a switch, the value of +this register is saved to and restored from the NV_RAMFC_SET_CHANNEL_INFO +field of the RAMFC part of the GPU context's GPU-instance block. + One of these registers exists for each of Host's PBDMA units. This +register runs on Host's internal domain clock. + + + +#define NV_PPBDMA_SET_CHANNEL_INFO(i) (0x000400fc+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_SET_CHANNEL_INFO__SIZE_1 14 /* */ + +#define NV_PPBDMA_SET_CHANNEL_INFO_VALUE 31:0 /* RW--F */ + +#define NV_PPBDMA_SET_CHANNEL_INFO_SCG_TYPE 0:0 /* */ +#define NV_PPBDMA_SET_CHANNEL_INFO_SCG_TYPE_GRAPHICS_COMPUTE0 0x00000000 /* */ +#define NV_PPBDMA_SET_CHANNEL_INFO_SCG_TYPE_COMPUTE1 0x00000001 /* */ + +#define NV_PPBDMA_SET_CHANNEL_INFO_VEID ((6-1)+8):8 /* */ + +#define NV_PPBDMA_SET_CHANNEL_INFO_RESERVED 31:16 /* */ + + +HCE_CTRL - Misc Additional HCE State + + HCE_CTRL is used for misc. HCE state that needs to be channel swapped +in addition to the normal CE CLASS state. +Some of the state bits are part of the MP/SP blocks' interactions with the +HCE Handling logic. + SP_AWAITS_HCEH indicates that the SP block is waiting for HCEH to finish +processing an HCE trigger method. + HCE_RENDER_DISABLED indicates that CE class rendering has been turned off. 
+ HCE_SUBCHSW indicates that methods have been sent to HCE, and thus GR +will need to flush its caches when the next GR method in this channel +flows down to GR (indicated by interface bit). + HCE_PRIV_MODE indicates that physical launchDMA copies are allowed. + NOP_RCVD indicates that HCE logic has decoded a NOP method, and will +send the NOP to CE when permitted.(see launch_dma_rcvd description) + LAUNCH_DMA_RCVD indicates that the HCE logic has decoded a launchdma +method from MP, and it will be sent to CE when CE has returned enough +credits, and other criteria are met. + PM_TRIGGER_RCVD indicates that HCE logic has decoded a pm_trigger method +and wants to send it to CE. + SET_RENDER_ENABLE_C_RCVD indicates that HCE logic has decoded a +set_render_enable method, and is in the process of updating the render enable +state for CE. Note, this is not strictly necessary as channel state, but it +is useful for debug while the channel is loaded. + + + + +#define NV_PPBDMA_HCE_CTRL(i) (0x000400e4+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_HCE_CTRL__SIZE_1 14 /* */ +#define NV_PPBDMA_HCE_CTRL_SP_AWAITS_HCEH 0:0 /* RW-UF */ +#define NV_PPBDMA_HCE_CTRL_SP_AWAITS_HCEH_NO 0x00000000 /* RW--V */ +#define NV_PPBDMA_HCE_CTRL_SP_AWAITS_HCEH_YES 0x00000001 /* RW--V */ +#define NV_PPBDMA_HCE_CTRL_HCE_RENDER_DISABLED 2:2 /* RW-UF */ +#define NV_PPBDMA_HCE_CTRL_HCE_RENDER_DISABLED_NO 0x00000000 /* RW--V */ +#define NV_PPBDMA_HCE_CTRL_HCE_RENDER_DISABLED_YES 0x00000001 /* RW--V */ +#define NV_PPBDMA_HCE_CTRL_HCE_SUBCHSW 4:4 /* RW-UF */ +#define NV_PPBDMA_HCE_CTRL_HCE_SUBCHSW_NO 0x00000000 /* RW--V */ +#define NV_PPBDMA_HCE_CTRL_HCE_SUBCHSW_YES 0x00000001 /* RW--V */ +#define NV_PPBDMA_HCE_CTRL_HCE_PRIV_MODE 5:5 /* RW-UF */ +#define NV_PPBDMA_HCE_CTRL_HCE_PRIV_MODE_NO 0x00000000 /* RW--V */ +#define NV_PPBDMA_HCE_CTRL_HCE_PRIV_MODE_YES 0x00000001 /* RW--V */ +#define NV_PPBDMA_HCE_CTRL_LAUNCH_DMA_RCVD 16:16 /* RW-UF */ +#define NV_PPBDMA_HCE_CTRL_LAUNCH_DMA_RCVD_NO 0x00000000 /* RW--V 
*/ +#define NV_PPBDMA_HCE_CTRL_LAUNCH_DMA_RCVD_YES 0x00000001 /* RW--V */ +#define NV_PPBDMA_HCE_CTRL_NOP_RCVD 17:17 /* RW-UF */ +#define NV_PPBDMA_HCE_CTRL_NOP_RCVD_NO 0x00000000 /* RW--V */ +#define NV_PPBDMA_HCE_CTRL_NOP_RCVD_YES 0x00000001 /* RW--V */ +#define NV_PPBDMA_HCE_CTRL_PM_TRIGGER_RCVD 18:18 /* RW-UF */ +#define NV_PPBDMA_HCE_CTRL_PM_TRIGGER_RCVD_NO 0x00000000 /* RW--V */ +#define NV_PPBDMA_HCE_CTRL_PM_TRIGGER_RCVD_YES 0x00000001 /* RW--V */ +#define NV_PPBDMA_HCE_CTRL_PM_TRIGGER_END_RCVD 19:19 /* RW-UF */ +#define NV_PPBDMA_HCE_CTRL_PM_TRIGGER_END_RCVD_NO 0x00000000 /* RW--V */ +#define NV_PPBDMA_HCE_CTRL_PM_TRIGGER_END_RCVD_YES 0x00000001 /* RW--V */ +#define NV_PPBDMA_HCE_CTRL_SET_RENDER_ENABLE_C_RCVD 20:20 /* RW-UF */ +#define NV_PPBDMA_HCE_CTRL_SET_RENDER_ENABLE_C_RCVD_NO 0x00000000 /* RW--V */ +#define NV_PPBDMA_HCE_CTRL_SET_RENDER_ENABLE_C_RCVD_YES 0x00000001 /* RW--V */ + + +TIMEOUT - Timeout Period Register + + The NV_PPBDMA_TIMEOUT register contains a value used for detecting +timeouts. The timeout value is in microsecond ticks. + +The timeouts that use this value are: +GPfifo fetch timeouts to FB for acks, reqs, rdats. +PBDMA connection to LB. +GPfifo processor timeouts to FB for acks, reqs, rdats. +Method processor timeouts to FB for acks, reqs, rdats. +The init value was changed to 64K us. + + One of these registers exists for each of Host's PBDMA units. This +register is not context switched. This register runs on the internal-domain +clock. + + + +#define NV_PPBDMA_TIMEOUT(i) (0x0004012c+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_TIMEOUT__SIZE_1 14 /* */ + +#define NV_PPBDMA_TIMEOUT_PERIOD 31:0 /* RWEUF */ +#define NV_PPBDMA_TIMEOUT_PERIOD_INIT 0x00010000 /* RWE-V */ +#define NV_PPBDMA_TIMEOUT_PERIOD_MAX 0xffffffff /* RW--V */ + +6 - INTERRUPT REGISTERS +========================= + + The interrupt registers control the interrupts for the local devices. +Interrupts are set by an event and are cleared by software. 
+ +INTR_0 - PBDMA Unit Interrupt Register + + The NV_PPBDMA_INTR_* registers are a PBDMA unit's interrupt registers. The +logical-OR of this register feeds into the NV_PFIFO_INTR_* register. If a field +in this register is PENDING, then the corresponding interrupt condition has +occurred, and software has not yet indicated to hardware that the exception has +been handled. If a field is NOT_PENDING then there are no exceptions of the +corresponding type that have not been handled. Software writes RESET to one of +these fields to indicate that a pending interrupt has been handled. + Software cannot set bits in this register. Attempting to write a bit to a +one actually clears the interrupt source. In this way, software can clear +individual bits in this register. When software recognizes an interrupt, and +services it, it can then clear the individual source by writing that single bit +in this register to RESET. Then it can read the register and see if all bits +are clear. If not, it can service other interrupts in this register. This is +especially important since some of these bits are asynchronous to others in this +register. While an interrupt service routine (ISR) is clearing an interrupt, +other interrupts may occur. + Interrupts differ in severity. Some interrupts (like software interrupts) +are expected in the normal operation of the GPU, and do not indicate that any +GPU context has been damaged, or hung. Some interrupts (like timeouts) do not +indicate damage, but indicate that deadlock might have occurred. Some interrupts +indicate that an error has occurred that might have damaged a GPU context, but +has not damaged any of the others. Finally some interrupts indicate that any +or all of the active GPU contexts have been damaged. 
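The service-then-clear sequence described above (handle one source, write only that bit to RESET, then re-read) can be sketched against a software model of the register. Everything here is hypothetical scaffolding: `fake_intr_reg` stands in for NV_PPBDMA_INTR_0(i), and the accessor functions model write-one-to-clear behaviour rather than real MMIO access.

```c
#include <stdint.h>

/* Hypothetical software model of one NV_PPBDMA_INTR_0(i) register. */
typedef struct {
    uint32_t pending;  /* a set bit means the field reads PENDING */
} fake_intr_reg;

/* Reading returns the current PENDING bits. */
uint32_t intr_read(const fake_intr_reg *r)
{
    return r->pending;
}

/* Writing 1 (RESET) to a bit clears it; writing 0 leaves it untouched. */
void intr_write(fake_intr_reg *r, uint32_t value)
{
    r->pending &= ~value;
}

/*
 * Service interrupts one source at a time. The register is re-read after
 * every clear because new interrupts may arrive while the ISR is running.
 * Returns the number of sources that were serviced.
 */
unsigned intr_service_all(fake_intr_reg *r)
{
    unsigned handled = 0;
    uint32_t v;

    while ((v = intr_read(r)) != 0u) {
        uint32_t bit = v & (~v + 1u);  /* isolate the lowest pending bit */
        /* ... source-specific handling for 'bit' would go here ... */
        intr_write(r, bit);            /* clear only this source */
        handled++;
    }
    return handled;
}
```

Writing a single bit rather than the whole value that was read avoids clearing a source that became pending between the read and the write without being serviced.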
+ This register is for interrupts that cause a PBDMA unit to stall +(non-stalling non-switching interrupts are stored on a per-channel basis). Bits in +this register being set to PENDING will prevent the contents of the PBDMA unit +from being switched out. Until software handles these interrupts and writes the +bits to RESET, the PBDMA will be frozen. + One of these registers exists for each of Host's PBDMA units. This +register is not context switched. This register runs on Host's internal domain +clock. This register is new for Fermi. + +Interrupt field summary for INTR_0, INTR_EN_0, and INTR_STALL: + + + +#define NV_PPBDMA_INTR_0(i) (0x00040108+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_INTR_0__SIZE_1 14 /* */ + + The NV_PPBDMA_INTR_*_MEMREQ field indicates that a memory request was not +accepted within NV_PPBDMA_TIMEOUT_PERIOD. This is an unrecoverable error. + +#define NV_PPBDMA_INTR_0_MEMREQ 0:0 /* RWIUF */ +#define NV_PPBDMA_INTR_0_MEMREQ_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_MEMREQ_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_MEMREQ_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_MEMACK_TIMEOUT field indicates that a PBDMA unit +has not received an MMU acknowledge within NV_PPBDMA_TIMEOUT_PERIOD. This is an +unrecoverable error. + +#define NV_PPBDMA_INTR_0_MEMACK_TIMEOUT 1:1 /* RWIUF */ +#define NV_PPBDMA_INTR_0_MEMACK_TIMEOUT_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_MEMACK_TIMEOUT_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_MEMACK_TIMEOUT_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_MEMACK_EXTRA field indicates that a PBDMA unit +received more MMU acknowledges than it was expecting, or received an +acknowledge with an unexpected sub-identifier. This is an unrecoverable error. 
+ +#define NV_PPBDMA_INTR_0_MEMACK_EXTRA 2:2 /* RWIUF */ +#define NV_PPBDMA_INTR_0_MEMACK_EXTRA_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_MEMACK_EXTRA_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_MEMACK_EXTRA_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_MEMDAT_TIMEOUT field indicates that read data was +not received within NV_PPBDMA_TIMEOUT_PERIOD. This is an unrecoverable error. + +#define NV_PPBDMA_INTR_0_MEMDAT_TIMEOUT 3:3 /* RWIUF */ +#define NV_PPBDMA_INTR_0_MEMDAT_TIMEOUT_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_MEMDAT_TIMEOUT_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_MEMDAT_TIMEOUT_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_MEMDAT_EXTRA field indicates that a PBDMA unit received +more data than it requested, or received read data with an unexpected +sub-identifier. This is an unrecoverable error. + +#define NV_PPBDMA_INTR_0_MEMDAT_EXTRA 4:4 /* RWIUF */ +#define NV_PPBDMA_INTR_0_MEMDAT_EXTRA_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_MEMDAT_EXTRA_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_MEMDAT_EXTRA_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_MEMFLUSH field indicates that a PBDMA unit issued an FB +flush request due to a NV_UDMA_FB_FLUSH method, and did not receive a flush +acknowledge within NV_PPBDMA_TIMEOUT_PERIOD. This is an unrecoverable error. + +#define NV_PPBDMA_INTR_0_MEMFLUSH 5:5 /* RWIUF */ +#define NV_PPBDMA_INTR_0_MEMFLUSH_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_MEMFLUSH_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_MEMFLUSH_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_MEMOP field indicates that a PBDMA unit issued a +memory request due to a NV_UDMA_MEM_OP_D method, and did not receive an +acknowledge within NV_PPBDMA_TIMEOUT_PERIOD. This is an unrecoverable error. 
+ +#define NV_PPBDMA_INTR_0_MEMOP 6:6 /* RWIUF */ +#define NV_PPBDMA_INTR_0_MEMOP_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_MEMOP_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_MEMOP_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_LBCONNECT field indicates that a request to connect to +a Latency Buffer was not acknowledged within NV_PPBDMA_TIMEOUT_PERIOD. This +is an unrecoverable error. + +#define NV_PPBDMA_INTR_0_LBCONNECT 7:7 /* RWIUF */ +#define NV_PPBDMA_INTR_0_LBCONNECT_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_LBCONNECT_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_LBCONNECT_RESET 0x00000001 /* -W--C */ + + + The NV_PPBDMA_INTR_*_LBACK_TIMEOUT field indicates that a PBDMA unit did +not receive an acknowledge to a memory request within NV_PPBDMA_TIMEOUT_PERIOD. +This is an unrecoverable error. + +#define NV_PPBDMA_INTR_0_LBACK_TIMEOUT 9:9 /* RWIUF */ +#define NV_PPBDMA_INTR_0_LBACK_TIMEOUT_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_LBACK_TIMEOUT_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_LBACK_TIMEOUT_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_LBACK_EXTRA field indicates that a PBDMA unit received +more acknowledges from the Latency Buffer than it was expecting. This is an +unrecoverable error. + +#define NV_PPBDMA_INTR_0_LBACK_EXTRA 10:10 /* RWIUF */ +#define NV_PPBDMA_INTR_0_LBACK_EXTRA_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_LBACK_EXTRA_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_LBACK_EXTRA_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_LBDAT_TIMEOUT field indicates that a PBDMA unit has not +received read data for a request within NV_PPBDMA_TIMEOUT_PERIOD. This is an +unrecoverable error. 
+ +#define NV_PPBDMA_INTR_0_LBDAT_TIMEOUT 11:11 /* RWIUF */ +#define NV_PPBDMA_INTR_0_LBDAT_TIMEOUT_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_LBDAT_TIMEOUT_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_LBDAT_TIMEOUT_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_LBDAT_EXTRA field indicates that a PBDMA unit received +more data from the Latency Buffer than expected, or has received read data +with an unexpected sub-identifier. This is an unrecoverable error. + +#define NV_PPBDMA_INTR_0_LBDAT_EXTRA 12:12 /* RWIUF */ +#define NV_PPBDMA_INTR_0_LBDAT_EXTRA_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_LBDAT_EXTRA_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_LBDAT_EXTRA_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_GPFIFO field indicates that a PBDMA unit encountered +an invalid GPFIFO (circular buffer of GP-Entries). A GPFIFO that crosses the +end of the memory address space (0xFFFFFFFFFF) is invalid. The invalid value +will be in NV_PPBDMA_GP_BASE and NV_PPBDMA_GP_BASE_HI. Fixing this and clearing +the interrupt will allow the PBDMA unit to continue. The error is limited to +the channel. + +#define NV_PPBDMA_INTR_0_GPFIFO 13:13 /* RWIUF */ +#define NV_PPBDMA_INTR_0_GPFIFO_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_GPFIFO_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_GPFIFO_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_GPPTR field indicates that a PBDMA unit encountered +invalid GP pointers (either NV_PPBDMA_GP_PUT, NV_PPBDMA_GP_FETCH, or +NV_PPBDMA_GP_GET). These pointers are invalid if they are not between zero and +one less than the size of the circular buffer that contains GP entries: +1<<NV_PPBDMA_GP_BASE_HI_LIMIT2. Fixing the invalid pointer and clearing the +interrupt will allow the PBDMA unit to continue. The error is limited to the +channel. 
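The GPFIFO and GPPTR validity conditions described above reduce to two range checks, sketched below. The function names are hypothetical. Two points are assumptions rather than statements from this section: that a GP entry occupies 8 bytes (inferred from the two 32-bit GP_SHADOW registers and the "GP_GET-8" wording), and that the GPFIFO byte size is therefore 8<<LIMIT2. Hardware may apply additional checks that this sketch does not model.

```c
#include <stdbool.h>
#include <stdint.h>

#define GP_ENTRY_BYTES  8u              /* assumption: two 32-bit words per entry */
#define ADDR_SPACE_LAST 0xFFFFFFFFFFull /* last byte of the address space */

/*
 * GPPTR rule: a GP pointer (GP_PUT, GP_FETCH or GP_GET) is valid when it
 * lies between 0 and (1 << LIMIT2) - 1 inclusive.
 */
bool gp_ptr_valid(uint32_t ptr, unsigned limit2)
{
    if (limit2 >= 32u)
        return true;  /* every 32-bit pointer is below 1 << limit2 */
    return ptr < (1u << limit2);
}

/*
 * GPFIFO rule: the buffer must not cross the end of the address space.
 * 'base' is the byte address assembled from GP_BASE and GP_BASE_HI.
 */
bool gpfifo_in_range(uint64_t base, unsigned limit2)
{
    if (base > ADDR_SPACE_LAST || limit2 > 37u)
        return false;  /* 8 << 38 already exceeds the whole address space */

    uint64_t bytes = (uint64_t)GP_ENTRY_BYTES << limit2;
    return bytes - 1u <= ADDR_SPACE_LAST - base;
}
```

A check of this kind could be run by software before programming a channel, so that an invalid base or pointer is caught before it raises the corresponding interrupt.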
+ +#define NV_PPBDMA_INTR_0_GPPTR 14:14 /* RWIUF */ +#define NV_PPBDMA_INTR_0_GPPTR_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_GPPTR_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_GPPTR_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_GPENTRY field indicates that a PBDMA unit encountered +an invalid GP entry. The invalid entry will be in NV_PPBDMA_GP_SHADOW_*. +Invalid GP entries are treated like traps, they will set the interrupt and +freeze the PBDMA, but the invalid entry is discarded. Once the interrupt is +cleared, the PBDMA unit will simply continue with the next GP entry. The +GP_CRC is not updated by the discarded entry. Important: Graceful interrupt +recovery is only possible if a GP entry with a length of ZERO caused this +interrupt. For NON-ZERO length GP entries, this interrupt is fatal. The error +is limited to the channel. + +#define NV_PPBDMA_INTR_0_GPENTRY 15:15 /* RWIUF */ +#define NV_PPBDMA_INTR_0_GPENTRY_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_GPENTRY_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_GPENTRY_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_GPCRC field indicates that the cyclic redundancy check +value measured over GP entries did not match the expected value. This interrupt +is for debug, and indicates that the memory subsystem returned corrupted data on +previous GP fetches. The NV_PPBDMA_GP_CRC register is cleared independent of +the comparison succeeding, so clearing the interrupt will continue as if the CRC +had passed. The error is limited to the channel. + +#define NV_PPBDMA_INTR_0_GPCRC 16:16 /* RWIUF */ +#define NV_PPBDMA_INTR_0_GPCRC_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_GPCRC_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_GPCRC_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_PBPTR field indicates that a PBDMA unit encountered an +invalid PB pointer. NV_PPBDMA_GET is invalid if it is not less than +NV_PPBDMA_PUT. 
Fixing the invalid pointer and clearing the interrupt will allow +the PBDMA unit to continue. The error is limited to the channel. + +#define NV_PPBDMA_INTR_0_PBPTR 17:17 /* RWIUF */ +#define NV_PPBDMA_INTR_0_PBPTR_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_PBPTR_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_PBPTR_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_PBENTRY field indicates that a PBDMA unit has +encountered an invalid PB entry. This can occur when Host expects the PB entry +to be a PB instruction, and any of the following happen: + + * The PB entry does not decode properly into a PB instruction. + * The decoded instruction is in an obsolete format or is otherwise not + valid (see "FIFO_DMA" in dev_ram.ref). + * The decoded instruction is either an incrementing method header or an + increment-once method header, and the header's COUNT field would cause + the method addresses for the generated method sequence to exceed the + maximum method address, thus the method addresses would wrap. + + The expected recovery procedure for handling a PBENTRY interrupt is +described below: + + 1. In order to determine the cause of a PBENTRY interrupt while an error is + pending: + 1a. Examine the NV_PPBDMA_HDR_SHADOW register for proper encoding. + This register contains the raw PB entry that triggered the PBENTRY + interrupt. If its contents are not properly encoded then this was + the cause of the interrupt. + 1b. If the raw PB entry is properly encoded then the PB header is + invalid for some other reason. This means the PB entry was + decoded before the PBENTRY interrupt was triggered, and the + NV_PPBDMA_PB_HEADER register will contain the decoded PB entry. + 2. Regardless of the cause of the PBENTRY interrupt, one must update the + NV_PPBDMA_PB_HEADER register to contain a valid header. + 3. 
If the valid updated header is a PB method header, then the VALUE field + of the NV_PPBDMA_PB_COUNT register must also be updated to reflect the + number of subsequent PB entries to interpret as method data (note that + the other fields of PB_COUNT should be left alone; this requires a + read-modify-write of this register). If this value is incorrect, then + the pushbuffer decoding will become out of sync between headers and + data. Note that when decoding PB method headers normally, the HW sets + NV_PPBDMA_PB_COUNT_VALUE to the NV_FIFO_DMA_METHOD_COUNT field value of + the raw PB entry. + 4. For consistency, NV_PPBDMA_HDR_SHADOW should be fixed too, but that is + not required for proper HW operation (the HW ignores + NV_PPBDMA_HDR_SHADOW). + 5. Clear the PBENTRY interrupt after fixing the state to allow the PBDMA + unit to continue. + + The PBENTRY error is limited to the channel. Note that while a PBENTRY +interrupt is pending on a given channel, one cannot assume that any +method/address pair generated from the preceding PB entries on that channel has +executed yet (the PB entries themselves are processed in order, but this +processing consists only of executing PB control entries and generating the +method/address pairs from the PB method headers and PB method data dwords; see +dev_ram.ref for the difference between control entries and methods). + +#define NV_PPBDMA_INTR_0_PBENTRY 18:18 /* RWIUF */ +#define NV_PPBDMA_INTR_0_PBENTRY_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_PBENTRY_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_PBENTRY_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_PBCRC field indicates that the cyclic redundancy check +value measured over a PB segment did not match the expected value. This +interrupt is for debug, and indicates that the memory subsystem returned +corrupted data on previous PB fetches. 
The NV_PPBDMA_PB_CRC register is cleared +at the start of each new segment, independent of the comparison succeeding, so +clearing the interrupt will continue as if the CRC had passed. The error is +limited to the channel. + +#define NV_PPBDMA_INTR_0_PBCRC 19:19 /* RWIUF */ +#define NV_PPBDMA_INTR_0_PBCRC_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_PBCRC_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_PBCRC_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_CLEAR_FAULTED_ERROR field indicates that a PBDMA unit +encountered a Host CLEAR_FAULTED method and the target FAULT bit for the target +chid specified in the method payload was not set within the +NV_PFIFO_CLEAR_FAULTED_TIMEOUT_PERIOD. This is intended to catch SW errors +where a CLEAR_FAULT method targets the wrong channel or a channel that has +already had its fault cleared. Please refer to the description of the +NV_UDMA_CLEAR_FAULTED method in section 9 (HOST METHODS) for details. + + When PENDING, the PBDMA is stalled and remains loaded on the channel. The +address of the invalid method will be in NV_PPBDMA_METHOD0, and its data will be +in NV_PPBDMA_DATA0. Fixing the invalid method in NV_PPBDMA_METHOD0 (or changing +it to NV_UDMA_NOP) and clearing the interrupt will allow the PBDMA unit to +continue. The error is limited to the channel. 
+ +#define NV_PPBDMA_INTR_0_CLEAR_FAULTED_ERROR 20:20 /* RWIUF */ +#define NV_PPBDMA_INTR_0_CLEAR_FAULTED_ERROR_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_CLEAR_FAULTED_ERROR_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_CLEAR_FAULTED_ERROR_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_METHOD field indicates that a PBDMA unit encountered +a method that could not be processed for one of the following reasons: + * The method is an internal method; that is, its address is in the + NV_PMETHOD range (see internal_methods.ref) + * The method address is not in the range of engine methods, but it is not a + valid Host method either + * The method is NV_UDMA_ILLEGAL + * The method attempted to perform a privileged operation, but + NV_PPBDMA_CONFIG_AUTH_LEVEL is NON_PRIVILEGED + * An NV_UDMA_YIELD method with an unknown OP is encountered + * A Host SYNCPOINT method is encountered. Syncpoints are only supported on + Tegra parts. + + The address of the invalid method will be in NV_PPBDMA_METHOD0, and its +data will be in NV_PPBDMA_DATA0. Fixing the invalid method in +NV_PPBDMA_METHOD0 (or changing it to NV_UDMA_NOP) and clearing the interrupt +will allow the PBDMA unit to continue. The error is limited to the channel. + +#define NV_PPBDMA_INTR_0_METHOD 21:21 /* RWIUF */ +#define NV_PPBDMA_INTR_0_METHOD_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_METHOD_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_METHOD_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_METHODCRC field indicates that the cyclic redundancy +check value measured over methods sent to Host's crossbar did not match the +expected value. This interrupt is for debug, and indicates that the PBDMA unit +sent incorrect methods to the engine. 
There is no use continuing with the +corrupted method stream, but for debug purposes execution may continue if the +CRC from the NV_UDMA_CRC_CHECK method (from NV_PPBDMA_DATA0) is copied over the +NV_PPBDMA_METHOD_CRC register before clearing the interrupt. The error is +limited to the channel. + +#define NV_PPBDMA_INTR_0_METHODCRC 22:22 /* RWIUF */ +#define NV_PPBDMA_INTR_0_METHODCRC_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_METHODCRC_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_METHODCRC_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_DEVICE field indicates a SW-class method. More +specifically, it indicates that the method's subchannel specified a SW engine or +a non-existent engine. Note the subchannel-to-engine mapping is fixed, and that +it is not possible to specify a non-existent engine--see NV_UDMA_OBJECT. The +method information is in NV_PPBDMA_METHOD0 and NV_PPBDMA_DATA0. For a software +method, METHOD0_SUBCH will be 5, 6, or 7. After handling the SW-class method, SW +should clear the METHOD0_VALID field to FALSE or replace the method ADDR with +NV_UDMA_NOP. Consecutive SW-class methods in the method FIFO +(NV_PPBDMA_{METHOD,DATA}{1,2,3}) may also be handled and replaced with NOPs or +their VALID fields cleared up to the first non-SW method. + +#define NV_PPBDMA_INTR_0_DEVICE 23:23 /* RWIUF */ +#define NV_PPBDMA_INTR_0_DEVICE_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_DEVICE_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_DEVICE_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_ENG_RESET field indicates that an engine was reset +while the PBDMA unit was processing a channel from a runlist which serves the +engine. The interrupt is not triggered if the PBDMA is in the halted state while the +engine is reset. However, if the engine remains in reset when the PBDMA continues, +the interrupt will fire. 
This is a potentially fatal condition for the +channel which was loaded on the PBDMA while the engine was reset. The PBDMA which +encountered the interrupt will stall and prevent the channel which was loaded at +the time the interrupt fired from being swapped out until the interrupt is cleared. +To unblock the PBDMA, SW needs to do the following: + + 1. Disable all the channels in the TSG + 2. Initiate a preempt (but do not poll for completion yet) + 3. Clear the interrupt bit + 4. Poll for preempt completion + 5. Tear down the context + +Note the TSG ID can be obtained by reading NV_PFIFO_PBDMA_STATUS_ID; +see dev_fifo.ref. The error is limited to the channel. + +#define NV_PPBDMA_INTR_0_ENG_RESET 24:24 /* RWIUF */ +#define NV_PPBDMA_INTR_0_ENG_RESET_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_ENG_RESET_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_ENG_RESET_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_SEMAPHORE field indicates that a PBDMA unit has +encountered an NV_UDMA_SEM_EXECUTE method whose data field (which indicates the +details of the semaphore operation) is invalid. The method will be in +NV_PPBDMA_METHOD0. The method data is in both NV_PPBDMA_DATA0 and +NV_PPBDMA_SEM_EXECUTE. Any changes to NV_PPBDMA_METHOD0 or NV_PPBDMA_DATA0 should +also be reflected consistently in NV_PPBDMA_SEM_EXECUTE. After fixing the +method and/or data, clearing the interrupt will allow the PBDMA unit to +continue. The error is limited to the channel. + +#define NV_PPBDMA_INTR_0_SEMAPHORE 25:25 /* RWIUF */ +#define NV_PPBDMA_INTR_0_SEMAPHORE_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_SEMAPHORE_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_SEMAPHORE_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_ACQUIRE field indicates that a semaphore acquire did +not occur within the maximum period (as specified by the +NV_PPBDMA_ACQUIRE_TIMEOUT register). The method will be in NV_PPBDMA_METHOD0. 
+The method data is in both NV_PPBDMA_DATA0 and NV_PPBDMA_SEM_EXECUTE. Any +changes to NV_PPBDMA_METHOD0 or NV_PPBDMA_DATA0 should also be reflected +consistently in NV_PPBDMA_SEM_EXECUTE. Because the timeout counter is not +automatically reset after an acquire failure, clearing the interrupt may result +in a subsequent ACQUIRE timeout on the next acquire attempt. To prevent this, +one should choose one of the following cleanup options before clearing the +interrupt: +1 - Preempt/unbind the channel +2 - NOP the semaphore method +3 - Release the semaphore +4 - Clear the SEM_EXECUTE_ACQUIRE_FAIL bit to restart the counter. +After fixing the method and/or data, clearing the +interrupt will allow the PBDMA unit to continue. The error is limited to the +channel. + +#define NV_PPBDMA_INTR_0_ACQUIRE 26:26 /* RWIUF */ +#define NV_PPBDMA_INTR_0_ACQUIRE_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_ACQUIRE_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_ACQUIRE_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_PRI field indicates that a PRI write access to a +register occurred while a valid channel is loaded on PBDMA and the PBDMA is not +IDLE or frozen for an interrupt. This interrupt will occur only if the PRI access +will cause the PBDMA unit to operate incorrectly. Clearing the interrupt will +allow the PBDMA unit to continue, however the PBDMA state will be corrupted. +Depending on the register, this may be an unrecoverable error, or may be limited +to the channel. + +#define NV_PPBDMA_INTR_0_PRI 27:27 /* RWIUF */ +#define NV_PPBDMA_INTR_0_PRI_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_PRI_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_PRI_RESET 0x00000001 /* -W--C */ + + + + + The NV_PPBDMA_INTR_*_PBSEG field indicates that a PBDMA unit encountered a +PB compressed method sequence that begins in a non-conditionally fetched PB +segment and ends in a conditionally-fetched PB segment. 
That is, the first valid +PB entry of a conditionally-fetched PB segment is interpreted as method data. +This is likely to corrupt the pushbuffer data stream. Clearing the interrupt will +allow the PBDMA unit to continue. The error is limited to the channel. + +Note: Although the PBDMA will continue after the interrupt is cleared, it might +have a faulty method stream after this interrupt. This is generally fatal to the +context and an RC will be needed. + +#define NV_PPBDMA_INTR_0_PBSEG 30:30 /* RWIUF */ +#define NV_PPBDMA_INTR_0_PBSEG_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_PBSEG_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_PBSEG_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_SIGNATURE field indicates that an invalid Host class +ID was specified in NV_RAMFC_SIGNATURE when a channel's RAMFC was loaded. This +usually indicates SW is attempting to use the wrong Host class for the current +chip. The invalid value will be in NV_PPBDMA_SIGNATURE_HW. Fixing the invalid +value and clearing the interrupt will allow the PBDMA unit to continue. The +error is limited to the channel. Note that attempting to use methods from a +mismatched Host class may or may not work depending on the method, but will not +necessarily cause an interrupt. + +#define NV_PPBDMA_INTR_0_SIGNATURE 31:31 /* RWIUF */ +#define NV_PPBDMA_INTR_0_SIGNATURE_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_0_SIGNATURE_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_0_SIGNATURE_RESET 0x00000001 /* -W--C */ + + +INTR_1 is a continuation of INTR_0. +Added for Kepler to handle HCE interrupts. +Interrupts related to HCE occupy the least significant bits of the register and +any new HCE interrupt should be added to the available least significant bit. +New non-HCE PBDMA interrupts should be added to the available most significant bit +of the register. If a new class of interrupts needs to be added, they can be +added from bit 8 or 16. 
+ + +#define NV_PPBDMA_INTR_1(i) (0x00040148+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_INTR_1__SIZE_1 14 /* */ + + The INTR_*_HCE_RE_ILLEGAL_OP field indicates that a PBDMA encountered +a render enable method with an invalid render enable operation. +The invalid op that was sent can be found in the PBDMA's NV_PPBDMA_HCE_DBG1_MTHD_DATA +register. + +#define NV_PPBDMA_INTR_1_HCE_RE_ILLEGAL_OP 0:0 /* RWIUF */ +#define NV_PPBDMA_INTR_1_HCE_RE_ILLEGAL_OP_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_1_HCE_RE_ILLEGAL_OP_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_1_HCE_RE_ILLEGAL_OP_RESET 0x00000001 /* -W--C */ + + The INTR_*_HCE_RE_ALIGNB field indicates that a PBDMA unit encountered +a Set_Render_Enable_C Copy Engine Class method while the Render_Enable_B value +was not aligned. +This is effectively a CE Launch Check. +This error is limited to the channel. + +#define NV_PPBDMA_INTR_1_HCE_RE_ALIGNB 1:1 /* RWIUF */ +#define NV_PPBDMA_INTR_1_HCE_RE_ALIGNB_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_1_HCE_RE_ALIGNB_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_1_HCE_RE_ALIGNB_RESET 0x00000001 /* -W--C */ + + The INTR_*_HCE_PRIV field indicates that a PBDMA unit encountered +a LaunchDMA Copy Engine Class method set up to access the physical memory aperture, +but the PRIV_MODE bit in the RAMFC for the loaded channel was NOT set. +This is effectively a CE Launch Check. +This error is limited to the channel. + +#define NV_PPBDMA_INTR_1_HCE_PRIV 2:2 /* RWIUF */ +#define NV_PPBDMA_INTR_1_HCE_PRIV_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_1_HCE_PRIV_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_1_HCE_PRIV_RESET 0x00000001 /* -W--C */ + + The INTR_*_HCE_ILLEGAL_MTHD field indicates that a PBDMA encountered +a method bound for CE that is not decoded in the CE CLASS. +The method and its data that triggered the error can be found in the PBDMA's +NV_PPBDMA_HCE_DBG0_MTHD_ADDR and NV_PPBDMA_HCE_DBG1_MTHD_DATA registers. 
+ +#define NV_PPBDMA_INTR_1_HCE_ILLEGAL_MTHD 3:3 /* RWIUF */ +#define NV_PPBDMA_INTR_1_HCE_ILLEGAL_MTHD_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_1_HCE_ILLEGAL_MTHD_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_1_HCE_ILLEGAL_MTHD_RESET 0x00000001 /* -W--C */ + + The INTR_*_HCE_ILLEGAL_CLASS field indicates that a PBDMA encountered +a SetObject method that specifies an unrecognized class ID. +The sent illegal class ID can be found in NV_PPBDMA_HCE_DBG1_MTHD_DATA. + +#define NV_PPBDMA_INTR_1_HCE_ILLEGAL_CLASS 4:4 /* RWIUF */ +#define NV_PPBDMA_INTR_1_HCE_ILLEGAL_CLASS_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_1_HCE_ILLEGAL_CLASS_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_1_HCE_ILLEGAL_CLASS_RESET 0x00000001 /* -W--C */ + + The NV_PPBDMA_INTR_*_CTXNOTVALID field indicates error conditions related +to the NV_PPBDMA_TARGET_*_CTX_VALID fields for a channel. The following +conditions trigger the interrupt: + + * The PBDMA unit encountered an engine method or SetObject but the + corresponding CTX_VALID bit for the targeted engine is FALSE, or + * At channel start/resume, all preemptable engines have CTX_VALID FALSE but: + - CTX_RELOAD is set in NV_PCCSR_CHANNEL_STATUS, + - NV_PPBDMA_TARGET_SHOULD_SEND_HOST_TSG_EVENT is TRUE, or + - NV_PPBDMA_TARGET_NEEDS_HOST_TSG_EVENT is TRUE + +The PBDMA which encountered the interrupt will stall and prevent the channel +which was loaded at the time the interrupt fired from being swapped out until +the interrupt is cleared. The field is left NOT_PENDING and the interrupt is +not raised if the PBDMA is currently halted. This allows SW to unblock the +PBDMA and recover via the below procedure. SW may read METHOD0, CHANNEL_STATUS, +and TARGET to determine whether the interrupt was due to an engine method, +CTX_RELOAD, SHOULD_SEND_HOST_TSG_EVENT, or NEEDS_HOST_TSG_EVENT. If METHOD0 +VALID is TRUE, lazy context creation can be used or the TSG may be destroyed. 
+If METHOD0 VALID is FALSE, the error is likely a bug in SW, and the TSG +will have to be destroyed. + +Recovery procedure: + + 1. Determine which CHID and TSG hit the interrupt, and read NV_PPBDMA_METHOD0, + NV_PCCSR_CHANNEL_STATUS, and NV_PPBDMA_TARGET to find out whether the + interrupt was due to an engine method or not. + 2. Disable all channels in the containing TSG by writing ENABLE_CLR to TRUE + in their channel RAM entries in NV_PCCSR_CHANNEL (see dev_fifo.ref). + 3. Initiate a preempt of the TSG via NV_PFIFO_PREEMPT or + NV_PFIFO_RUNLIST_PREEMPT. This must be done prior to clearing the + interrupt or it will just fire again. + 4. Set the channel's relevant NV_PPBDMA_TARGET_*_CTX_VALID bit to TRUE + by writing the PRI register directly. Even though no context is valid, + this is required to allow the interrupt to be cleared. This must be + done prior to clearing the interrupt even if SW intends to create a + context on the fly via step 7c. + 5. Clear the interrupt by writing CTXNOTVALID_RESET to NV_PPBDMA_INTR_1. + 6. Poll for the preempt to complete. Note: If other interrupts have fired, + those must be cleared as well before the preempt will complete. + The preempt must finish before any channel or context is torn down. + 7. Destroy the TSG, or dynamically allocate the engine context as follows: + 7a. Allocate an engine context + 7b. Add its pointer to NV_RAMIN and set up NV_PRAMIN (dev_ram.ref) + for all channels in the TSG + 7c. Set the relevant CTX_VALID to TRUE in NV_RAMFC_TARGET for all + channels in the TSG + 7d. Re-enable the channels by writing ENABLE_SET_TRUE to each + NV_PCCSR_CHANNEL in the TSG + +Alternatively, SCHED_DISABLE can be used in lieu of disabling the TSG channels. +The error is limited to the channel. + Warning: If NV_PPBDMA_INTR_STALL_1_CTXNOTVALID is DISABLED, this error is +non-recoverable. 
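The ordering constraints in the CTXNOTVALID recovery procedure above can be sketched as follows. This is a hedged Python illustration only: RecordingHw and recover_ctxnotvalid are hypothetical names, and the stand-in object logs step descriptions instead of touching any register, so the sketch shows sequencing and nothing about real register access.

```python
class RecordingHw:
    """Hypothetical stand-in for register access: it only logs each step."""

    def __init__(self):
        self.log = []

    def do(self, step):
        self.log.append(step)


def recover_ctxnotvalid(hw, create_context):
    """Issue the CTXNOTVALID recovery steps in the documented order."""
    hw.do("1: read METHOD0, CHANNEL_STATUS, TARGET")
    hw.do("2: disable all channels in the TSG")
    # The preempt is started before the interrupt is cleared; otherwise
    # the interrupt would simply fire again.
    hw.do("3: initiate TSG preempt (no polling yet)")
    hw.do("4: set TARGET_*_CTX_VALID to TRUE")
    hw.do("5: write CTXNOTVALID_RESET to INTR_1")
    # Teardown or context creation waits for the preempt to finish.
    hw.do("6: poll until the preempt completes")
    if create_context:
        hw.do("7a: allocate an engine context")
        hw.do("7b: add its pointer to NV_RAMIN for all channels")
        hw.do("7c: set CTX_VALID in NV_RAMFC_TARGET for all channels")
        hw.do("7d: re-enable the channels")
    else:
        hw.do("7: destroy the TSG")
    return hw.log
```

Calling recover_ctxnotvalid(RecordingHw(), create_context=True) returns the ten logged steps for the lazy context creation path; with create_context=False the sequence ends by destroying the TSG.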
+ +#define NV_PPBDMA_INTR_1_CTXNOTVALID 31:31 /* RWIUF */ +#define NV_PPBDMA_INTR_1_CTXNOTVALID_NOT_PENDING 0x00000000 /* R-I-V */ +#define NV_PPBDMA_INTR_1_CTXNOTVALID_PENDING 0x00000001 /* R---V */ +#define NV_PPBDMA_INTR_1_CTXNOTVALID_RESET 0x00000001 /* -W--C */ + + + +INTR_EN_0 - PBDMA-Unit Interrupt Enable Register + + The NV_PPBDMA_INTR_EN_0 register controls which PBDMA interrupt conditions +are enabled. If a field is DISABLED, then the corresponding interrupt in +NV_PPBDMA_INTR_0 is disabled. If a field is ENABLED, then the corresponding +interrupt in NV_PPBDMA_INTR_0 is enabled. + The masking of interrupts by this register is done after the +NV_PPBDMA_INTR_0 register. This register stops interrupts from being reported, +it does not stop bits in the NV_PPBDMA_INTR_0 from being set. + One of these registers exists for each of Host's PBDMA units. This +register is not context switched. This register runs on the internal-domain +clock. + + +#define NV_PPBDMA_INTR_EN_0(i) (0x0004010c+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_INTR_EN_0__SIZE_1 14 /* */ + +#define NV_PPBDMA_INTR_EN_0_MEMREQ 0:0 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_MEMREQ_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_MEMREQ_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_0_MEMACK_TIMEOUT 1:1 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_MEMACK_TIMEOUT_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_MEMACK_TIMEOUT_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_0_MEMACK_EXTRA 2:2 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_MEMACK_EXTRA_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_MEMACK_EXTRA_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_0_MEMDAT_TIMEOUT 3:3 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_MEMDAT_TIMEOUT_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_MEMDAT_TIMEOUT_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_0_MEMDAT_EXTRA 4:4 /* RWEUF */ +#define 
NV_PPBDMA_INTR_EN_0_MEMDAT_EXTRA_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_MEMDAT_EXTRA_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_0_MEMFLUSH 5:5 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_MEMFLUSH_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_MEMFLUSH_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_0_MEMOP 6:6 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_MEMOP_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_MEMOP_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_0_LBCONNECT 7:7 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_LBCONNECT_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_LBCONNECT_ENABLED 0x00000001 /* RW--V */ + + +#define NV_PPBDMA_INTR_EN_0_LBACK_TIMEOUT 9:9 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_LBACK_TIMEOUT_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_LBACK_TIMEOUT_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_0_LBACK_EXTRA 10:10 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_LBACK_EXTRA_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_LBACK_EXTRA_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_0_LBDAT_TIMEOUT 11:11 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_LBDAT_TIMEOUT_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_LBDAT_TIMEOUT_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_0_LBDAT_EXTRA 12:12 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_LBDAT_EXTRA_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_LBDAT_EXTRA_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_0_GPFIFO 13:13 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_GPFIFO_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_GPFIFO_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_0_GPPTR 14:14 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_GPPTR_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_GPPTR_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_0_GPENTRY 15:15 /* RWEUF */ +#define 
NV_PPBDMA_INTR_EN_0_GPENTRY_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_GPENTRY_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_0_GPCRC 16:16 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_GPCRC_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_GPCRC_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_0_PBPTR 17:17 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_PBPTR_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_PBPTR_ENABLED 0x00000001 /* RW--V */ +#define NV_PPBDMA_INTR_EN_0_PBENTRY 18:18 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_PBENTRY_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_PBENTRY_ENABLED 0x00000001 /* RW--V */ +#define NV_PPBDMA_INTR_EN_0_PBCRC 19:19 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_PBCRC_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_PBCRC_ENABLED 0x00000001 /* RW--V */ +#define NV_PPBDMA_INTR_EN_0_CLEAR_FAULTED_ERROR 20:20 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_CLEAR_FAULTED_ERROR_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_CLEAR_FAULTED_ERROR_ENABLED 0x00000001 /* RW--V */ +#define NV_PPBDMA_INTR_EN_0_METHOD 21:21 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_METHOD_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_METHOD_ENABLED 0x00000001 /* RW--V */ +#define NV_PPBDMA_INTR_EN_0_METHODCRC 22:22 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_METHODCRC_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_METHODCRC_ENABLED 0x00000001 /* RW--V */ +#define NV_PPBDMA_INTR_EN_0_DEVICE 23:23 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_DEVICE_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_DEVICE_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_0_ENG_RESET 24:24 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_ENG_RESET_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_ENG_RESET_ENABLED 0x00000001 /* RW--V */ +#define NV_PPBDMA_INTR_EN_0_SEMAPHORE 25:25 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_SEMAPHORE_DISABLED 
0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_SEMAPHORE_ENABLED 0x00000001 /* RW--V */ +#define NV_PPBDMA_INTR_EN_0_ACQUIRE 26:26 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_ACQUIRE_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_ACQUIRE_ENABLED 0x00000001 /* RW--V */ +#define NV_PPBDMA_INTR_EN_0_PRI 27:27 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_PRI_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_PRI_ENABLED 0x00000001 /* RW--V */ + + +#define NV_PPBDMA_INTR_EN_0_PBSEG 30:30 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_PBSEG_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_PBSEG_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_0_SIGNATURE 31:31 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_0_SIGNATURE_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_0_SIGNATURE_ENABLED 0x00000001 /* RW--V */ + +INTR_EN_1 is a continuation of INTR_EN_0. +Added for Kepler to handle HCE interrupts. + + +#define NV_PPBDMA_INTR_EN_1(i) (0x0004014c+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_INTR_EN_1__SIZE_1 14 /* */ + +#define NV_PPBDMA_INTR_EN_1_HCE_RE_ILLEGAL_OP 0:0 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_1_HCE_RE_ILLEGAL_OP_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_1_HCE_RE_ILLEGAL_OP_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_1_HCE_RE_ALIGNB 1:1 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_1_HCE_RE_ALIGNB_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_1_HCE_RE_ALIGNB_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_1_HCE_PRIV 2:2 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_1_HCE_PRIV_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_1_HCE_PRIV_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_1_HCE_ILLEGAL_MTHD 3:3 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_1_HCE_ILLEGAL_MTHD_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_1_HCE_ILLEGAL_MTHD_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_1_HCE_ILLEGAL_CLASS 4:4 /* RWEUF */ +#define 
NV_PPBDMA_INTR_EN_1_HCE_ILLEGAL_CLASS_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_1_HCE_ILLEGAL_CLASS_ENABLED 0x00000001 /* RW--V */ + +#define NV_PPBDMA_INTR_EN_1_CTXNOTVALID 31:31 /* RWEUF */ +#define NV_PPBDMA_INTR_EN_1_CTXNOTVALID_DISABLED 0x00000000 /* RWE-V */ +#define NV_PPBDMA_INTR_EN_1_CTXNOTVALID_ENABLED 0x00000001 /* RW--V */ + + + +INTR_STALL - PBDMA-Unit Interrupt Stall Control Register + + The NV_PPBDMA_INTR_STALL register controls whether an interrupt causes the +PBDMA unit to stop and stall. If an interrupt's field is STALL_*_ENABLED, then +the interrupt causes the PBDMA to stall. If an interrupt's field is +STALL_*_DISABLED then the interrupt does not cause the PBDMA unit to stall. + This register is intended for verification. In normal operation, the +register should be left at the default value, meaning all interrupts cause the +PBDMA unit to stall. + One of these registers exists for each of Host's PBDMA units. This +register is not context switched. This register runs on the internal-domain +clock. 
+ + +#define NV_PPBDMA_INTR_STALL(i) (0x0004013c+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_INTR_STALL__SIZE_1 14 /* */ + +#define NV_PPBDMA_INTR_STALL_MEMREQ 0:0 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_MEMREQ_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_MEMREQ_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_MEMACK_TIMEOUT 1:1 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_MEMACK_TIMEOUT_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_MEMACK_TIMEOUT_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_MEMACK_EXTRA 2:2 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_MEMACK_EXTRA_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_MEMACK_EXTRA_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_MEMDAT_TIMEOUT 3:3 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_MEMDAT_TIMEOUT_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_MEMDAT_TIMEOUT_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_MEMDAT_EXTRA 4:4 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_MEMDAT_EXTRA_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_MEMDAT_EXTRA_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_MEMFLUSH 5:5 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_MEMFLUSH_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_MEMFLUSH_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_MEMOP 6:6 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_MEMOP_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_MEMOP_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_LBCONNECT 7:7 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_LBCONNECT_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_LBCONNECT_ENABLED 0x00000001 /* RWE-V */ + + +#define NV_PPBDMA_INTR_STALL_LBACK_TIMEOUT 9:9 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_LBACK_TIMEOUT_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_LBACK_TIMEOUT_ENABLED 0x00000001 /* RWE-V */ + +#define 
NV_PPBDMA_INTR_STALL_LBACK_EXTRA 10:10 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_LBACK_EXTRA_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_LBACK_EXTRA_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_LBDAT_TIMEOUT 11:11 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_LBDAT_TIMEOUT_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_LBDAT_TIMEOUT_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_LBDAT_EXTRA 12:12 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_LBDAT_EXTRA_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_LBDAT_EXTRA_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_GPFIFO 13:13 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_GPFIFO_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_GPFIFO_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_GPPTR 14:14 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_GPPTR_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_GPPTR_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_GPENTRY 15:15 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_GPENTRY_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_GPENTRY_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_GPCRC 16:16 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_GPCRC_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_GPCRC_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_PBPTR 17:17 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_PBPTR_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_PBPTR_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_PBENTRY 18:18 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_PBENTRY_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_PBENTRY_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_PBCRC 19:19 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_PBCRC_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_PBCRC_ENABLED 0x00000001 /* RWE-V */ + +#define 
NV_PPBDMA_INTR_STALL_CLEAR_FAULTED_ERROR 20:20 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_CLEAR_FAULTED_ERROR_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_CLEAR_FAULTED_ERROR_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_METHOD 21:21 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_METHOD_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_METHOD_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_METHODCRC 22:22 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_METHODCRC_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_METHODCRC_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_DEVICE 23:23 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_DEVICE_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_DEVICE_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_ENG_RESET 24:24 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_ENG_RESET_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_ENG_RESET_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_SEMAPHORE 25:25 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_SEMAPHORE_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_SEMAPHORE_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_ACQUIRE 26:26 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_ACQUIRE_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_ACQUIRE_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_PRI 27:27 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_PRI_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_PRI_ENABLED 0x00000001 /* RWE-V */ + + + +#define NV_PPBDMA_INTR_STALL_PBSEG 30:30 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_PBSEG_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_PBSEG_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_SIGNATURE 31:31 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_SIGNATURE_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_SIGNATURE_ENABLED 0x00000001 /* RWE-V */ 
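As a rough illustration of how the definitions above might be consumed, the following C sketch computes the offset of NV_PPBDMA_INTR_STALL(i) and manipulates a field written in the manual's "hi:lo" notation. The helper names (intr_stall_offset, field_mask, field_get, field_set) are hypothetical and not part of any NVIDIA header; only the base offset, the 8192-byte stride, and the unit count of 14 come from the definitions above.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the NV_PPBDMA_INTR_STALL definitions above. */
#define INTR_STALL_BASE   0x0004013cu /* NV_PPBDMA_INTR_STALL(0) */
#define PBDMA_STRIDE      8192u       /* 8KB of register space per PBDMA unit */
#define INTR_STALL_SIZE_1 14u         /* NV_PPBDMA_INTR_STALL__SIZE_1 */

/* Register offset of NV_PPBDMA_INTR_STALL(i) for PBDMA unit i. */
static inline uint32_t intr_stall_offset(uint32_t i)
{
    assert(i < INTR_STALL_SIZE_1);
    return INTR_STALL_BASE + i * PBDMA_STRIDE;
}

/* Hypothetical helper: in-place mask for a field written as "hi:lo". */
static inline uint32_t field_mask(unsigned hi, unsigned lo)
{
    assert(lo <= hi && hi <= 31u);
    unsigned width = hi - lo + 1u;
    uint32_t low = (width == 32u) ? 0xffffffffu : ((1u << width) - 1u);
    return low << lo;
}

/* Extract the field hi:lo from a register value. */
static inline uint32_t field_get(uint32_t reg, unsigned hi, unsigned lo)
{
    return (reg & field_mask(hi, lo)) >> lo;
}

/* Return reg with the field hi:lo replaced by val. */
static inline uint32_t field_set(uint32_t reg, unsigned hi, unsigned lo,
                                 uint32_t val)
{
    uint32_t mask = field_mask(hi, lo);
    return (reg & ~mask) | ((val << lo) & mask);
}
```

For example, field_set(value, 26, 26, 1) would set the ACQUIRE stall bit (26:26) to ENABLED in a value read from intr_stall_offset(i). How the register is actually read and written (MMIO mapping, privilege) is outside the scope of this sketch.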
+ + +INTR_STALL_1 - PBDMA-Unit HCE Interrupt Stall Control Register + + The NV_PPBDMA_INTR_STALL_1 register controls whether an interrupt causes +the PBDMA unit to stop and stall on HCE interrupts. All HCE interrupts +that are reported by the PBDMA are launch check interrupts and are immediately +dropped when encountered. Host will latch the last interrupting method and data +in HCE_DBG0 and HCE_DBG1. If stalling is ENABLED here, an interrupt will stall +the pbdma regardless of whether the interrupt is enabled or not via INTR_EN_1. + Warning: Do not disable stalling for CTXNOTVALID. Doing so will cause +undefined behavior if the interrupt condition occurs. + + +#define NV_PPBDMA_INTR_STALL_1(i) (0x00040140+(i)*8192) /* RW-4A */ +#define NV_PPBDMA_INTR_STALL_1__SIZE_1 14 /* */ + +#define NV_PPBDMA_INTR_STALL_1_HCE_RE_ILLEGAL_OP 0:0 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_1_HCE_RE_ILLEGAL_OP_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_1_HCE_RE_ILLEGAL_OP_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_1_HCE_RE_ALIGNB 1:1 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_1_HCE_RE_ALIGNB_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_1_HCE_RE_ALIGNB_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_1_HCE_PRIV 2:2 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_1_HCE_PRIV_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_1_HCE_PRIV_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_1_HCE_ILLEGAL_MTHD 3:3 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_1_HCE_ILLEGAL_MTHD_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_1_HCE_ILLEGAL_MTHD_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_1_HCE_ILLEGAL_CLASS 4:4 /* RWEUF */ +#define NV_PPBDMA_INTR_STALL_1_HCE_ILLEGAL_CLASS_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_1_HCE_ILLEGAL_CLASS_ENABLED 0x00000001 /* RWE-V */ + +#define NV_PPBDMA_INTR_STALL_1_CTXNOTVALID 31:31 /* RWEUF */ +#define 
NV_PPBDMA_INTR_STALL_1_CTXNOTVALID_DISABLED 0x00000000 /* RW--V */ +#define NV_PPBDMA_INTR_STALL_1_CTXNOTVALID_ENABLED 0x00000001 /* RWE-V */ + + +HCE_DBG0 - Last HCE Method Address + + HCE_DBG0 stores the method address seen by the HCE Handler that caused +an HCE interrupt (PBDMA_INTR_1). + Only valid to read when a PBDMA_INTR_1 register has an interrupt +pending and the PBDMA_STALL_1 register is set for the corresponding +interrupt. Without the stall bit, Host will continue to process +methods, so other methods might trigger interrupts. Consequently, +the contents of this register may be unpredictable. + + +#define NV_PPBDMA_HCE_DBG0(i) (0x00040150+(i)*8192) /* R--4A */ +#define NV_PPBDMA_HCE_DBG0__SIZE_1 14 /* */ + +#define NV_PPBDMA_HCE_DBG0_MTHD_ADDR 13:2 /* R-EUF */ +#define NV_PPBDMA_HCE_DBG0_MTHD_ADDR_VAL0 0x00000000 /* R-E-V */ + +HCE_DBG1 - Last HCE Method Data + + HCE_DBG1 stores the method data seen by the HCE Handler that caused +an HCE interrupt (PBDMA_INTR_1). + Only valid to read when a PBDMA_INTR_1 register has an interrupt +pending and the PBDMA_STALL_1 register is set for the corresponding interrupt. + + +#define NV_PPBDMA_HCE_DBG1(i) (0x00040154+(i)*8192) /* R--4A */ +#define NV_PPBDMA_HCE_DBG1__SIZE_1 14 /* */ + +#define NV_PPBDMA_HCE_DBG1_MTHD_DATA 31:0 /* R-EUF */ +#define NV_PPBDMA_HCE_DBG1_MTHD_DATA_VAL0 0x00000000 /* R-E-V */ + + +9 - HOST METHODS (NV_UDMA) +============================ + + This section describes the types of methods that are executed by Host. In +DMA mode, Host reads the pushbuffer data and generates method address/data pairs +from that data. 
+ Terminology: +Host method - the methods listed here, below (left-shifted) address 0x100 +Host-only method - any Host method excluding SetObject, which also sends the + method to the engine specified by the subchannel field +non-Host method - engine or SW method; excludes SetObject + + +OBJECT [method] (SetObject) - Assign Object to Engine via Subchannel Method + + The NV_UDMA_OBJECT method, generally known as SetObject, SET_OBJECT, or +occasionally SetObj, verifies the engine targeted by the method's subchannel +field supports the specified class ID. + The NVCLASS field specifies the object's class identifier. The target +engine for the check is determined by the NV_FIFO_DMA_*_SUBCHANNEL specified in +the method header. See dev_ram.ref for specifics regarding the subchannel +mapping and for information regarding subchannel switching. On copy engines, +Host ensures that the object specified in NVCLASS is supported on the HCE, and +will raise INTR_*_HCE_ILLEGAL_CLASS if it is not; a CE SetObject is otherwise a +no-op and is not sent to the copy engine. This method is not used to verify the +chip's Host class ID. Use the NV_PPBDMA_SIGNATURE_HW field in +NV_RAMFC_SIGNATURE for that. + SetObject is a misnomer: the GPU provides no mechanism for SW to select any +other class interface than the one a given chip supports. SetObject is not +required by any engine. + No subchannel-object mapping is stored in Host. Each engine is +responsible for maintaining its class identifier state if multiple classes are +supported. In a TSG on a runlist targeting such a hypothetical engine, the +SetObject method need only be sent once for a given subchannel on the engine +because all channels in the TSG share a context. After the SetObject, all +channels targeting the same engine in the TSG will use the same class binding. 
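The terminology at the start of this section can be expressed as a small classifier. This is only a sketch: it assumes the method address is the byte address used by the NV_UDMA definitions (so the Host range is everything below 0x100), and the enum and function names are hypothetical.

```c
#include <stdint.h>

#define NV_UDMA_OBJECT        0x00000000u /* SetObject, from the definition in this section */
#define HOST_METHOD_ADDR_END  0x00000100u /* Host methods lie below this address */

/* Hypothetical classification following the Terminology list. */
enum method_kind {
    METHOD_SET_OBJECT, /* Host method that is also sent to the engine */
    METHOD_HOST_ONLY,  /* any other Host method */
    METHOD_NON_HOST    /* engine or SW method */
};

static enum method_kind classify_method(uint32_t byte_addr)
{
    if (byte_addr == NV_UDMA_OBJECT)
        return METHOD_SET_OBJECT;
    if (byte_addr < HOST_METHOD_ADDR_END)
        return METHOD_HOST_ONLY;
    return METHOD_NON_HOST;
}
```

Under these assumptions, NV_UDMA_SEM_EXECUTE (0x6C) classifies as a Host-only method, while any address at or above 0x100 is treated as a non-Host method.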
+ + + +#define NV_UDMA_OBJECT 0x00000000 /* -W-4R */ + +#define NV_UDMA_OBJECT_NVCLASS 15:0 /* -W-VF */ + + +ILLEGAL [method] - Illegal Method + + By reserving an opcode for an ILLEGAL method, triggering an error can be +guaranteed to be future-compatible. This triggers the NV_PPBDMA_INTR_0_METHOD +interrupt. This can be thought of as a software method for the channel class +XX6f, but a different interrupt bit is set (METHOD instead of DEVICE). + + +#define NV_UDMA_ILLEGAL 0x00000004 /* -W-4R */ + +#define NV_UDMA_ILLEGAL_HANDLE 31:0 /* -W-VF */ + + +NOP [method] - No Operation Method + + This method is discarded upon execution. + + + +#define NV_UDMA_NOP 0x00000008 /* -W-4R */ + +#define NV_UDMA_NOP_HANDLE 31:0 /* -W-VF */ + + + + + +Host Semaphore Methods + + NVIDIA semaphores provide a basic synchronization mechanism for the GPU. +(They do not behave like classic Dijkstra semaphores; instead, they provide a +conditional barrier.) A semaphore refers to a 4-byte or 8-byte payload value +in memory, the location of which is referred to as the semaphore address. A +semaphore release writes a payload to the semaphore or performs a reduction +operation on the semaphore using the payload. A release may optionally write a +timestamp, in which case 16 bytes are written at the semaphore address. A +semaphore acquire waits for the semaphore to reach a given condition before +allowing a channel to proceed. Five Host methods, described below, are +provided to perform semaphore releases and acquires: + + SEM_ADDR_LO - Set semaphore address least significant bits + SEM_ADDR_HI - Set semaphore address most significant bits + SEM_PAYLOAD_LO - Set the lower 32 bits of the semaphore payload + SEM_PAYLOAD_HI - Set the upper 32 bits of the semaphore payload + SEM_EXECUTE - Configure and begin execution of the release or acquire + +SEM_ADDR_LO [method] - Set Semaphore Address Low Method + + The NV_UDMA_SEM_ADDR_LO method sets the least significant bits of the +address of a semaphore. 
+ + The NV_UDMA_SEM_ADDR_LO_OFFSET field contains bits 31:2 of a semaphore +address. Since the smallest supported semaphore is 4-byte aligned, Host will +not store bits 1:0 of the address. + Host will keep the lowest two bits of the SEM_ADDR_LO method reserved so +software can directly pack the low 32 bits of an address into the method +without needing to mask off the lowest two bits. + Note that software is required to align all semaphore addresses according +to their respective sizes, and Host enforces this requirement with the +NV_PPBDMA_INTR_0 interrupt. See the documentation below for the +NV_UDMA_SEM_EXECUTE method and its fields PAYLOAD_SIZE and RELEASE_TIMESTAMP. + While the channel is loaded on a PBDMA unit, the OFFSET value is stored in +the NV_PPBDMA_SEM_ADDR_LO register. Otherwise, this value is stored in the +NV_RAMFC_SEM_ADDR_LO field of the RAMFC part of the channel's instance block. + + +#define NV_UDMA_SEM_ADDR_LO 0x0000005C /* -W-4R */ + +#define NV_UDMA_SEM_ADDR_LO_OFFSET 31:2 /* -W-VF */ + + +SEM_ADDR_HI [method] - Set Semaphore Address High Method + + The NV_UDMA_SEM_ADDR_HI method sets the most significant bits of the +address of a semaphore. + + The NV_UDMA_SEM_ADDR_HI_OFFSET field contains bits 39:32 of the address of +a semaphore. + While the channel is loaded on a PBDMA unit, the OFFSET value is stored in +the NV_PPBDMA_SEM_ADDR_HI register. Otherwise, this value is stored in the +NV_RAMFC_SEM_ADDR_HI field of the RAMFC part of the channel's instance block. + + +#define NV_UDMA_SEM_ADDR_HI 0x00000060 /* -W-4R */ + +#define NV_UDMA_SEM_ADDR_HI_OFFSET 7:0 /* -W-VF */ + + +SEM_PAYLOAD_LO [method] - Set Semaphore Payload Low Method + + The NV_UDMA_SEM_PAYLOAD_LO method sets the lower 32 bits of the semaphore +payload. This value is used according to the NV_UDMA_SEM_EXECUTE_OPERATION +field described below. + + While the channel is loaded on a PBDMA unit, the PAYLOAD_LO value is +stored in the NV_PPBDMA_SEM_PAYLOAD_LO register. 
Otherwise, this value is +stored in the NV_RAMFC_SEM_PAYLOAD_LO field of the RAMFC part of the channel's +instance block. + + +#define NV_UDMA_SEM_PAYLOAD_LO 0x00000064 /* -W-4R */ + +#define NV_UDMA_SEM_PAYLOAD_LO_PAYLOAD 31:0 /* -W-VF */ + + +SEM_PAYLOAD_HI [method] - Set Semaphore Payload High Method + + The NV_UDMA_SEM_PAYLOAD_HI method sets the upper 32 bits of the semaphore +payload. This value is used according to the NV_UDMA_SEM_EXECUTE_OPERATION +field described below. + + While the channel is loaded on a PBDMA unit, the PAYLOAD_HI value is +stored in the NV_PPBDMA_SEM_PAYLOAD_HI register. Otherwise, this value is +stored in the NV_RAMFC_SEM_PAYLOAD_HI field of the channel's instance block. + + +#define NV_UDMA_SEM_PAYLOAD_HI 0x00000068 /* -W-4R */ + +#define NV_UDMA_SEM_PAYLOAD_HI_PAYLOAD 31:0 /* -W-VF */ + + +SEM_EXECUTE [method] - Semaphore Execute Method + + The NV_UDMA_SEM_EXECUTE method specifies a synchronization operation and +initiates that operation. To use a semaphore, set the semaphore's address with +the NV_UDMA_SEM_ADDR_LO/_HI methods, set the semaphore payload with the +NV_UDMA_SEM_PAYLOAD_LO/_HI methods, and then initiate the semaphore operation +with an NV_UDMA_SEM_EXECUTE method. + +Semaphore operation and payload size: + + The NV_UDMA_SEM_EXECUTE_OPERATION field specifies the semaphore operation. +RELEASE and REDUCTION cause a semaphore release to occur, potentially allowing +future acquires to succeed and causing a timestamp to be written if +RELEASE_TIMESTAMP is EN. + For iGPU cases where a semaphore release can be mapped to an onchip syncpoint, +the SIZE must be 4 bytes to avoid double-incrementing the target syncpoint. +Timestamping should also be disabled to avoid unwanted behavior. + An operation of ACQUIRE, ACQ_STRICT_GEQ, ACQ_CIRC_GEQ, ACQ_AND, or ACQ_NOR +causes Host to perform a semaphore acquire, meaning that Host will not process +any subsequent methods in the channel until the acquire succeeds. 
When the +semaphore value does not satisfy the conditions of the acquire, the semaphore +acquire is said to have failed. In this case, the PBDMA unit will switch to +the next pending channel on its runqueue within the same TSG, if it has not +reached the end of the runqueue, but otherwise may either start again switching +to channels on its runqueue within the same TSG or switch to another TSG; see +the documentation below for the NV_UDMA_SEM_EXECUTE_ACQUIRE_SWITCH_TSG field. +Upon switching back into a channel waiting on a semaphore, the PBDMA unit +continues to poll the semaphore address. When the channel is loaded on the +PBDMA unit, the NV_PPBDMA_SEM_EXECUTE_ACQUIRE_FAIL register field can be read +for debug purposes in order to determine whether an acquire has failed or not. + If OPERATION is ACQUIRE, the acquire succeeds when the semaphore value is +equal to the payload value. The PAYLOAD_SIZE field controls the size of the +memory read performed by Host and the comparison. If PAYLOAD_SIZE is 32BIT then +a 32 bit memory read is performed and the return value is compared to +PAYLOAD_LO. If PAYLOAD_SIZE is 64BIT then a single 64 bit memory read is +performed and the return value is compared to PAYLOAD_LO/_HI. + If OPERATION is ACQ_STRICT_GEQ, the acquire succeeds when (SV >= PV), +where SV is the semaphore value in memory, PV is the payload value, and >= is +an unsigned greater-than-or-equal-to comparison. + If OPERATION is ACQ_CIRC_GEQ, the acquire succeeds when the two's +complement signed representation of the semaphore value minus the payload value +is non-negative; that is, when the semaphore value is within half a range +greater than or equal to the payload value, modulo that range. The +PAYLOAD_SIZE field determines if Host is doing a 32 bit comparison or a 64 bit +comparison. 
In other words, the condition is met when the PAYLOAD_SIZE is +32BIT and the semaphore value is within the range [payload, +((payload+(2^(32-1)))-1)], modulo 2^32, or when the PAYLOAD_SIZE is 64BIT and +the semaphore value is within the range [payload, ((payload+(2^(64-1)))-1)], +modulo 2^64. + If OPERATION is ACQ_AND, the acquire succeeds when the bitwise-AND of the +semaphore value and the payload value is not zero. The PAYLOAD_SIZE field +determines if a 32 bit or 64 bit value is read from memory and used in the +comparison. + If OPERATION is ACQ_NOR, the acquire succeeds when the bitwise-NOR of the +semaphore value and the payload value is not zero. The PAYLOAD_SIZE field +determines if a 32 bit or 64 bit value is read from memory and used in the +comparison. + If OPERATION is RELEASE, then Host simply writes the payload value to the +semaphore structure in memory at the SEM_ADDR_LO/_HI address. The exact value +written depends on the operation defined. If PAYLOAD_SIZE is 32BIT then a 32 +bit payload value from PAYLOAD_LO is used. If PAYLOAD_SIZE is 64BIT then a 64 +bit payload specified by PAYLOAD_LO/_HI is used. + If OPERATION is REDUCTION, then Host sends the memory system an +instruction to perform the atomic reduction operation specified in the +REDUCTION field on the memory value, using the PAYLOAD_LO/_HI payload value as +the operand. The PAYLOAD_SIZE field determines if a 32 bit or 64 bit +reduction is performed. Note that if the semaphore address refers to a page +whose PTE has ATOMIC_DISABLE set, the operation will result in an +ATOMIC_VIOLATION fault. + Note that if the PAYLOAD_SIZE is 64BIT, the semaphore address is required +to be 8-byte aligned. If RELEASE_TIMESTAMP is EN while the operation is a +RELEASE or REDUCTION operation, the semaphore address is required to be 16-byte +aligned. The semaphore address is not required to be 16-byte aligned during an +acquire operation. 
If the semaphore address is not aligned according to the +field values, Host will raise the NV_PPBDMA_INTR_0 interrupt. + For iGPU cases where a semaphore release can be mapped to an onchip syncpoint, +the SIZE must be 4 bytes to avoid double-incrementing the target syncpoint. +Timestamping should also be disabled to avoid unwanted behavior. + +Semaphore switch option: + + The NV_UDMA_SEM_EXECUTE_ACQUIRE_SWITCH_TSG field specifies whether or not +Host should switch to processing another TSG if the acquire fails. If every +channel within the same TSG has no work (is waiting on a semaphore acquire, is +idle, is unbound, or is disabled), the TSG can make no further progress until +one of the relevant semaphores is released. Because it may be a long time +before the release, it may be more efficient for the PBDMA unit to switch off +the blocked TSG prior to the runqueue timeslice expiring, so that it can serve +a different TSG that is not waiting, or so that it can poll other semaphores on +other TSGs whose channels are waiting on acquires. + When a semaphore acquire fails, the PBDMA unit will always switch to +another channel within the same TSG, provided that it has not completed a +traversal through all the TSG's channels. If every pending channel in the TSG +is waiting on a semaphore acquire, the Host scheduler is able to identify a lack +of progress for the entire TSG by the time it has completed a traversal through +all those channels. In this case the value of ACQUIRE_SWITCH_TSG for each of +these channels determines whether the PBDMA will switch to another TSG or start +another traversal through the same TSG. + If ACQUIRE_SWITCH_TSG is DIS for any of the channels in the TSG, the Host +scheduler will ignore any lack of progress and continue processing the TSG, +until either every channel in the TSG runs out of work or the timeslice +expires. 
If ACQUIRE_SWITCH_TSG is EN for every pending channel in the TSG, the +Host scheduler will recognize a lack of progress for the whole TSG, and will +switch to the next serviceable TSG on the runqueue, if possible. + In the case described above, if there isn't a different serviceable TSG +on the runlist, then the current channel's TSG will continue to be scheduled +and the acquire retry will be naturally delayed by the time it takes for Host's +runlist processing to return to the same channel. This retry delay may be too +short, in which case the runlist search can be throttled to increase the delay +by configuring NV_PFIFO_ACQ_PRETEST; see dev_fifo.ref. Note that if the +channel remains switched in, the prefetched pushbuffer data is not discarded, +so setting ACQUIRE_SWITCH_TSG_EN cannot deterministically be depended on to +cause the discarding of prefetched pushbuffer data. + Also note that when switching between channels within a TSG, Host does not +wait on any timer (such as NV_PFIFO_ACQ_PRETEST or NV_PPBDMA_ACQUIRE_RETRY), +but is instead throttled by the time it takes to switch channels. Host will +honor the ACQUIRE_RETRY time, but only if the same channel is rescheduled +without a channel switch. + +Semaphore wait-for-idle option: + + The NV_UDMA_SEM_EXECUTE_RELEASE_WFI field applies only to releases and +reductions. It specifies whether Host should wait until the engine to which +the channel last sent methods is idle (in other words, until all previous +methods in the channel have been completed) before writing to memory as part of +the release or reduction operation. If this field is RELEASE_WFI_EN, then Host +waits for the engine to be idle, inserts a system memory barrier, and then +updates the value in memory. If this field is RELEASE_WFI_DIS, Host performs +the semaphore operation on the memory without waiting for the engine to be +idle, and without using a system memory barrier. 
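The acquire conditions described under "Semaphore operation and payload size" can be modeled in software as follows. This is only a sketch of the comparisons as the text states them, not a description of how Host implements them; the function name sem_acquire_succeeds and the enum are hypothetical, while the operation encodings match the NV_UDMA_SEM_EXECUTE_OPERATION values in this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encodings of NV_UDMA_SEM_EXECUTE_OPERATION for the acquire operations. */
enum sem_op {
    SEM_OP_ACQUIRE        = 0,
    SEM_OP_ACQ_STRICT_GEQ = 2,
    SEM_OP_ACQ_CIRC_GEQ   = 3,
    SEM_OP_ACQ_AND        = 4,
    SEM_OP_ACQ_NOR        = 5
};

/*
 * Hypothetical model: sv is the semaphore value read from memory, pv is the
 * payload (PAYLOAD_LO, or PAYLOAD_LO/_HI when size64 is true).
 */
static bool sem_acquire_succeeds(enum sem_op op, bool size64,
                                 uint64_t sv, uint64_t pv)
{
    const uint64_t mask = size64 ? UINT64_MAX : 0xffffffffull;
    const uint64_t sign = size64 ? (1ull << 63) : (1ull << 31);

    sv &= mask; /* a 32BIT acquire only reads and compares 32 bits */
    pv &= mask;

    switch (op) {
    case SEM_OP_ACQUIRE:
        return sv == pv;
    case SEM_OP_ACQ_STRICT_GEQ:
        return sv >= pv;                  /* unsigned comparison */
    case SEM_OP_ACQ_CIRC_GEQ:
        return ((sv - pv) & sign) == 0;   /* signed (sv - pv) is non-negative */
    case SEM_OP_ACQ_AND:
        return (sv & pv) != 0;
    case SEM_OP_ACQ_NOR:
        return (~(sv | pv) & mask) != 0;
    default:
        return false;                     /* not an acquire operation */
    }
}
```

The circular comparison is what distinguishes ACQ_CIRC_GEQ from ACQ_STRICT_GEQ when a counter wraps: with a 32-bit payload of 0xfffffff0 and a semaphore value of 1, the strict comparison fails but the circular one succeeds, because 1 lies within half a range above the payload modulo 2^32.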
+ +Semaphore timestamp option: + + The NV_UDMA_SEM_EXECUTE_RELEASE_TIMESTAMP field specifies whether a +timestamp should be written by a release in addition to the payload. If +RELEASE_TIMESTAMP is DIS, then only the semaphore payload will be written. If +the field is EN then both the semaphore payload and a nanosecond timestamp will +be written. In this case, the semaphore address must be 16-byte aligned; see +the related note at NV_UDMA_SEM_ADDR_LO. If RELEASE_TIMESTAMP is EN and +SEM_ADDR_LO is not 16-byte aligned, then Host will initiate an interrupt +(NV_PPBDMA_INTR_0_SEMAPHORE). When a 16-byte semaphore is written, the +semaphore timestamp will be written before the semaphore payload so that when +an acquire succeeds, the timestamp write will have completed. This ensures SW +will not get an out-of-date timestamp on platforms which guarantee ordering +within a 16-byte aligned region. The timestamp value is snapped from the +NV_PTIMER_TIME_1/0 registers; see dev_timer.ref. + For iGPU cases where a semaphore release can be mapped to an onchip syncpoint, +the SIZE must be 4 bytes to avoid double-incrementing the target syncpoint. +Timestamping should also be disabled for a syncpoint-backed release to avoid +unexpected behavior. 
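As an illustration of the alignment rules and of the 16-byte little-endian semaphore format tabulated in this section, the sketch below decodes a timestamped semaphore from raw bytes and computes the alignment a given operation requires. All names here (sem16, sem16_decode, sem_required_alignment, sem_addr_is_aligned) are hypothetical; the byte positions and the 4/8/16-byte alignment requirements are taken from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assemble an nbytes-wide little-endian value, independent of CPU endianness. */
static uint64_t load_le(const uint8_t *p, unsigned nbytes)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < nbytes; i++)
        v |= (uint64_t)p[i] << (8u * i);
    return v;
}

/* Hypothetical decoded view of a 16-byte semaphore. */
struct sem16 {
    uint64_t payload; /* bytes 0..3 (32BIT) or bytes 0..7 (64BIT) */
    uint64_t timer;   /* bytes 8..15, nanosecond timestamp */
};

static struct sem16 sem16_decode(const uint8_t bytes[16], bool size64)
{
    struct sem16 s;
    s.payload = load_le(bytes, size64 ? 8u : 4u);
    s.timer   = load_le(bytes + 8, 8u);
    return s;
}

/*
 * Required address alignment in bytes. The 16-byte rule applies only to
 * RELEASE and REDUCTION operations with RELEASE_TIMESTAMP set to EN;
 * acquires need only the payload-size alignment.
 */
static uint64_t sem_required_alignment(bool is_release, bool size64,
                                       bool timestamp)
{
    if (is_release && timestamp)
        return 16;
    return size64 ? 8 : 4;
}

static bool sem_addr_is_aligned(uint64_t addr, bool is_release, bool size64,
                                bool timestamp)
{
    return (addr & (sem_required_alignment(is_release, size64, timestamp) - 1)) == 0;
}
```

Software could use a check like sem_addr_is_aligned before emitting SEM_EXECUTE, since a misaligned address is reported by Host through an interrupt rather than silently corrected.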
+ + Below is the little endian format of 16-byte semaphores in memory: + + ---- ------------------- ------------------- + byte Data(Little endian) Data(Little endian) + PAYLOAD_SIZE=32BIT PAYLOAD_SIZE=64BIT + ---- ------------------- ------------------- + 0 Payload[ 7: 0] Payload[ 7: 0] + 1 Payload[15: 8] Payload[15: 8] + 2 Payload[23:16] Payload[23:16] + 3 Payload[31:24] Payload[31:24] + 4 0 Payload[39:32] + 5 0 Payload[47:40] + 6 0 Payload[55:48] + 7 0 Payload[63:56] + 8 timer[ 7: 0] timer[ 7: 0] + 9 timer[15: 8] timer[15: 8] + 10 timer[23:16] timer[23:16] + 11 timer[31:24] timer[31:24] + 12 timer[39:32] timer[39:32] + 13 timer[47:40] timer[47:40] + 14 timer[55:48] timer[55:48] + 15 timer[63:56] timer[63:56] + ---- ------------------- ------------------- + + +Semaphore reduction operations: + + The NV_UDMA_SEM_EXECUTE_REDUCTION field specifies the reduction operation +to perform on the semaphore memory value, using the semaphore payload from +SEM_PAYLOAD_LO/HI as an operand, when the OPERATION field is +OPERATION_REDUCTION. Based on the PAYLOAD_SIZE field the semaphore value and +the payload are interpreted as 32bit or 64bit integers and the reduction +operation is performed according to the signedness specified via the +REDUCTION_FORMAT field described below. 
The reduction operation leaves the +modified value in the semaphore memory according to the operation as follows: + +REDUCTION_IMIN - the minimum of the value and payload +REDUCTION_IMAX - the maximum of the value and payload +REDUCTION_IXOR - the bitwise exclusive or (XOR) of the value and payload +REDUCTION_IAND - the bitwise AND of the value and payload +REDUCTION_IOR - bitwise OR of the value and payload +REDUCTION_IADD - the sum of the value and payload +REDUCTION_INC - the value incremented by 1, or reset to 0 if the incremented + value would exceed the payload +REDUCTION_DEC - the value decremented by 1, or reset back to the payload + if the original value is already 0 or exceeds the payload + +Note that INC and DEC are somewhat surprising: they can be used to repeatedly +loop the semaphore value when performed successively with the same payload p. +INC repeatedly iterates from 0 to p inclusive, resetting to 0 once exceeding p. +DEC repeatedly iterates down from p to 0 inclusive, resetting back to p once +the value would otherwise underflow. Therefore, an INC or DEC reduction with +payload 0 effectively releases a semaphore by setting its value to 0. + +The reduction opcode assignment matches the enumeration in the XBAR translator +(to avoid extra remapping of hardware), but this does not match the graphics FE +reduction opcodes used by graphics backend semaphores. The reduction operation +itself is performed by L2. + +Semaphore signedness option: + + The NV_UDMA_SEM_EXECUTE_REDUCTION_FORMAT field specifies whether the +values involved in a reduction operation will be interpreted as signed or +unsigned. + +The following table summarizes each reduction operation, and the signedness and +payload size supported for each operation: + + signedness + r op 32b 64b function (v = memory value, p = semaphore payload) + -----+-----+-----+--------------------------------------------------- + IMIN U,S U,S v = (v < p) ? v : p + IMAX U,S U,S v = (v > p) ? 
v : p + IXOR N/A N/A v = v ^ p + IAND N/A N/A v = v & p + IOR N/A N/A v = v | p + IADD U,S U v = v + p + INC U inv v = (v >= p) ? 0 : v + 1 + DEC U inv v = (v == 0 || v > p) ? p : v - 1 (from L2 IAS) + +An operation with signedness "N/A" will ignore the value of REDUCTION_FORMAT +when executing, and either value of REDUCTION_FORMAT is valid. If an operation +is "U only" this means a signed version of this operation is not supported, and +if it is marked "inv" then it is unsupported for any signedness. If Host sees +an unsupported reduction op (in other words, is expected to run a reduction op +while PAYLOAD_SIZE and REDUCTION_FORMAT are set to unsupported values for that +op), Host will raise the NV_PPBDMA_INTR_0_SEMAPHORE interrupt. + +Example: A signed 32-bit IADD reduction operation is valid. A signed 64-bit +IADD reduction operation is unsupported and will trigger an interrupt if sent to +Host. A 64-bit INC (or DEC) operation is not supported and will trigger an +interrupt if sent to Host. + +Legal semaphore operation combinations: + + For iGPU cases where a semaphore release can be mapped to an onchip syncpoint, +the SIZE must be 4 bytes to avoid double-incrementing the target syncpoint. +Timestamping should also be disabled for a syncpoint-backed release to avoid +unexpected behavior. + + The following table diagrams the types of semaphore operations that are +possible. In the columns, "x" matches any field value. ACQ refers to any of +the ACQUIRE, ACQ_STRICT_GEQ, ACQ_CIRC_GEQ, ACQ_AND, and ACQ_NOR operations. +REL refers to either a RELEASE or a REDUCTION operation. 
+ + OP SWITCH WFI PAYLOAD_SIZE TIMESTAMP Description + --- ------ --- ------------ --------- -------------------------------------------------------------- + ACQ 0 x 0 x acquire; 4B (32 bit comparison); retry on fail + ACQ 0 x 1 x acquire; 8B (64 bit comparison); retry on fail + ACQ 1 x 0 x acquire; 4B (32 bit comparison); switch on fail + ACQ 1 x 1 x acquire; 8B (64 bit comparison); switch on fail + REL x 1 0 1 WFI & release 4B payload + timestamp semaphore + REL x 1 1 1 WFI & release 8B payload + timestamp semaphore + REL x 0 0 1 do not WFI & release 4B payload + timestamp semaphore + REL x 0 1 1 do not WFI & release 8B payload + timestamp semaphore + REL x 1 0 0 WFI & release doubleword (4B) semaphore payload + REL x 1 1 0 WFI & release quadword (8B) semaphore payload + REL x 0 0 0 do not WFI & release doubleword (4B) semaphore payload + REL x 0 1 0 do not WFI & release quadword (8B) semaphore payload + --- ------ --- ------------ --------- -------------------------------------------------------------- + + While the channel is loaded on a PBDMA unit, information from this method +is stored in the NV_PPBDMA_SEM_EXECUTE register. Otherwise, this information +is stored in the NV_RAMFC_SEM_EXECUTE field of the RAMFC part of the channel's +instance block. + +Undefined bits: + + Bits in the NV_UDMA_SEM_EXECUTE method data that are not used by the +specified OPERATION should be set to 0. When non-zero, their behavior is +undefined. 
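To make the method sequence concrete, the following sketch packs the NV_UDMA_SEM_EXECUTE data word and lists the five semaphore methods as abstract (address, data) pairs. The method addresses and bit positions are copied from the NV_UDMA definitions in this section; the struct, the function names, and the idea of returning an array of pairs are assumptions made for illustration. How these pairs are encoded into pushbuffer entries is not covered here.

```c
#include <stdint.h>

/* Method addresses copied from the NV_UDMA definitions in this section. */
#define NV_UDMA_SEM_ADDR_LO    0x0000005Cu
#define NV_UDMA_SEM_ADDR_HI    0x00000060u
#define NV_UDMA_SEM_PAYLOAD_LO 0x00000064u
#define NV_UDMA_SEM_PAYLOAD_HI 0x00000068u
#define NV_UDMA_SEM_EXECUTE    0x0000006Cu

/* Hypothetical representation of one method address/data pair. */
struct method {
    uint32_t addr;
    uint32_t data;
};

/* Pack SEM_EXECUTE data; the REDUCTION fields are left at 0 in this sketch. */
static uint32_t sem_execute_data(uint32_t operation, uint32_t switch_tsg,
                                 uint32_t release_wfi, uint32_t payload_64bit,
                                 uint32_t release_timestamp)
{
    return  (operation & 0x7u)                  /* OPERATION          2:0   */
         | ((switch_tsg & 1u) << 12)            /* ACQUIRE_SWITCH_TSG 12:12 */
         | ((release_wfi & 1u) << 20)           /* RELEASE_WFI        20:20 */
         | ((payload_64bit & 1u) << 24)         /* PAYLOAD_SIZE       24:24 */
         | ((release_timestamp & 1u) << 25);    /* RELEASE_TIMESTAMP  25:25 */
}

/* Fill out[0..4] with the address, payload, and execute methods in order. */
static void sem_methods(uint64_t sem_addr, uint64_t payload, uint32_t execute,
                        struct method out[5])
{
    out[0].addr = NV_UDMA_SEM_ADDR_LO;
    out[0].data = (uint32_t)sem_addr;             /* bits 1:0 are reserved */
    out[1].addr = NV_UDMA_SEM_ADDR_HI;
    out[1].data = (uint32_t)(sem_addr >> 32) & 0xffu; /* address bits 39:32 */
    out[2].addr = NV_UDMA_SEM_PAYLOAD_LO;
    out[2].data = (uint32_t)payload;
    out[3].addr = NV_UDMA_SEM_PAYLOAD_HI;
    out[3].data = (uint32_t)(payload >> 32);
    out[4].addr = NV_UDMA_SEM_EXECUTE;
    out[4].data = execute;
}
```

For instance, a 32-bit RELEASE (operation 1) with RELEASE_WFI set to EN and no timestamp packs to 0x00100001. Fields that do not apply to the chosen operation are passed as 0, in line with the note on undefined bits.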
+ + + +#define NV_UDMA_SEM_EXECUTE 0x0000006C /* -W-4R */ + +#define NV_UDMA_SEM_EXECUTE_OPERATION 2:0 /* -W-VF */ +#define NV_UDMA_SEM_EXECUTE_OPERATION_ACQUIRE 0x00000000 /* -W--V */ +#define NV_UDMA_SEM_EXECUTE_OPERATION_RELEASE 0x00000001 /* -W--V */ +#define NV_UDMA_SEM_EXECUTE_OPERATION_ACQ_STRICT_GEQ 0x00000002 /* -W--V */ +#define NV_UDMA_SEM_EXECUTE_OPERATION_ACQ_CIRC_GEQ 0x00000003 /* -W--V */ +#define NV_UDMA_SEM_EXECUTE_OPERATION_ACQ_AND 0x00000004 /* -W--V */ +#define NV_UDMA_SEM_EXECUTE_OPERATION_ACQ_NOR 0x00000005 /* -W--V */ +#define NV_UDMA_SEM_EXECUTE_OPERATION_REDUCTION 0x00000006 /* -W--V */ + +#define NV_UDMA_SEM_EXECUTE_ACQUIRE_SWITCH_TSG 12:12 /* -W-VF */ +#define NV_UDMA_SEM_EXECUTE_ACQUIRE_SWITCH_TSG_DIS 0x00000000 /* -W--V */ +#define NV_UDMA_SEM_EXECUTE_ACQUIRE_SWITCH_TSG_EN 0x00000001 /* -W--V */ + +#define NV_UDMA_SEM_EXECUTE_RELEASE_WFI 20:20 /* -W-VF */ +#define NV_UDMA_SEM_EXECUTE_RELEASE_WFI_DIS 0x00000000 /* -W--V */ +#define NV_UDMA_SEM_EXECUTE_RELEASE_WFI_EN 0x00000001 /* -W--V */ + +#define NV_UDMA_SEM_EXECUTE_PAYLOAD_SIZE 24:24 /* -W-VF */ +#define NV_UDMA_SEM_EXECUTE_PAYLOAD_SIZE_32BIT 0x00000000 /* -W--V */ +#define NV_UDMA_SEM_EXECUTE_PAYLOAD_SIZE_64BIT 0x00000001 /* -W--V */ + +#define NV_UDMA_SEM_EXECUTE_RELEASE_TIMESTAMP 25:25 /* -W-VF */ +#define NV_UDMA_SEM_EXECUTE_RELEASE_TIMESTAMP_DIS 0x00000000 /* -W--V */ +#define NV_UDMA_SEM_EXECUTE_RELEASE_TIMESTAMP_EN 0x00000001 /* -W--V */ + +#define NV_UDMA_SEM_EXECUTE_REDUCTION 30:27 /* -W-VF */ +#define NV_UDMA_SEM_EXECUTE_REDUCTION_IMIN 0x00000000 /* -W--V */ +#define NV_UDMA_SEM_EXECUTE_REDUCTION_IMAX 0x00000001 /* -W--V */ +#define NV_UDMA_SEM_EXECUTE_REDUCTION_IXOR 0x00000002 /* -W--V */ +#define NV_UDMA_SEM_EXECUTE_REDUCTION_IAND 0x00000003 /* -W--V */ +#define NV_UDMA_SEM_EXECUTE_REDUCTION_IOR 0x00000004 /* -W--V */ +#define NV_UDMA_SEM_EXECUTE_REDUCTION_IADD 0x00000005 /* -W--V */ +#define NV_UDMA_SEM_EXECUTE_REDUCTION_INC 0x00000006 /* -W--V */ +#define 
NV_UDMA_SEM_EXECUTE_REDUCTION_DEC 0x00000007 /* -W--V */ + +#define NV_UDMA_SEM_EXECUTE_REDUCTION_FORMAT 31:31 /* -W-VF */ +#define NV_UDMA_SEM_EXECUTE_REDUCTION_FORMAT_SIGNED 0x00000000 /* -W--V */ +#define NV_UDMA_SEM_EXECUTE_REDUCTION_FORMAT_UNSIGNED 0x00000001 /* -W--V */ + + +NON_STALL_INT [method] - Non-Stalling Interrupt Method + + The NON_STALL_INT method causes the NV_PFIFO_INTR_0_CHANNEL_INTR field +to be set to PENDING in the channel's interrupt register, as well as +NV_PFIFO_INTR_HIER_* registers. This will cause an interrupt if it is +enabled. Host does not stall the execution of the GPU context's +method, does not switch out the GPU context, and does not disable switching the +GPU context. + A NON_STALL_INT method's data (NV_UDMA_NON_STALL_INT_HANDLE) is ignored. + Software should handle all of a channel's non-stalling interrupts before it +unbinds the channel from the GPU context. + + +#define NV_UDMA_NON_STALL_INT 0x00000020 /* -W-4R */ + +#define NV_UDMA_NON_STALL_INT_HANDLE 31:0 /* -W-VF */ + + + + +MEM_OP methods: membars, and cache and TLB management. + + MEM_OP_A, MEM_OP_B, and MEM_OP_C set up state for performing a memory +operation. MEM_OP_D sets additional state, specifies the type of memory +operation to perform, and triggers sending the mem op to HUB. To avoid +unexpected behavior for future revisions of the MEM_OP methods, all 4 methods +should be sent for each requested mem op, with irrelevant fields set to 0. +Note that hardware does not enforce the requirement that unrelated fields be set +to 0, but ignoring this advice could break forward compatibility. + Host does not wait until an engine is idle before beginning to execute +this method. + While a GPU context is bound to a channel and assigned to a PBDMA unit, +the NV_UDMA_MEM_OP_A-C values are stored in the NV_PPBDMA_MEM_OP_A-C registers +respectively. 
While the GPU context is not assigned to a PBDMA unit, these +values are stored in the respective NV_RAMFC_MEM_OP_A-C fields of the RAMFC part +of the GPU context's instance block in memory. + +Usage, operations, and configuration: + + MEM_OP_D_OPERATION specifies the type of memory operation to perform. This +field determines the value of the opcode on the Host/FB interface. When Host +encounters the MEM_OP_D method, Host sends the specified request to the FB and +waits for an indication that the request has completed before beginning to +process the next method. To issue a memory operation, first issue the 3 +MEM_OP_A-C methods to configure the operation as documented below. Then send +MEM_OP_D to complete the configuration and trigger the operation. The +operations available for MEM_OP_D_OPERATION are as follows: + MEMBAR - perform a memory barrier; see below. + MMU_TLB_INVALIDATE - invalidate page translation and attribute data from +the given page directory that are cached in the Memory-Management Unit TLBs. + MMU_TLB_INVALIDATE_TARGETED - invalidate page translation and attributes +data corresponding to a specific page in a given page directory. + L2_SYSMEM_INVALIDATE - invalidate data from system memory cached in L2. + L2_PEERMEM_INVALIDATE - invalidate peer-to-peer data in the L2 cache. + L2_CLEAN_COMPTAGS - clean the L2 compression tag cache. + L2_FLUSH_DIRTY - flush dirty lines from L2. + L2_WAIT_FOR_SYS_PENDING_READS - ensure all sysmem reads are past the point +of being modified by a write through a reflected mapping. To do this, L2 drains +all sysmem reads to the point where they cannot be modified by future +non-blocking writes to reflected sysmem. L2 will block any new sysmem read +requests and drain out all read responses. Note VC's with sysmem read requests +at the head would stall any request till the flush is complete. The niso-nb vc +does not have sysmem read requests so it would continue to flow. 
L2 will ack +that the sys flush is complete and unblock all VC's. Note that this operation is a +NOP on Tegra chips. + ACCESS_COUNTER_CLR - clear page access counters. + + Depending on the operation given in MEM_OP_D_OPERATION, the other fields of +all four MEM_OP methods are interpreted differently: + +MMU_TLB_INVALIDATE* +------------------- + + When the operation is MMU_TLB_INVALIDATE or MMU_TLB_INVALIDATE_TARGETED, +then Host will initiate a TLB invalidate as described above. The MEM_OP +configuration fields specify what to invalidate, where to perform the +invalidate, and optionally trigger a replay or cancel event for replayable +faults buffered within the TLBs as part of UVM page management. + When the operation is MMU_TLB_INVALIDATE_TARGETED, +MEM_OP_C_TLB_INVALIDATE_PDB must be ONE, and the TLB_INVALIDATE_TARGET_ADDR_LO +and HI fields must be filled in to specify the target page. + These operations are privileged and can only be executed from channels +with NV_PPBDMA_CONFIG_AUTH_LEVEL set to PRIVILEGED. This is configured via the +NV_RAMFC_CONFIG dword in the channel's RAMFC during channel setup. + + MEM_OP_A_TLB_INVALIDATE_CANCEL_TARGET_GPC_ID and +MEM_OP_A_TLB_INVALIDATE_CANCEL_TARGET_CLIENT_UNIT_ID identify the GPC and uTLB +within that GPC respectively that should perform the cancel operation when +MEM_OP_C_TLB_INVALIDATE_REPLAY is CANCEL_TARGETED. These field values should be +copied from the GPC_ID and CLIENT fields from the associated +NV_UVM_FAULT_BUF_ENTRY packet or NV_PFIFO_INTR_MMU_FAULT_INFO(i) entry. The +CLIENT_UNIT_ID corresponds to the values specified by NV_PFAULT_CLIENT_GPC_* in +dev_fault.ref. These fields are used with the CANCEL_TARGETED operation. The +fields also overlap with CANCEL_MMU_ENGINE_ID, and are interpreted as +CANCEL_MMU_ENGINE_ID during a replay of type REPLAY_CANCEL_VA_GLOBAL. For other +replay operations, these fields must be 0.
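As an illustration of the CANCEL_TARGETED case described above, the following C sketch packs a GPC ID and a client unit ID into a MEM_OP_A data word. The bit ranges (5:0 and 10:6) follow the NV_UDMA_MEM_OP_A_TLB_INVALIDATE_CANCEL_TARGET_* field defines in this manual. The macro names and the helper mem_op_a_cancel_target() are hypothetical; they are not taken from any NVIDIA header, and the sketch does not cover the SYSMEMBAR or TARGET_ADDR_LO fields.

```c
#include <stdint.h>

/* Bit ranges copied from this manual's MEM_OP_A field defines.
 * The macro names themselves are illustrative assumptions. */
#define CANCEL_TARGET_CLIENT_UNIT_ID_SHIFT 0      /* field 5:0  */
#define CANCEL_TARGET_CLIENT_UNIT_ID_MASK  0x3fu
#define CANCEL_TARGET_GPC_ID_SHIFT         6      /* field 10:6 */
#define CANCEL_TARGET_GPC_ID_MASK          0x1fu

/* Hypothetical helper: build the MEM_OP_A data word for a
 * CANCEL_TARGETED replay. Out-of-range inputs are truncated to
 * the field width; all other MEM_OP_A bits are left at 0. */
static uint32_t mem_op_a_cancel_target(uint32_t gpc_id,
                                       uint32_t client_unit_id)
{
    uint32_t gpc = (gpc_id & CANCEL_TARGET_GPC_ID_MASK)
                   << CANCEL_TARGET_GPC_ID_SHIFT;
    uint32_t client = (client_unit_id & CANCEL_TARGET_CLIENT_UNIT_ID_MASK)
                      << CANCEL_TARGET_CLIENT_UNIT_ID_SHIFT;
    return gpc | client;
}
```

For example, GPC 3 with client unit 5 yields 0x000000c5. In a real driver the two inputs would come from the fault buffer entry, as the text above describes.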
+ + MEM_OP_A_TLB_INVALIDATE_CANCEL_MMU_ENGINE_ID specifies the associated +MMU_ENGINE_ID of the requests targeted by a REPLAY_CANCEL_VA_GLOBAL +operation. The field is ignored if the replay operation is not +REPLAY_CANCEL_VA_GLOBAL. This field overlaps with the CANCEL_TARGET_GPC_ID and +CANCEL_TARGET_CLIENT_UNIT_ID fields. + + MEM_OP_A_TLB_INVALIDATE_INVALIDATION_SIZE is aliased with the + MEM_OP_A_TLB_INVALIDATE_CANCEL_TARGET_CLIENT_UNIT_ID field + when MEM_OP_C_TLB_INVALIDATE_REPLAY (below) is anything other + than CANCEL_TARGETED or CANCEL_VA_GLOBAL or + CANCEL_VA_TARGETED. For those replay types, the actual region + to be invalidated is calculated as + 4K*(2^INVALIDATION_SIZE), i.e., + 4K*(2^CANCEL_TARGET_CLIENT_UNIT_ID); client unit id and gpc id + are not applicable. + + MEM_OP_A_TLB_INVALIDATE_SYSMEMBAR controls whether a Hub SYSMEMBAR +operation is performed after waiting for all outstanding acks to complete, after +the TLB is invalidated. Note that if ACK_TYPE is ACK_TYPE_NONE then this field is +ignored and no MEMBAR will be performed. This is provided as a SW optimization +so that SW does not need to perform a NV_UDMA_MEM_OP_D_OPERATION_MEMBAR op with +MEMBAR_TYPE SYS_MEMBAR after the TLB_INVALIDATE. This field must be 0 if +TLB_INVALIDATE_GPC is DISABLE. + + MEM_OP_B_TLB_INVALIDATE_TARGET_ADDR_HI:MEM_OP_A_TLB_INVALIDATE_TARGET_ADDR_LO +specifies the 4k aligned virtual address of the page whose translation is to be +invalidated within the TLBs. These fields are valid only when OPERATION is +MMU_TLB_INVALIDATE_TARGETED; otherwise, they must be set to 0. + + MEM_OP_C_TLB_INVALIDATE_PDB controls whether a TLB invalidate should apply +to a particular page directory or to all of them. If PDB is ALL, then all page +directories are invalidated. If PDB is ONE, then the PDB address and aperture +are specified in the PDB_ADDR_LO:PDB_ADDR_HI and PDB_APERTURE fields.
+Note that ALL does not make sense when OPERATION is MMU_TLB_INVALIDATE_TARGETED; +the behavior in that case is undefined. + + MEM_OP_C_TLB_INVALIDATE_GPC controls whether the GPC-MMU and uTLB entries +should be invalidated in addition to the Hub-MMU TLB (Note: the Hub TLB is +always invalidated). Set it to INVALIDATE_GPC_ENABLE to invalidate the GPC TLBs. +The REPLAY, ACK_TYPE, and SYSMEMBAR fields are only used by the GPC TLB and so +are ignored if INVALIDATE_GPC is DISABLE. + + MEM_OP_C_TLB_INVALIDATE_REPLAY specifies the type of replay to perform in +addition to the invalidate. A replay causes all replayable faults outstanding +in the TLB to attempt their translations again. Once a TLB acks a replay, that +TLB may start accepting new translations again. The replay flavors are as +follows: + NONE - do not replay any replayable faults on invalidate. + START - initiate a replay across all TLBs, but don't wait for completion. + The replay will be acked as soon as the invalidate is processed, but + replays themselves are in flight and not necessarily translated. + START_ACK_ALL - initiate the replay and wait until it completes. + The replay will be acked after all pending transactions in the replay + fifo have been translated. New requests will remain stalled in the + gpcmmu until all transactions in the replay fifo have completed and + there are no pending faults left in the replay fifo. + CANCEL_TARGETED - initiate a cancel-replay on a targeted uTLB, causing any + replayable translations buffered in that uTLB to become non-replayable + if they fault again. In this case, the first faulting translation + will be reported in the NV_PFIFO_INTR_MMU_FAULT registers and will + raise PFIFO_INTR_0_MMU_FAULT. The specific TLB to target for the + cancel is specified in the CANCEL_TARGET fields. Note the TLB + invalidate still applies globally to all TLBs. + CANCEL_GLOBAL - like CANCEL_TARGETED, but all TLBs will cancel-replay. 
+ CANCEL_VA_GLOBAL - initiates a cancel operation that cancels all requests + with the matching mmu_engine_id and access_type that land in the + specified 4KB aligned virtual address within the scope of specified + PDB. All other requests are replayed. If the specified engine is not + bound, or if the PDB of the specified engine does not match the + specified PDB, all requests will be replayed and none will be canceled. + + MEM_OP_C_TLB_INVALIDATE_ACK_TYPE controls which sort of ACK the uTLBs wait +for after having issued a membar to L2. ACK_TYPE_NONE does not perform any sort +of membar. ACK_TYPE_INTRANODE waits for an ack from the XBAR. +ACK_TYPE_GLOBALLY waits for an L2 ACK. ACK_TYPE_GLOBALLY is equivalent to a +MEMBAR operation from the engine, or a SYS_MEMBAR if +MEM_OP_A_TLB_INVALIDATE_SYSMEMBAR is EN. + + MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL specifies which levels in the page +directory hierarchy of the TLB cache to invalidate. The levels are numbered +from the bottom up, with the PTE being at the bottom with level 1. The +specified level and all those below it in the hierarchy -- that is, all those +with a lower numbered level -- are invalidated. ALL (the 0 default) is +special-cased to indicate the top level; this causes the invalidate to apply to +the entire page mapping structure. The field is ignored if the replay operation +is REPLAY_CANCEL_VA_GLOBAL. + + MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE specifies the associated ACCESS_TYPE of +the requests targeted by a REPLAY_CANCEL_VA_GLOBAL operation. This field +overlaps with the INVALIDATE_PAGE_TABLE_LEVEL field, and is ignored if the +replay operation is not REPLAY_CANCEL_VA_GLOBAL. The ACCESS_TYPE field can get +one of the following values: + READ - the cancel_va_global should be performed on all pending read requests. + WRITE - the cancel_va_global should be performed on all pending write requests. + ATOMIC_STRONG - the cancel_va_global should be performed on all pending + strong atomic requests. 
+ ATOMIC_WEAK - the cancel_va_global should be performed on all pending + weak atomic requests. + ATOMIC_ALL - the cancel_va_global should be performed on all pending atomic + requests. + WRITE_AND_ATOMIC - the cancel_va_global should be performed on all pending + write and atomic requests. + ALL - the cancel_va_global should be performed on all pending requests. + + + MEM_OP_C_TLB_INVALIDATE_PDB_APERTURE specifies the target aperture of the +page directory for which TLB entries should be invalidated. This field must be +0 when TLB_INVALIDATE_PDB is ALL. + + MEM_OP_C_TLB_INVALIDATE_PDB_ADDR_LO specifies the low 20 bits of the +4k-block-aligned PDB (base address of the page directory) when +TLB_INVALIDATE_PDB is ONE; otherwise this field must be 0. The PDB byte address +should be 4k aligned and right-shifted by 12 before being split and packed into +the ADDR fields. Note that the PDB_ADDR_LO field starts at bit 12, so it is +possible to set MEM_OP_C to the low 32 bits of the byte address, mask off the +low 12, and then or in the rest of the configuration fields. + + MEM_OP_D_TLB_INVALIDATE_PDB_ADDR_HI contains the high bits of the PDB when +TLB_INVALIDATE_PDB is ONE. Otherwise this field must be 0. + +UVM handling of replayable faults: + + The following example illustrates how TLB invalidate may be used by the +UVM driver: + 1. When the TLB invalidate completes, all memory accesses using the old + TLB entries prior to the invalidate will finish translation (but not + completion), and any new virtual accesses will trigger new + translations. The outstanding in-flight translations are allowed to + fault but will not indefinitely stall the invalidate. + 2. When the TLB invalidate completes, in-flight memory accesses using the + old physical translations may not yet be visible to other GPU clients + (such as CopyEngine) or to the CPU. 
Accesses coming from clients that + support recoverable faults (such as TEX and GCC) can be made visible by + requesting the MMU to perform a membar using the ACK_TYPE and SYSMEMBAR + fields. + a. If ACK_TYPE is NONE the SYSMEMBAR field is ignored and no membar + is performed. + b. If ACK_TYPE is INTRANODE the invalidate will wait until all + in-flight physical accesses using the old translations are visible + to XBAR clients on the blocking VC. + c. If ACK_TYPE is GLOBALLY the invalidate will wait until all + in-flight physical accesses using the old translations are at the + point of coherence in L2, meaning writes will be visible to all + other GPU clients and reads will not be mutable by them. + d. If the SYSMEMBAR field is set to EN then a Hub SYSMEMBAR will also + be performed following the ACK_TYPE membar. This is the equivalent + of performing a NV_UDMA_MEM_OP_C_MEMBAR_TYPE_SYS_MEMBAR. + 3. If fault replay was requested then all pending recoverable faults in + the TLB replay list will be retranslated. This includes all faults + discovered while the invalidate was pending. This replay may generate + more recoverable faults. + 4. If fault replay cancel was requested then another replay is attempted of + all pending replayable faults on the targeted TLB(s). If any of these + re-fault they are discarded (sticky NACK or ACK/TRAP sent back to the + client depending on the setting of NV_PGPC_PRI_MMU_DEBUG_CTRL). + + + +MEMBAR +------ + + When the operation is MEMBAR, Host will perform a memory barrier operation. +All other fields must be set to 0 except for MEM_OP_C_MEMBAR_TYPE. When +MEMBAR_TYPE is MEMBAR, then a memory barrier will be performed with respect to +other clients on the GPU. When it is SYS_MEMBAR, the memory barrier will also be +performed with respect to the CPU and peer GPUs. + + MEMBAR - This issues a MEMBAR operation following all reads, writes, and +atomics currently in flight from the PBDMA. 
The MEMBAR operation will push all +such accesses already in flight on the same VC as the PBDMA to a point of GPU +coherence before proceeding. After this operation is complete, reads from any +GPU client will see prior writes from this PBDMA, and writes from any GPU client +cannot modify the return data of earlier reads from this PBDMA. This is true +regardless of whether those accesses target vidmem, sysmem, or peer mem. + WARNING: This only guarantees that accesses from the same VC as the PBDMA that +are already in flight are coherent. Accesses from clients such as SM or a +non-PBDMA engine must already be at some point of coherency before this +operation to be coherent. + + SYS_MEMBAR - This implies the MEMBAR type above but in addition to having +accesses reach coherence with all GPU clients, this further waits for accesses +to be coherent with respect to the CPU and peer GPUs as well. After this +operation is complete, reads from the CPU or peer GPUs will see prior writes +from this PBDMA, and writes from the CPU or peer GPUs cannot modify the return +data of earlier reads from this PBDMA (with the exception of CPU reflected +writes, which can modify earlier reads). Note that SYS_MEMBAR is really only needed +to guarantee ordering with off-chip clients. For on-chip clients such as the +graphics engine or copy engine, accesses to sysmem will be coherent with just a +MEMBAR operation. SYS_MEMBAR provides the same function as +OPERATION_SYSMEMBAR_FLUSH on previous architectures. + WARNING: As described above, SYS_MEMBAR will not prevent CPU reflected +writes issued after the SYS_MEMBAR from clobbering the return data of reads +issued before the SYS_MEMBAR. To handle this case, the SYS_MEMBAR must be +followed by a separate L2_WAIT_FOR_SYS_PENDING_READS mem op. + + + +L2* +--- + + These values initiate a cache management operation -- see above. All other +fields must be 0; there are no configuration options.
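The MEMBAR and L2* cases above leave almost every field at 0, so the four data words are simple to construct. The following C sketch builds them under that reading. The numeric values (MEMBAR_TYPE in bits 2:0 of MEM_OP_C, OPERATION in bits 31:27 of MEM_OP_D, and the operation codes) are taken from the field defines in this manual. The struct mem_op type and the helper names are hypothetical, and how the words are written into a pushbuffer is outside the scope of this sketch.

```c
#include <stdint.h>

/* Values copied from this manual's MEM_OP_C / MEM_OP_D defines.
 * The macro names are illustrative assumptions. */
#define MEMBAR_TYPE_SYS_MEMBAR   0x0u   /* MEM_OP_C bits 2:0 */
#define MEMBAR_TYPE_MEMBAR       0x1u
#define OPERATION_SHIFT          27     /* MEM_OP_D bits 31:27 */
#define OPERATION_MASK           0x1fu
#define OPERATION_MEMBAR         0x05u
#define OPERATION_L2_FLUSH_DIRTY 0x10u

/* Hypothetical container for the data of MEM_OP_A..MEM_OP_D. */
struct mem_op {
    uint32_t a, b, c, d;
};

/* Build a MEMBAR (sys == 0) or SYS_MEMBAR (sys != 0) mem op.
 * Every field other than MEMBAR_TYPE and OPERATION stays 0. */
static struct mem_op mem_op_membar(int sys)
{
    struct mem_op m = { 0u, 0u, 0u, 0u };
    m.c = sys ? MEMBAR_TYPE_SYS_MEMBAR : MEMBAR_TYPE_MEMBAR;
    m.d = OPERATION_MEMBAR << OPERATION_SHIFT;
    return m;
}

/* Build an L2* mem op: only OPERATION is set, all else is 0. */
static struct mem_op mem_op_l2(uint32_t operation)
{
    struct mem_op m = { 0u, 0u, 0u, 0u };
    m.d = (operation & OPERATION_MASK) << OPERATION_SHIFT;
    return m;
}
```

Sending all four words even when A and B are 0 matches the advice given earlier for the MEM_OP methods, which asks for all 4 methods per mem op with irrelevant fields set to 0.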
+ + + + +The ACCESS_COUNTER_CLR operation +-------------------------------- + When MEM_OP_D_OPERATION is ACCESS_COUNTER_CLR, Host will request to clear +the page access counters. There are two types of access counters - MIMC and +MOMC. This operation can be issued to clear all counters of all types, all +counters of a specified type (MIMC or MOMC), or a specific counter indicated by +counter type, bank and notify tag. + This operation is privileged and can only be executed from channels with +NV_PPBDMA_CONFIG_AUTH_LEVEL set to PRIVILEGED. This is configured via the +NV_RAMFC_CONFIG dword in the channel's RAMFC during channel setup. + +The operation uses the following fields in the MEM_OP_* methods: +ACCESS_COUNTER_CLR_TYPE (TY) : type of the access counter clear + operation +ACCESS_COUNTER_CLR_TARGETED_TYPE (T) : type of the access counter for + targeted operation +ACCESS_COUNTER_CLR_TARGETED_NOTIFY_TAG : 20-bit notify tag of the access + counter for targeted operation +ACCESS_COUNTER_CLR_TARGETED_BANK : 4-bit bank number of the access + counter for targeted operation + + + + + +MEM_OP method field defines: + +MEM_OP_A [method] - Memory Operation Method 1/4 - see above for documentation + +#define NV_UDMA_MEM_OP_A 0x00000028 /* -W-4R */ + +#define NV_UDMA_MEM_OP_A_TLB_INVALIDATE_CANCEL_TARGET_CLIENT_UNIT_ID 5:0 /* -W-VF */ +#define NV_UDMA_MEM_OP_A_TLB_INVALIDATE_INVALIDATION_SIZE 5:0 /* -W-VF */ +#define NV_UDMA_MEM_OP_A_TLB_INVALIDATE_CANCEL_TARGET_GPC_ID 10:6 /* -W-VF */ +#define NV_UDMA_MEM_OP_A_TLB_INVALIDATE_CANCEL_MMU_ENGINE_ID 6:0 /* -W-VF */ +#define NV_UDMA_MEM_OP_A_TLB_INVALIDATE_SYSMEMBAR 11:11 /* -W-VF */ +#define NV_UDMA_MEM_OP_A_TLB_INVALIDATE_SYSMEMBAR_EN 0x00000001 /* -W--V */ +#define NV_UDMA_MEM_OP_A_TLB_INVALIDATE_SYSMEMBAR_DIS 0x00000000 /* -W--V */ +#define NV_UDMA_MEM_OP_A_TLB_INVALIDATE_TARGET_ADDR_LO 31:12 /* -W-VF */ + + +MEM_OP_B [method] - Memory Operation Method 2/4 - see above for documentation + +#define NV_UDMA_MEM_OP_B
0x0000002c /* -W-4R */ + +#define NV_UDMA_MEM_OP_B_TLB_INVALIDATE_TARGET_ADDR_HI 31:0 /* -W-VF */ + + +MEM_OP_C [method] - Memory Operation Method 3/4 - see above for documentation + +#define NV_UDMA_MEM_OP_C 0x00000030 /* -W-4R */ + +Membar configuration field. Note: overlaps MMU_TLB_INVALIDATE* config fields. +#define NV_UDMA_MEM_OP_C_MEMBAR_TYPE 2:0 /* -W-VF */ +#define NV_UDMA_MEM_OP_C_MEMBAR_TYPE_SYS_MEMBAR 0x00000000 /* -W--V */ +#define NV_UDMA_MEM_OP_C_MEMBAR_TYPE_MEMBAR 0x00000001 /* -W--V */ +Invalidate TLB entries for ONE page directory base, or for ALL of them. +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PDB 0:0 /* -W-VF */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PDB_ONE 0x00000000 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PDB_ALL 0x00000001 /* -W--V */ +Invalidate GPC MMU TLB entries or not (Hub-MMU entries are always invalidated). +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_GPC 1:1 /* -W-VF */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_GPC_ENABLE 0x00000000 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_GPC_DISABLE 0x00000001 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_REPLAY 4:2 /* -W-VF */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_REPLAY_NONE 0x00000000 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_REPLAY_START 0x00000001 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_REPLAY_START_ACK_ALL 0x00000002 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_REPLAY_CANCEL_TARGETED 0x00000003 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_REPLAY_CANCEL_GLOBAL 0x00000004 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_REPLAY_CANCEL_VA_GLOBAL 0x00000005 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACK_TYPE 6:5 /* -W-VF */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACK_TYPE_NONE 0x00000000 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACK_TYPE_GLOBALLY 0x00000001 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACK_TYPE_INTRANODE 0x00000002 /* -W--V */ +#define 
NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE 9:7 /* -W-VF */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_READ 0 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_WRITE 1 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_ATOMIC_STRONG 2 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_RSVRVD 3 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_ATOMIC_WEAK 4 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_ATOMIC_ALL 5 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_WRITE_AND_ATOMIC 6 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_ALL 7 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL 9:7 /* -W-VF */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_ALL 0x00000000 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_PTE_ONLY 0x00000001 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE0 0x00000002 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE1 0x00000003 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE2 0x00000004 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE3 0x00000005 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE4 0x00000006 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE5 0x00000007 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PDB_APERTURE 11:10 /* -W-VF */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PDB_APERTURE_VID_MEM 0x00000000 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PDB_APERTURE_SYS_MEM_COHERENT 0x00000002 /* -W--V */ +#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PDB_APERTURE_SYS_MEM_NONCOHERENT 0x00000003 /* -W--V */ +Address[31:12] of page directory for which TLB entries should be invalidated. 
+#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PDB_ADDR_LO 31:12 /* -W-VF */ + +#define NV_UDMA_MEM_OP_C_ACCESS_COUNTER_CLR_TARGETED_NOTIFY_TAG 19:0 /* -W-VF */ + +MEM_OP_D [method] - Memory Operation Method 4/4 - see above for documentation +(Must be preceded by MEM_OP_A-C.) + +#define NV_UDMA_MEM_OP_D 0x00000034 /* -W-4R */ + +Address[58:32] of page directory for which TLB entries should be invalidated. +#define NV_UDMA_MEM_OP_D_TLB_INVALIDATE_PDB_ADDR_HI 26:0 /* -W-VF */ +#define NV_UDMA_MEM_OP_D_OPERATION 31:27 /* -W-VF */ +#define NV_UDMA_MEM_OP_D_OPERATION_MEMBAR 0x00000005 /* -W--V */ +#define NV_UDMA_MEM_OP_D_OPERATION_MMU_TLB_INVALIDATE 0x00000009 /* -W--V */ +#define NV_UDMA_MEM_OP_D_OPERATION_MMU_TLB_INVALIDATE_TARGETED 0x0000000a /* -W--V */ +#define NV_UDMA_MEM_OP_D_OPERATION_L2_PEERMEM_INVALIDATE 0x0000000d /* -W--V */ +#define NV_UDMA_MEM_OP_D_OPERATION_L2_SYSMEM_INVALIDATE 0x0000000e /* -W--V */ +#define NV_UDMA_MEM_OP_D_OPERATION_L2_CLEAN_COMPTAGS 0x0000000f /* -W--V */ +#define NV_UDMA_MEM_OP_D_OPERATION_L2_FLUSH_DIRTY 0x00000010 /* -W--V */ +#define NV_UDMA_MEM_OP_D_OPERATION_L2_WAIT_FOR_SYS_PENDING_READS 0x00000015 /* -W--V */ + +#define NV_UDMA_MEM_OP_D_OPERATION_ACCESS_COUNTER_CLR 0x00000016 /* -W--V */ + +#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TYPE 1:0 /* -W-VF */ +#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TYPE_MIMC 0x00000000 /* -W--V */ +#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TYPE_MOMC 0x00000001 /* -W--V */ +#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TYPE_ALL 0x00000002 /* -W--V */ +#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TYPE_TARGETED 0x00000003 /* -W--V */ + +#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TARGETED_TYPE 2:2 /* -W-VF */ +#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TARGETED_TYPE_MIMC 0x00000000 /* -W--V */ +#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TARGETED_TYPE_MOMC 0x00000001 /* -W--V */ + +#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TARGETED_BANK 6:3 /* -W-VF */ + + +SET_REF [method] - Set Reference Count 
Method + + The SET_REF method allows the user to set the reference count +(NV_PPBDMA_REF_CNT) to a given value. The reference count may be monitored to track +Host's progress through the pushbuffer. Instead of monitoring +NV_RAMUSERD_TOP_LEVEL_GET, software may put into the method stream SET_REF +methods that set the reference count to ever-increasing values, and then read +NV_RAMUSERD_REF to determine how far in the stream Host has gone. + Before the reference count value is altered, Host waits for the engine to +be idle (to have completed executing all earlier methods), issues a SysMemBar +flush, and waits for the flush to complete. + While the GPU context is bound to a channel and assigned to a PBDMA unit, +the reference count value is stored in the NV_PPBDMA_REF register. While the +GPU context is not assigned to a PBDMA unit, the reference count value is stored +in the NV_RAMFC_REF field of the RAMFC portion of the GPU context's GPU-instance +block. + + +#define NV_UDMA_SET_REF 0x00000050 /* -W-4R */ + +#define NV_UDMA_SET_REF_CNT 31:0 /* -W-VF */ + + + +CRC_CHECK [method] - Method-CRC Check Method + + When debugging a problem in a real chip, it may be useful to determine +whether a PBDMA unit has sent the proper methods toward the engine. The +CRC_CHECK method checks whether the cyclic redundancy check value +calculated over previous methods has an expected value. If the value in the +NV_PPBDMA_METHOD_CRC register is not equal to NV_UDMA_CRC_CHECK_VALUE, then +Host initiates an interrupt (NV_PPBDMA_INTR_0_METHODCRC) and stalls. After +each comparison, the NV_PPBDMA_METHOD_CRC register is cleared. + The IEEE 802.3 CRC-32 polynomial (0x04c11db7) is used to calculate CRC +values. The CRC is calculated over the method subchannel, method address, and +method data of methods sent to an engine. Host can send both single and dual +methods to engines. The CRC is calculated as if dual methods were sent as +two single methods.
The CRC is calculated on the byte-stream in little-endian +order. + + +Pseudocode for CRC calculation is: + + static NVR_U32 table[256]; + void init() { + for (NVR_U32 i = 0; i < 256; i++) { // create crc value for every byte + NVR_U32 crc = i << 24; + for (int j = 0; j < 8; j++) { // for every bit in the byte + if (crc & 0x80000000) crc = (crc << 1) ^ 0x04c11db7; + else crc = (crc << 1); + } + table[i] = crc; + } + } + NVR_U32 new_crc(unsigned char byte, NVR_U32 old_crc) { + NVR_U32 crc_top_byte = old_crc >> 24; + crc_top_byte ^= byte; + NVR_U32 new_crc = (old_crc << 8) ^ table[crc_top_byte]; + return new_crc; + } + + This method is used for debug. + This method was added in Fermi. + + +#define NV_UDMA_CRC_CHECK 0x0000007c /* -W-4R */ + +#define NV_UDMA_CRC_CHECK_VALUE 31:0 /* -W-VF */ + + +YIELD [method] - Yield Method + + The YIELD method causes a channel to yield the remainder of its timeslice. +The method's OP field specifies whether the channel's PBDMA timeslice, the +channel's runlist timeslice, or no timeslice is yielded. + If the OP is YIELD_OP_RUNLIST_TIMESLICE, then Host will act as if the channel's +runlist or TSG timeslice expired. Host will exit the TSG and switch to the next +channel after the TSG on the runlist. If there is no such channel to switch to, +then YIELD_OP_RUNLIST_TIMESLICE will not cause a switch. + When the PBDMA executes a YIELD_OP_RUNLIST_TIMESLICE method, it guarantees +that it will not execute further methods from the same channel or TSG until the +channel is restarted by the scheduler. However, note that this does not yield +the engine timeslice; if the engine is preemptable, the context will continue +to run on the engine until the remainder of its timeslice expires before Host +will attempt to preempt it. Also if there is an outstanding ctx load either +due to ctx_reload or from the other PBDMA in the SCG case, then yielding won't +take place until the outstanding ctx load finishes or aborts due to a preempt.
+When the ctx load does complete on the other PBDMA, it is possible for that +PBDMA to execute some small number of additional methods before the runlist +yield takes effect and that PBDMA halts work for its channel. + If the OP is NV_UDMA_YIELD_OP_TSG and the channel is part of a TSG, then Host +will switch to the next channel in the same TSG; if the channel is not part +of a TSG, then the method is treated like YIELD_OP_NOP. If there is only +one channel with work in the TSG, Host will simply reschedule the same channel +in the TSG. YIELD_OP_TSG does not cause the scheduler to leave the TSG. The TSG +timeslice counter (the TSG timeslice is equivalent to the runlist timeslice for TSGs) +continues to increment through the channel switch and does not restart after +executing the yield method. When the PBDMA executes a Yield method, it +guarantees that it will not execute the method following that Yield until the +channel is restarted by the scheduler. + YIELD_OP_NOP is simply a NOP. Neither timeslice is yielded. This was kept +for compatibility with existing tests; NV_UDMA_NOP is the preferred NOP, but +also see the universal NOP PB instruction. See the description of +NV_FIFO_DMA_NOP in the "FIFO_DMA" section of dev_ram.ref. + + If an unknown OP is specified, Host will raise an NV_PPBDMA_INTR_*_METHOD +interrupt. + + +#define NV_UDMA_YIELD 0x00000080 /* -W-4R */ + +#define NV_UDMA_YIELD_OP 1:0 /* -W-VF */ +#define NV_UDMA_YIELD_OP_NOP 0x00000000 /* -W--V */ +#define NV_UDMA_YIELD_OP_RUNLIST_TIMESLICE 0x00000002 /* -W--V */ +#define NV_UDMA_YIELD_OP_TSG 0x00000003 /* -W--V */ + + +WFI [method] - Wait-for-Idle Method + + The WFI (Wait-For-Idle) method will stall Host from processing any more +methods on the channel until the engine to which the channel last sent methods +is idle. Note that the subchannel encoded in the method header is ignored (as +it is for all Host-only methods) and does NOT specify which engine to idle.
In +Kepler, this is only relevant on runlists that serve multiple engines +(specifically, the graphics runlist, which also serves GR COPY). + The WFI method has a single field SCOPE which specifies the level of WFI +the Host method performs. ALL waits for all work in the engine from the same +context to be idle across all classes and subchannels. CURRENT_VEID causes the +WFI to only apply to work from the same VEID as the current channel. Note for +engines that do not support VEIDs, CURRENT_VEID works identically to ALL. + Note that Host methods ignore the subchannel field in the method. A Host +WFI method always applies to the engine the channel last sent methods to. If a +WFI with ALL is specified and the channel last sent work to the GRCE, this will +only guarantee that GRCE has no work in progress. It is possible that the GR +context will have work in progress from other VEIDs, or even the current VEID if +the current channel targets GRCE and has never sent FE methods before. This +means that if SW wants to idle the graphics pipe for all VEIDs, SW must send a +method to GR immediately before the WFI method. A GR_NOP is sufficient. + Note also that even if the current NV_PPBDMA_TARGET is GRAPHICS and not +GRCE, there are cases where Host can trivially complete a WFI without sending +the NV_PMETHOD_HOST_WFI internal method to FE. This can happen when + +1. the runlist timeslices to a different TSG just before the WFI method, +2. the other TSG does a ctxsw request due to methods for FE, and +3. FECS reports non-preempted in the ctx ack, so CTX_RELOAD doesn't get set. + +In that case, when the channel switches back onto the PBDMA, the PBDMA rightly +concludes that there is no way the context could be non-idle for that channel, +and therefore filters out the WFI, even if the other PBDMA is sending work to +other VEIDs. 
As in the subchannel case, a GR_NOP preceding the WFI is +sufficient to ensure that a SCOPE_ALL_VEID WFI will be sent to FE regardless of +timeslicing as long as the NOP and the WFI are submitted as part of the same +GP_PUT update. This is ensured by the semantics of the channel state +SHOULD_SEND_HOST_TSG_EVENT behaving like CTX_RELOAD: the GR_NOP causes the PBDMA +to set the SHOULD_SEND_HOST_TSG_EVENT state, so even a channel or context switch +will still result in the PBDMA having the engine context loaded. Thus the WFI +will cause the HOST_WFI internal method to be sent to FE. + + +#define NV_UDMA_WFI 0x00000078 /* -W-4R */ + +#define NV_UDMA_WFI_SCOPE 0:0 /* -W-VF */ +#define NV_UDMA_WFI_SCOPE_CURRENT_VEID 0x00000000 /* -W--V */ +#define NV_UDMA_WFI_SCOPE_ALL 0x00000001 /* -W--V */ +#define NV_UDMA_WFI_SCOPE_ALL_VEID 0x00000001 /* */ + + + +CLEAR_FAULTED [method] - Clear Faulted Method + + The CLEAR_FAULTED method clears a channel's PCCSR PBDMA_FAULTED or +ENG_FAULTED bit. These bits are set by Host in response to a PBDMA fault or +engine fault respectively on the specified channel; see dev_fifo.ref. + + The CHID field specifies the ID of the channel whose FAULTED bit is to be +cleared. + + The TYPE field specifies which FAULTED bit is to be cleared: either +PBDMA_FAULTED or ENG_FAULTED. + + When Host receives a CLEAR_FAULTED method for a channel, the corresponding +PCCSR FAULTED bit for the channel should be set. However, due to a race between +SW seeing the fault message from MMU, handling the fault, and sending the +CLEAR_FAULTED method versus Host seeing the fault from CE or MMU and setting the +FAULTED bit, it is possible for the CLEAR_FAULTED method to arrive before the +FAULTED bit is set. Host will handle a CLEAR_FAULTED method according to the +following cases: + + a. The FAULTED bit specified by TYPE is set. Host will clear the bit and +retire the CLEAR_FAULTED method. + + b.
If the bit is not set, the PBDMA will continue to retry the +CLEAR_FAULTED method on every PTIMER microsecond tick by rechecking the FAULTED +bit of the target channel. Once the bit is set, the PBDMA will clear the bit and +retire the method. The execution of the fault handling channel will stall on the +CLEAR_FAULTED method until the FAULTED bit for the target channel is set. + + c. If the fault handling channel's timeslice expires while stalled on a +CLEAR_FAULTED method, the channel will switch out. Once rescheduled, the +channel will resume retrying the CLEAR_FAULTED method. + + d. To avoid waiting indefinitely for the CLEAR_FAULTED method to retire +(most likely because a SW bug wrongly injected a CLEAR_FAULTED method), Host +has a timeout mechanism to inform SW of a potential bug. This timeout is +controlled by NV_PFIFO_CLEAR_FAULTED_TIMEOUT; see dev_fifo.ref for details. + + e. When a CLEAR_FAULTED timeout is detected, Host will raise a stalling +interrupt by setting the NV_PPBDMA_INTR_0_CLEAR_FAULTED_ERROR field. The +address of the invalid CLEAR_FAULTED method will be in NV_PPBDMA_METHOD0, and +its payload will be in NV_PPBDMA_DATA0. + + Note: Setting the timeout value too low could result in false stalling +interrupts to SW. The timeout should be set equal to NV_PFIFO_FB_TIMEOUT_PERIOD. + + Note: The CLEAR_FAULTED timeout mechanism uses the same PBDMA registers and +RAMFC fields as the semaphore acquire timeout mechanism: +NV_PPBDMA_SEM_EXECUTE_ACQUIRE_FAIL is set TRUE when the first attempt fails, and +NV_PPBDMA_ACQUIRE_DEADLINE is loaded with the sum of the current PTIMER and +NV_PFIFO_CLEAR_FAULTED_TIMEOUT. The ACQUIRE_FAIL bit is reset to FALSE when +the CLEAR_FAULTED method times out or succeeds. 
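The handling cases above can be sketched as a small software model of one retry attempt. This is only an illustration, not a description of the hardware implementation: the type and function names (cf_status, cf_state, cf_attempt) are hypothetical, the timeout is assumed to be expressed in the same units as the PTIMER value passed in, and the "now >= deadline" comparison is an assumption about how the deadline is evaluated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of one CLEAR_FAULTED attempt (names are illustrative). */
typedef enum { CF_RETIRED, CF_PENDING, CF_TIMEOUT } cf_status;

/* Hypothetical model of the shared acquire-timeout state. */
typedef struct {
    bool     acquire_fail; /* models NV_PPBDMA_SEM_EXECUTE_ACQUIRE_FAIL */
    uint64_t deadline;     /* models NV_PPBDMA_ACQUIRE_DEADLINE */
} cf_state;

/*
 * Model one attempt, as made on each PTIMER microsecond tick.
 * faulted_bit stands in for the target channel's PCCSR FAULTED bit
 * selected by TYPE; timeout stands in for NV_PFIFO_CLEAR_FAULTED_TIMEOUT.
 */
cf_status cf_attempt(cf_state *s, bool *faulted_bit,
                     uint64_t now, uint64_t timeout)
{
    assert(s != NULL && faulted_bit != NULL);

    if (*faulted_bit) {            /* case a: bit set, clear it and retire */
        *faulted_bit = false;
        s->acquire_fail = false;   /* reset on success */
        return CF_RETIRED;
    }
    if (!s->acquire_fail) {        /* first failed attempt: arm the deadline */
        s->acquire_fail = true;
        s->deadline = now + timeout;
        return CF_PENDING;
    }
    if (now >= s->deadline) {      /* cases d/e: timeout, interrupt would be raised */
        s->acquire_fail = false;   /* reset on timeout */
        return CF_TIMEOUT;
    }
    return CF_PENDING;             /* case b: keep retrying on the next tick */
}
```

In this model a caller would invoke cf_attempt once per tick and treat CF_TIMEOUT as the point where the CLEAR_FAULTED_ERROR interrupt would be reported. Case c (timeslice expiry) is not modeled; it only pauses the sequence of attempts.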
+ + +#define NV_UDMA_CLEAR_FAULTED 0x00000084 /* -W-4R */ + +#define NV_UDMA_CLEAR_FAULTED_CHID 11:0 /* -W-VF */ +#define NV_UDMA_CLEAR_FAULTED_TYPE 31:31 /* -W-VF */ +#define NV_UDMA_CLEAR_FAULTED_TYPE_PBDMA_FAULTED 0x00000000 /* -W--V */ +#define NV_UDMA_CLEAR_FAULTED_TYPE_ENG_FAULTED 0x00000001 /* -W--V */ + + + + Addresses that are not defined in this device are reserved. Those below +0x100 are reserved for future Host methods. Addresses 0x100 and beyond are +reserved for the engines served by Host.
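As an illustration of the WFI and CLEAR_FAULTED field layouts defined above, the following sketch packs the 32-bit method data for each method. The macro and function names are hypothetical shorthand, not part of the manual; only the bit positions (SCOPE 0:0, CHID 11:0, TYPE 31:31) and values are taken from the defines. How the data word is paired with the method address in a pushbuffer entry is outside this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Method addresses, copied from NV_UDMA_WFI and NV_UDMA_CLEAR_FAULTED. */
#define UDMA_WFI           0x00000078u
#define UDMA_CLEAR_FAULTED 0x00000084u

/* Field values, copied from the defines; the short names are illustrative. */
#define WFI_SCOPE_CURRENT_VEID           0u
#define WFI_SCOPE_ALL                    1u
#define CLEAR_FAULTED_TYPE_PBDMA_FAULTED 0u
#define CLEAR_FAULTED_TYPE_ENG_FAULTED   1u

/* Pack WFI method data: SCOPE occupies bit 0. */
uint32_t wfi_data(uint32_t scope)
{
    assert(scope <= 1u);
    return scope & 0x1u;
}

/* Pack CLEAR_FAULTED method data: CHID in bits 11:0, TYPE in bit 31. */
uint32_t clear_faulted_data(uint32_t chid, uint32_t type)
{
    assert(chid <= 0xFFFu); /* CHID is a 12-bit field */
    assert(type <= 1u);     /* TYPE is a 1-bit field */
    return (type << 31) | (chid & 0xFFFu);
}
```

For example, under these assumptions, clearing ENG_FAULTED for channel 0x123 would use data 0x80000123 with method address 0x84, and a WFI with SCOPE_ALL would use data 0x1 with method address 0x78.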