From f9e4e0e07fd5a6a7757db977f69c8e91a0ae283f Mon Sep 17 00:00:00 2001
From: John Hubbard
Date: Wed, 12 Jun 2019 14:41:51 -0700
Subject: New ref manuals directory, delete old locations

As decided in a recent OpenSource-Approval meeting, we want the directory
structure for reference manuals here to be fairly close to the way they are
organized internal to NVIDIA. This CL therefore does the following:

Rename from:

    Host-Fifo/volta/gv100/*
    Display-Ref-Manuals/gv100/*

to:

    manuals/volta/gv100/*

Regenerate index.html files to match (important for the "github pages" site,
at https://nvidia.github.io/open-gpu-doc/ ).

Reviewed by: Maneet Singh
---
 Host-Fifo/volta/gv100/dev_pbdma.ref.txt | 4261 -------------------------------
 1 file changed, 4261 deletions(-)
 delete mode 100644 Host-Fifo/volta/gv100/dev_pbdma.ref.txt

diff --git a/Host-Fifo/volta/gv100/dev_pbdma.ref.txt b/Host-Fifo/volta/gv100/dev_pbdma.ref.txt
deleted file mode 100644
index bc5163a..0000000
--- a/Host-Fifo/volta/gv100/dev_pbdma.ref.txt
+++ /dev/null
@@ -1,4261 +0,0 @@
-Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
-
-Permission is hereby granted, free of charge, to any person obtaining a
-copy of this software and associated documentation files (the "Software"),
-to deal in the Software without restriction, including without limitation
-the rights to use, copy, modify, merge, publish, distribute, sublicense,
-and/or sell copies of the Software, and to permit persons to whom the
-Software is furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
-IN NO EVENT SHALL
-THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
-DEALINGS IN THE SOFTWARE.
---------------------------------------------------------------------------------
-
-1 - INTRODUCTION
-==================
-
- Each of Host's PBDMA units fetches pushbuffer data from memory, generates
-commands, called "methods", from the fetched data, executes some of the
-generated methods itself, and sends the remainder of the methods to engines.
- This manual describes the Host PBDMA register space and all Host methods.
-The NV_PPBDMA space defines registers that are contained within each of Host's
-PBDMA units. Each PBDMA unit is allocated an 8KB address space for its
-registers.
- The NV_UDMA space defines the Host methods. A method consists of an
-address doubleword and a data doubleword. The address specifies the operation
-to be performed. The data is an operand. The NV_UDMA address space contains
-the addresses of the methods that are executed by a PBDMA unit.
-
-GP_ENTRY0 and GP_ENTRY1 - GP-Entry Memory Format
-
- A pushbuffer contains the specifications of the operations that a GPU
-context is to perform for a particular client. Pushbuffers are stored in
-memory. A doubleword-sized (4-byte) unit of pushbuffer data is known as a
-pushbuffer entry. GP entries indicate the location of the pushbuffer data in
-memory. GP entries themselves are also stored in memory.
- A GP entry specifies the location and size of a pushbuffer segment (a
-contiguous block of PB entries) in memory. See "FIFO_DMA" in dev_ram.ref for
-details about pushbuffer segments and the format of pushbuffer data.
-
- The NV_PPBDMA_GP_ENTRY0_GET and NV_PPBDMA_GP_ENTRY1_GET_HI fields of a GP
-entry specify the 38-bit dword-address (which would make a 40-bit byte-address)
-of the first pushbuffer entry of the GP entry's pushbuffer segment.
-Because each pushbuffer entry (and by extension each pushbuffer segment) is
-doubleword aligned (4-byte aligned), the least significant 2 bits of the 40-bit
-byte-address are not stored. The byte-address of the first pushbuffer entry in
-a GP entry's pushbuffer segment is
-(GP_ENTRY1_GET_HI << 32) + (GP_ENTRY0_GET << 2).
- The NV_PPBDMA_GP_ENTRY1_LENGTH field, when non-zero, indicates the number
-of pushbuffer entries contained within the GP entry's pushbuffer segment. The
-byte-address of the first pushbuffer entry beyond the pushbuffer segment is
-(GP_ENTRY1_GET_HI << 32) + (GP_ENTRY0_GET << 2) + (GP_ENTRY1_LENGTH * 4).
- If NV_PPBDMA_GP_ENTRY1_LENGTH is CONTROL (0), then the GP entry is a
-"control" entry, meaning this GP entry will not cause any PB data to be fetched
-or executed. In this case, the NV_PPBDMA_GP_ENTRY1_OPCODE field specifies an
-operation to perform, and the NV_PPBDMA_GP_ENTRY0_OPERAND field contains the
-operand. The available operations are as follows:
-
- * NV_PPBDMA_GP_ENTRY1_OPCODE_NOP: no operation will be performed, but note
-   that the SYNC field is still respected--see below.
-
- * NV_PPBDMA_GP_ENTRY1_OPCODE_GP_CRC: the ENTRY0_OPERAND field is compared
-   with the cyclic redundancy check value that was calculated over previous
-   GP entries (NV_PPBDMA_GP_CRC). After each comparison, the
-   NV_PPBDMA_GP_CRC is cleared, whether they match or differ. If they
-   differ, then Host initiates an interrupt (NV_PPBDMA_INTR_0_GPCRC). For
-   recovery, clearing the interrupt will cause the PBDMA to continue as if
-   the control entry was OPCODE_NOP.
-
- * NV_PPBDMA_GP_ENTRY1_OPCODE_PB_CRC: the ENTRY0_OPERAND is compared
-   with the CRC value that was calculated over the previous pushbuffer
-   segment (NV_PPBDMA_PB_CRC). The PB CRC resets to 0 with each pushbuffer
-   segment. If the two CRCs differ, Host will raise the
-   NV_PPBDMA_INTR_0_PBCRC interrupt. For recovery, clearing the interrupt
-   will continue as if the control entry was OPCODE_NOP.
-   Note that the PB_CRC is indeterminate if an END_PB_SEGMENT PB control
-   entry was used in the prior segment, or if SSDM disabled the device and
-   the segment had conditional fetching enabled.
-
- Host supports two privilege levels for channels: privileged and
-non-privileged. The privilege level is determined by the
-NV_PPBDMA_CONFIG_AUTH_LEVEL field set from the corresponding NV_RAMFC_CONFIG
-dword in the RAMFC. Non-privileged channels cannot execute privileged methods,
-but privileged channels can. Any attempt to run a privileged operation from a
-non-privileged channel will result in the PBDMA raising
-NV_PPBDMA_INTR_0_METHOD.
-
-
- The NV_PPBDMA_GP_ENTRY1_SYNC field specifies whether a PB segment may be
-fetched before Host has finished processing the preceding PB segment. If this
-field is SYNC_PROCEED, then Host does not wait for the preceding PB segment to
-be processed. If this field is SYNC_WAIT, then Host waits until the preceding
-PB segment has been processed by Host before beginning to fetch the current PB
-segment.
- Host's processing of a PB segment consists of parsing PB entries into PB
-instructions, decoding those instructions into control entries or method
-headers, generating methods from method headers, determining whether methods are
-to be executed by Host or by an engine, executing Host methods, and sending
-non-Host methods and SetObject methods to engines.
- Note that in the case where the final PB entry of the preceding PB segment
-is a method header representing a PB compressed method sequence of nonzero
-length--that is, the compressed method sequence is split across PB segments with
-all of its method data entries in the PB segment for which SYNC_WAIT is
-set--then Host is considered to have finished processing the preceding PB
-segment once that method header is read.
-However, splitting a PB compressed method sequence for software methods is not
-supported, because Host will issue the DEVICE interrupt indicating the SW method
-as soon as it processes the method header, which happens prior to fetching the
-method data entries for that compressed method sequence. Thus SW cannot
-actually execute any of the methods in the sequence because the method data is
-not yet available, leaving the PBDMA wedged.
- When SYNC_WAIT is set, Host does not wait for any engine methods generated
-from the preceding PB segment to complete. Host does not automatically wait
-until an engine is done processing all methods generated from that PB segment.
-If software desires that the engine finish processing all methods generated from
-one PB segment before a second PB segment is fetched, then software may place
-Host methods that wait until the engine is idle in the first PB segment (like
-WFI, SET_REF, or SEM_EXECUTE with RELEASE_WFI_EN set). Alternatively, software
-might put a semaphore acquire at the end of the first PB segment, and have an
-engine release the semaphore. In both cases, SYNC_WAIT must be set on the
-second PB segment. This field applies even if the NV_PPBDMA_GP_ENTRY1_LENGTH
-field is zero; if SYNC_WAIT is specified in this case, no further GP entries
-will be processed until the wait finishes.
-
- Some parts of a pushbuffer may not be executed depending on the value of
-the NV_PPBDMA_SUBDEVICE_ID and SUBDEVICE_MASK. If an entire PB segment will not
-be executed due to conditional execution, Host need not fetch the PB segment at
-all.
- The NV_PPBDMA_GP_ENTRY0_FETCH field indicates whether the PB segment
-specified by the GP entry should be fetched unconditionally or fetched
-conditionally. If this field is FETCH_UNCONDITIONAL, then the PB segment is
-fetched unconditionally. If this field is FETCH_CONDITIONAL, then the PB
-segment is only fetched if the NV_PPBDMA_SUBDEVICE_STATUS field is
-STATUS_ACTIVE.
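The GP entry fields described above, and the FETCH_CONDITIONAL rule in particular, can be illustrated with a minimal C sketch. It assumes that the two dwords of a GP entry have already been read from memory as host-order integers. Bit positions are taken from the NV_PPBDMA_GP_ENTRY0/1 field definitions in this manual. The names gp_entry_fields, gp_entry_decode, and gp_entry_fetches_pb are hypothetical and are not part of any NVIDIA API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of one GP entry (ENTRY0, ENTRY1). */
typedef struct {
    uint32_t length;      /* ENTRY1_LENGTH (30:10); 0 = control entry        */
    bool     conditional; /* ENTRY0_FETCH (0:0); PB-segment entries only     */
    bool     subroutine;  /* ENTRY1_LEVEL (9:9): 1 = LEVEL_SUBROUTINE        */
    bool     sync_wait;   /* ENTRY1_SYNC (31:31): 1 = SYNC_WAIT              */
    uint32_t opcode;      /* ENTRY1_OPCODE (7:0); control entries only       */
    uint32_t operand;     /* ENTRY0_OPERAND (31:0); control entries only     */
} gp_entry_fields;

static gp_entry_fields gp_entry_decode(uint32_t entry0, uint32_t entry1)
{
    gp_entry_fields f;
    f.length      = (entry1 >> 10) & 0x1FFFFFu;   /* 21-bit field */
    f.conditional = (entry0 & 0x1u) != 0;
    f.subroutine  = ((entry1 >> 9) & 0x1u) != 0;
    f.sync_wait   = (entry1 >> 31) != 0;
    f.opcode      = entry1 & 0xFFu;
    f.operand     = entry0;
    return f;
}

/* Whether Host fetches PB data for this entry. A FETCH_CONDITIONAL segment
 * is fetched only while NV_PPBDMA_SUBDEVICE_STATUS is STATUS_ACTIVE; an
 * unfetched segment behaves like an OPCODE_NOP control entry. */
static bool gp_entry_fetches_pb(gp_entry_fields f, bool subdevice_active)
{
    if (f.length == 0)
        return false;                 /* control entry: nothing to fetch */
    return !f.conditional || subdevice_active;
}
```

For example, an entry with ENTRY1 = 0x80004212 decodes as a 16-entry, SYNC_WAIT, LEVEL_SUBROUTINE segment. If ENTRY0 bit 0 is also set, the segment is fetched only while the subdevice is active.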
-
-********************************************************************************
-Warning: When using subdevice masking, one must take care to synchronize
-properly with any later GP entries marked FETCH_CONDITIONAL. If GP fetching
-gets too far ahead of PB processing, it is possible for a later conditional PB
-segment to be discarded prior to reaching an SSDM command that sets
-SUBDEVICE_STATUS to ACTIVE. This would cause Host to execute garbage data. One
-way to avoid this would be to set the SYNC_WAIT flag on any FETCH_CONDITIONAL
-segments following a subdevice re-enable.
-********************************************************************************
-
- If the PB segment is not fetched, then it behaves as an OPCODE_NOP control
-entry. If a PB segment contains a SET_SUBDEVICE_MASK PB instruction that Host
-must see, then the GP entry for that PB segment must specify
-FETCH_UNCONDITIONAL.
- If the PB segment specifies FETCH_CONDITIONAL and the subdevice mask shows
-STATUS_ACTIVE, but the PB segment contains a SET_SUBDEVICE_MASK PB instruction
-that will disable the mask, the rest of the PB segment will be discarded. In
-that case, an arbitrary number of entries past the SSDM may have already updated
-the PB CRC, rendering the PB CRC indeterminate.
- If Host must wait for a previous PB segment's Host processing to be
-completed before examining NV_PPBDMA_SUBDEVICE_STATUS, then the GP entry should
-also have its SYNC_WAIT field set.
- A PB segment marked FETCH_CONDITIONAL must not have a PB compressed method
-sequence that crosses a PB segment boundary (with its header in a previous
-non-conditional PB segment and its final valid data in a conditional PB
-segment); doing so will cause an NV_PPBDMA_INTR_0_PBSEG interrupt.
-
- Software may monitor Host's progress through the pushbuffer by reading the
-channel's NV_RAMUSERD_TOP_LEVEL_GET entry from USERD, which is backed by Host's
-NV_PPBDMA_TOP_LEVEL_GET register.
-See "NV_PFIFO_USERD_WRITEBACK" in dev_fifo.ref for information about how
-frequently this information is written back into USERD. If a PB segment occurs
-multiple times within a pushbuffer (like a commonly used subroutine), then
-progress through that segment may be less useful for monitoring, because
-software will not know which occurrence of the segment is being processed.
- The NV_PPBDMA_GP_ENTRY1_LEVEL field specifies whether progress through the
-GP entry's PB segment should be indicated in NV_RAMUSERD_TOP_LEVEL_GET. If this
-field is LEVEL_MAIN, then progress through the PB segment will be reported --
-NV_RAMUSERD_TOP_LEVEL_GET will equal NV_RAMUSERD_GET. If this field is
-LEVEL_SUBROUTINE, then progress through this PB segment is not reported -- Host
-will not alter NV_RAMUSERD_TOP_LEVEL_GET. If this field is LEVEL_SUBROUTINE,
-reads of NV_RAMUSERD_TOP_LEVEL_GET will return the last value of NV_RAMUSERD_GET
-from a PB segment at LEVEL_MAIN.
-
- If the GP entry's opcode is OPCODE_ILLEGAL or an invalid opcode, Host will
-initiate an interrupt (NV_PPBDMA_INTR_0_GPENTRY). If a GP entry specifies a PB
-segment that crosses the end of the virtual address space (0xFFFFFFFFFF), then
-Host will initiate an interrupt (NV_PPBDMA_INTR_0_GPENTRY). Invalid GP entries
-are treated like traps: they will set the interrupt and freeze the PBDMA, but
-the invalid GP entry is discarded. Once the interrupt is cleared, the PBDMA
-unit will simply continue with the next GP entry.
- Note that a corner case exists where the PB segment described by a GP entry
-is at the end of the virtual address space, or in other words, the last PB
-entry in the described PB segment is the last dword in the virtual address
-space. This type of GP entry is not valid and will generate a GPENTRY
-interrupt. The PBDMA's PUT pointer describes the address of the first dword
-beyond the PB segment, thus making the last dword in the virtual address space
-unusable for storing a PB entry.
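The following C sketch applies the address arithmetic and validity rules above. It computes the byte range of the PB segment described by a GP entry and checks the two GPENTRY conditions stated in this section: an illegal control opcode, and a segment whose end (PUT) address falls outside the 40-bit space. It only illustrates those two rules, and Host may enforce checks not modeled here. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PB_VA_LAST_BYTE 0xFFFFFFFFFFull  /* end of the 40-bit virtual address space */

/* (GP_ENTRY1_GET_HI << 32) + (GP_ENTRY0_GET << 2). GET occupies bits 31:2 of
 * ENTRY0, so clearing bits 1:0 (bit 0 is FETCH) yields GET << 2. */
static uint64_t pb_segment_start(uint32_t entry0, uint32_t entry1)
{
    uint64_t get_hi = entry1 & 0xFFu;              /* ENTRY1_GET_HI, bits 7:0 */
    return (get_hi << 32) + (entry0 & 0xFFFFFFFCu);
}

/* Byte address of the first PB entry beyond the segment (the PUT address). */
static uint64_t pb_segment_end(uint32_t entry0, uint32_t entry1)
{
    uint64_t length = (entry1 >> 10) & 0x1FFFFFu;  /* ENTRY1_LENGTH, bits 30:10 */
    return pb_segment_start(entry0, entry1) + length * 4;
}

/* false means Host would raise NV_PPBDMA_INTR_0_GPENTRY for this entry. */
static bool gp_entry_is_valid(uint32_t entry0, uint32_t entry1)
{
    uint32_t length = (entry1 >> 10) & 0x1FFFFFu;
    if (length == 0) {                             /* control entry */
        uint32_t opcode = entry1 & 0xFFu;
        /* Only NOP (0), GP_CRC (2) and PB_CRC (3) are accepted. */
        return opcode == 0x0 || opcode == 0x2 || opcode == 0x3;
    }
    /* The PUT address must itself lie inside the address space, which is why
     * the last dword of the space cannot hold a PB entry. */
    return pb_segment_end(entry0, entry1) <= PB_VA_LAST_BYTE;
}
```

A one-entry segment whose only entry is the last dword of the space (start 0xFFFFFFFFFC) is rejected. The same segment starting one dword earlier is accepted.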
-
-
-
-#define NV_PPBDMA_GP_ENTRY__SIZE 8 /* */
-
-#define NV_PPBDMA_GP_ENTRY0 0x10000000 /* RW-4R */
-
-#define NV_PPBDMA_GP_ENTRY0_OPERAND 31:0 /* RWXUF */
-#define NV_PPBDMA_GP_ENTRY0_FETCH 0:0 /* */
-#define NV_PPBDMA_GP_ENTRY0_FETCH_UNCONDITIONAL 0x00000000 /* */
-#define NV_PPBDMA_GP_ENTRY0_FETCH_CONDITIONAL 0x00000001 /* */
-#define NV_PPBDMA_GP_ENTRY0_GET 31:2 /* */
-
-#define NV_PPBDMA_GP_ENTRY1 0x10000004 /* RW-4R */
-
-#define NV_PPBDMA_GP_ENTRY1_GET_HI 7:0 /* RWXUF */
-
-
-#define NV_PPBDMA_GP_ENTRY1_LEVEL 9:9 /* RWXUF */
-#define NV_PPBDMA_GP_ENTRY1_LEVEL_MAIN 0x00000000 /* RW--V */
-#define NV_PPBDMA_GP_ENTRY1_LEVEL_SUBROUTINE 0x00000001 /* RW--V */
-#define NV_PPBDMA_GP_ENTRY1_LENGTH 30:10 /* RWXUF */
-#define NV_PPBDMA_GP_ENTRY1_LENGTH_CONTROL 0x00000000 /* RW--V */
-#define NV_PPBDMA_GP_ENTRY1_SYNC 31:31 /* RWXUF */
-#define NV_PPBDMA_GP_ENTRY1_SYNC_PROCEED 0x00000000 /* RW--V */
-#define NV_PPBDMA_GP_ENTRY1_SYNC_WAIT 0x00000001 /* RW--V */
-#define NV_PPBDMA_GP_ENTRY1_OPCODE 7:0 /* RWXUF */
-#define NV_PPBDMA_GP_ENTRY1_OPCODE_NOP 0x00000000 /* RW--V */
-#define NV_PPBDMA_GP_ENTRY1_OPCODE_ILLEGAL 0x00000001 /* RW--V */
-#define NV_PPBDMA_GP_ENTRY1_OPCODE_GP_CRC 0x00000002 /* RW--V */
-#define NV_PPBDMA_GP_ENTRY1_OPCODE_PB_CRC 0x00000003 /* RW--V */
-
-
-Number of NOPs for self-modifying gpfifo
-
-This is a formula for SW to estimate the number of NOPs needed to pad the GPFIFO
-such that the modification of a GP entry by the engine or by the CPU can take
-effect. Here, NV_PFIFO_LB_GPBUF_CONTROL_SIZE(eng) refers to the SIZE field in the
-NV_PFIFO_LB_GPBUF_CONTROL(eng) register (see dev_fifo.ref for more information
-about this register).
-
-NUM_GP_NOPS(eng) = ((NV_PFIFO_LB_GPBUF_CONTROL_SIZE(eng)+1) * NV_PFIFO_LB_ENTRY_SIZE)/ NV_PPBDMA_GP_ENTRY__SIZE
-
-
-GP_BASE - Base and Limit of the Circular Buffer of GP Entries
-
- GP entries are stored in a buffer in memory.
-The NV_PPBDMA_GP_BASE_OFFSET and NV_PPBDMA_GP_BASE_HI_OFFSET fields specify the
-37-bit address, in 8-byte granularity, of the start of a circular buffer that
-contains GP entries (GPFIFO). This address is a virtual (not a physical)
-address. GP entries are always GP_ENTRY__SIZE-byte aligned, so the least
-significant three bits of the byte address are not stored. The byte address of
-the GPFIFO base pointer is thus:
-
-      gpfifo_base_ptr = (GP_BASE_HI_OFFSET << 32) + (GP_BASE_OFFSET << 3)
-
- The number of GP entries in the circular buffer is always a power of 2.
-The NV_PPBDMA_GP_BASE_HI_LIMIT2 field specifies the number of bits used to count
-the memory allocated to the GP FIFO. The LIMIT2 value specified in these
-registers is log base 2 of the number of entries in the GP FIFO. For example,
-if the number of entries is 2^16--indicating a memory area of
-(2^16)*GP_ENTRY__SIZE bytes--then the value written in LIMIT2 is 16.
- The circular buffer containing GP entries cannot cross the maximum address.
-If OFFSET + (1< 0xFFFFFFFFFF, then Host will
-initiate a CPU interrupt (NV_PPBDMA_INTR_0_GPFIFO).
- The NV_PPBDMA_GP_PUT, NV_PPBDMA_GP_GET, and NV_PPBDMA_GP_FETCH registers
-(and their associated NV_RAMFC and NV_RAMUSERD entries) are relative to the
-value of this register.
- These registers are part of a GPU context's state. On a switch, the values
-of these registers are saved to, and restored from, the NV_RAMFC_GP_BASE and
-NV_RAMFC_GP_BASE_HI entries in the RAMFC part of the GPU context's GPU-instance
-block.
- Typically, software initializes the information in NV_RAMFC_GP_BASE and
-NV_RAMFC_GP_BASE_HI when the GPU context's GPU-instance block is first created.
-These registers are available to software only for debug. Software should use
-them only if the GPU context is assigned to a PBDMA unit and that PBDMA unit is
-stalled. While a GPU context's Host context is not contained within a PBDMA
-unit, software should use the RAMFC entries to access this information.
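Below is a minimal C sketch of the GPFIFO address computation. It assumes the raw GP_BASE and GP_BASE_HI register values are available, for example from the NV_RAMFC_GP_BASE and NV_RAMFC_GP_BASE_HI entries. It derives the base pointer, the entry count implied by LIMIT2, and the address of a given entry index. The last step is the same arithmetic as the fetch-address formula given for NV_PPBDMA_GP_FETCH. The helper names are hypothetical, and the check that the buffer does not cross the maximum address is not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define GP_ENTRY_SIZE 8u  /* NV_PPBDMA_GP_ENTRY__SIZE */

/* (GP_BASE_HI_OFFSET << 32) + (GP_BASE_OFFSET << 3). GP_BASE_OFFSET is
 * bits 31:3, so clearing the RSVD bits 2:0 gives GP_BASE_OFFSET << 3. */
static uint64_t gpfifo_base(uint32_t gp_base, uint32_t gp_base_hi)
{
    uint64_t lo = gp_base & ~0x7u;     /* GP_BASE_OFFSET << 3         */
    uint64_t hi = gp_base_hi & 0xFFu;  /* GP_BASE_HI_OFFSET, bits 7:0 */
    return (hi << 32) + lo;
}

/* Number of GP entries: 2^LIMIT2, with LIMIT2 in GP_BASE_HI bits 20:16. */
static uint64_t gpfifo_entries(uint32_t gp_base_hi)
{
    uint32_t limit2 = (gp_base_hi >> 16) & 0x1Fu;
    return 1ull << limit2;
}

/* Virtual address of GP entry `index` (for example, the value of GP_FETCH
 * or GP_GET): index * NV_PPBDMA_GP_ENTRY__SIZE + base. */
static uint64_t gp_entry_addr(uint32_t gp_base, uint32_t gp_base_hi, uint32_t index)
{
    assert(index < gpfifo_entries(gp_base_hi));  /* indices stay inside the ring */
    return gpfifo_base(gp_base, gp_base_hi) + (uint64_t)index * GP_ENTRY_SIZE;
}
```

With LIMIT2 = 16, the ring holds 65536 entries, or 512 KiB of GP entries. This matches the 2^16 example in the text.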
- A pair of these registers exists for each of Host's PBDMA units. These
-registers run on Host's internal bus clock.
-
-
-#define NV_PPBDMA_GP_BASE(i) (0x00040048+(i)*8192) /* RW-4A */
-#define NV_PPBDMA_GP_BASE__SIZE_1 14 /* */
-
-#define NV_PPBDMA_GP_BASE_OFFSET 31:3 /* RW-UF */
-#define NV_PPBDMA_GP_BASE_OFFSET_ZERO 0x00000000 /* RW--V */
-#define NV_PPBDMA_GP_BASE_RSVD 2:0 /* RW-UF */
-#define NV_PPBDMA_GP_BASE_RSVD_ZERO 0x00000000 /* RW--V */
-
-#define NV_PPBDMA_GP_BASE_HI(i) (0x0004004c+(i)*8192) /* RW-4A */
-#define NV_PPBDMA_GP_BASE_HI__SIZE_1 14 /* */
-
-#define NV_PPBDMA_GP_BASE_HI_OFFSET 7:0 /* RW-UF */
-#define NV_PPBDMA_GP_BASE_HI_OFFSET_ZERO 0x00000000 /* RW--V */
-#define NV_PPBDMA_GP_BASE_HI_LIMIT2 20:16 /* RW-UF */
-#define NV_PPBDMA_GP_BASE_HI_LIMIT2_ZERO 0x00000000 /* RW--V */
-#define NV_PPBDMA_GP_BASE_HI_RSVDA 15:8 /* RW-UF */
-#define NV_PPBDMA_GP_BASE_HI_RSVDA_ZERO 0x00000000 /* RW--V */
-#define NV_PPBDMA_GP_BASE_HI_RSVDB 31:21 /* RW-UF */
-#define NV_PPBDMA_GP_BASE_HI_RSVDB_ZERO 0x00000000 /* RW--V */
-
-
-GP_FETCH - Pointer to the next GP-Entry to be Fetched
-
- Host does not fetch all GP entries with a single request to the memory
-subsystem. Host fetches GP entries in batches. The NV_PPBDMA_GP_FETCH register
-indicates the index of the next GP entry to be fetched by Host. The actual
-40-bit virtual address of the specified GP entry is computed as follows:
-      fetch address = GP_FETCH_ENTRY * NV_PPBDMA_GP_ENTRY__SIZE + gpfifo_base_ptr
- If NV_PPBDMA_GP_PUT==NV_PPBDMA_GP_FETCH, then requests to fetch the entire
-GP circular buffer have been issued, and Host cannot make more requests until
-NV_PPBDMA_GP_PUT is changed. Host may finish fetching GP entries long before it
-has finished processing the PB segments specified by those entries.
-To determine whether the GP circular buffer is full, software should use
-NV_PPBDMA_GP_GET, not NV_PPBDMA_GP_FETCH.
-NV_PPBDMA_GP_FETCH represents the current extent of prefetching of GP entries;
-prefetched entries may be discarded and refetched later.
- This register is part of a GPU context's state. On a switch, the value of
-this register is saved to, and restored from, the NV_RAMFC_GP_FETCH entry of
-the RAMFC part of the GPU context's GPU-instance block.
- A PBDMA unit maintains this register. Typically, software does not need to
-access this register. This register is available to software only for debug.
-Because Host may fetch GP entries long before it is ready to process the
-entries, and because Host may discard GP entries that it has fetched, software
-should not use NV_PPBDMA_GP_FETCH to monitor Host's progress (software should
-use NV_PPBDMA_GP_GET for monitoring). Software should use this register only if
-the GPU context is assigned to a PBDMA unit and that PBDMA unit is stalled.
-While a GPU context's Host context is not contained within a PBDMA unit,
-software should use NV_RAMFC_GP_FETCH to access this information.
- If after a PRI write, or after this register has been restored from RAMFC
-memory, the value equals or exceeds the size of the circular buffer that stores
-GP entries (1<= PV),
-where SV is the semaphore value in memory, PV is the payload value, and >= is
-an unsigned greater-than-or-equal-to comparison.
- If OPERATION is ACQ_CIRC_GEQ, the acquire succeeds when the two's
-complement signed representation of the semaphore value minus the payload value
-is non-negative; that is, when the semaphore value is within half a range
-greater than or equal to the payload value, modulo that range. The
-PAYLOAD_SIZE field determines if Host is doing a 32 bit comparison or a 64 bit
-comparison.
-In other words, the condition is met when PAYLOAD_SIZE is 32BIT and the
-semaphore value is within the range [payload, payload + 2^31 - 1], modulo 2^32,
-or when PAYLOAD_SIZE is 64BIT and the semaphore value is within the range
-[payload, payload + 2^63 - 1], modulo 2^64.
- If OPERATION is ACQ_AND, the acquire succeeds when the bitwise-AND of the
-semaphore value and the payload value is not zero. The PAYLOAD_SIZE field
-determines whether a 32 bit or 64 bit value is read from memory and compared.
- If OPERATION is ACQ_NOR, the acquire succeeds when the bitwise-NOR of the
-semaphore value and the payload value is not zero. PAYLOAD_SIZE determines
-whether a 32 bit or 64 bit value is read from memory and compared.
- If OPERATION is RELEASE, then Host simply writes the payload value to the
-semaphore structure in memory at the SEM_ADDR_LO/_HI address. The exact value
-written depends on the operation defined. If PAYLOAD_SIZE is 32BIT, then a 32
-bit payload value from PAYLOAD_LO is used. If PAYLOAD_SIZE is 64BIT, then a 64
-bit payload specified by PAYLOAD_LO/_HI is used.
- If OPERATION is REDUCTION, then Host sends the memory system an
-instruction to perform the atomic reduction operation specified in the
-REDUCTION field on the memory value, using the PAYLOAD_LO/_HI payload value as
-the operand. The PAYLOAD_SIZE field determines whether a 32 bit or 64 bit
-reduction is performed. Note that if the semaphore address refers to a page
-whose PTE has ATOMIC_DISABLE set, the operation will result in an
-ATOMIC_VIOLATION fault.
- Note that if the PAYLOAD_SIZE is 64BIT, the semaphore address is required
-to be 8-byte aligned. If RELEASE_TIMESTAMP is EN while the operation is a
-RELEASE or REDUCTION operation, the semaphore address is required to be 16-byte
-aligned. The semaphore address is not required to be 16-byte aligned during an
-acquire operation.
-If the semaphore address is not aligned according to these field values, Host
-will raise the NV_PPBDMA_INTR_0 interrupt.
- For iGPU cases where a semaphore release can be mapped to an on-chip
-syncpoint, the PAYLOAD_SIZE must be 32BIT (4 bytes) to avoid double
-incrementing the target syncpoint. Timestamping should also be disabled to
-avoid unwanted behavior.
-
-Semaphore switch option:
-
- The NV_UDMA_SEM_EXECUTE_ACQUIRE_SWITCH_TSG field specifies whether or not
-Host should switch to processing another TSG if the acquire fails. If every
-channel within the same TSG has no work (is waiting on a semaphore acquire, is
-idle, is unbound, or is disabled), the TSG can make no further progress until
-one of the relevant semaphores is released. Because it may be a long time
-before the release, it may be more efficient for the PBDMA unit to switch off
-the blocked TSG prior to the runqueue timeslice expiring, so that it can serve
-a different TSG that is not waiting, or so that it can poll other semaphores on
-other TSGs whose channels are waiting on acquires.
- When a semaphore acquire fails, the PBDMA unit will always switch to
-another channel within the same TSG, provided that it has not completed a
-traversal through all the TSG's channels. If every pending channel in the TSG
-is waiting on a semaphore acquire, the Host scheduler is able to identify a lack
-of progress for the entire TSG by the time it has completed a traversal through
-all those channels. In this case, the value of ACQUIRE_SWITCH_TSG for each of
-these channels determines whether the PBDMA will switch to another TSG or start
-another traversal through the same TSG.
- If ACQUIRE_SWITCH_TSG is DIS for any of the channels in the TSG, the Host
-scheduler will ignore any lack of progress and continue processing the TSG,
-until either every channel in the TSG runs out of work or the timeslice
-expires.
-If ACQUIRE_SWITCH_TSG is EN for every pending channel in the TSG, the
-Host scheduler will recognize a lack of progress for the whole TSG, and will
-switch to the next serviceable TSG on the runqueue, if possible.
- In the case described above, if there isn't a different serviceable TSG
-on the runlist, then the current channel's TSG will continue to be scheduled
-and the acquire retry will be naturally delayed by the time it takes for Host's
-runlist processing to return to the same channel. This retry delay may be too
-short, in which case the runlist search can be throttled to increase the delay
-by configuring NV_PFIFO_ACQ_PRETEST; see dev_fifo.ref. Note that if the
-channel remains switched in, the prefetched pushbuffer data is not discarded,
-so setting ACQUIRE_SWITCH_TSG_EN cannot deterministically be depended on to
-cause the discarding of prefetched pushbuffer data.
- Also note that when switching between channels within a TSG, Host does not
-wait on any timer (such as NV_PFIFO_ACQ_PRETEST or NV_PPBDMA_ACQUIRE_RETRY),
-but is instead throttled by the time it takes to switch channels. Host will
-honor the ACQUIRE_RETRY time, but only if the same channel is rescheduled
-without a channel switch.
-
-Semaphore wait-for-idle option:
-
- The NV_UDMA_SEM_EXECUTE_RELEASE_WFI field applies only to releases and
-reductions. It specifies whether Host should wait until the engine to which
-the channel last sent methods is idle (in other words, until all previous
-methods in the channel have been completed) before writing to memory as part of
-the release or reduction operation. If this field is RELEASE_WFI_EN, then Host
-waits for the engine to be idle, inserts a system memory barrier, and then
-updates the value in memory. If this field is RELEASE_WFI_DIS, Host performs
-the semaphore operation on the memory without waiting for the engine to be
-idle, and without using a system memory barrier.
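The acquire conditions for ACQ_STRICT_GEQ, ACQ_CIRC_GEQ, ACQ_AND, and ACQ_NOR described earlier can be written as small C predicates. The sketch assumes that PAYLOAD_SIZE_32BIT compares only the low 32 bits of each operand. It does not model the plain ACQUIRE operation, and the function names are hypothetical. The circular comparison tests the top bit of the wrapped difference, which avoids implementation-defined unsigned-to-signed conversions in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Truncate to the comparison width selected by PAYLOAD_SIZE. */
static uint64_t sem_width(uint64_t x, bool is64)
{
    return is64 ? x : (x & UINT32_MAX);
}

/* ACQ_STRICT_GEQ: unsigned SV >= PV. */
static bool acq_strict_geq(uint64_t sv, uint64_t pv, bool is64)
{
    return sem_width(sv, is64) >= sem_width(pv, is64);
}

/* ACQ_CIRC_GEQ: SV - PV, taken modulo 2^N and read as N-bit two's complement,
 * is non-negative (top bit clear); i.e. SV lies in [PV, PV + 2^(N-1) - 1]
 * modulo 2^N. */
static bool acq_circ_geq(uint64_t sv, uint64_t pv, bool is64)
{
    uint64_t diff = sem_width(sv - pv, is64);
    return ((diff >> (is64 ? 63 : 31)) & 1u) == 0;
}

/* ACQ_AND: the bitwise AND of SV and PV is non-zero. */
static bool acq_and(uint64_t sv, uint64_t pv, bool is64)
{
    return sem_width(sv & pv, is64) != 0;
}

/* ACQ_NOR: the bitwise NOR of SV and PV is non-zero. */
static bool acq_nor(uint64_t sv, uint64_t pv, bool is64)
{
    return sem_width(~(sv | pv), is64) != 0;
}
```

The circular form is what makes a wrapping counter work. Once a 32-bit counter wraps from 0xFFFFFFFE to 0x00000002, ACQ_CIRC_GEQ still treats the new value as "at or after" the payload 0xFFFFFFFE. ACQ_STRICT_GEQ does not.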
-
-Semaphore timestamp option:
-
- The NV_UDMA_SEM_EXECUTE_RELEASE_TIMESTAMP field specifies whether a
-timestamp should be written by a release in addition to the payload. If
-RELEASE_TIMESTAMP is DIS, then only the semaphore payload will be written. If
-the field is EN, then both the semaphore payload and a nanosecond timestamp will
-be written. In this case, the semaphore address must be 16-byte aligned; see
-the related note at NV_UDMA_SEM_ADDR_LO. If RELEASE_TIMESTAMP is EN and
-SEM_ADDR_LO is not 16-byte aligned, then Host will initiate an interrupt
-(NV_PPBDMA_INTR_0_SEMAPHORE). When a 16-byte semaphore is written, the
-semaphore timestamp will be written before the semaphore payload so that when
-an acquire succeeds, the timestamp write will have completed. This ensures SW
-will not get an out-of-date timestamp on platforms which guarantee ordering
-within a 16-byte aligned region. The timestamp value is snapped from the
-NV_PTIMER_TIME_1/0 registers; see dev_timer.ref.
- For iGPU cases where a semaphore release can be mapped to an on-chip
-syncpoint, the PAYLOAD_SIZE must be 32BIT (4 bytes) to avoid double
-incrementing the target syncpoint. Timestamping should also be disabled for a
-syncpoint-backed release to avoid unexpected behavior.
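The following C sketch covers the SEM_EXECUTE alignment rules and the little-endian 16-byte layout tabulated below. The alignment rules are 8 bytes for a 64-bit payload and 16 bytes for a timestamped release or reduction, with no 16-byte requirement for acquires. The 4-byte minimum used for the remaining case is an assumption, because this section does not state it. The function names are hypothetical, and the packing routine only shows the byte order; Host itself writes the timestamp before the payload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimum SEM_ADDR alignment, in bytes, for a SEM_EXECUTE operation.
 * release_like: OPERATION is RELEASE or REDUCTION (not an acquire). */
static unsigned sem_required_alignment(bool release_like, bool timestamp_en,
                                       bool payload64)
{
    if (release_like && timestamp_en)
        return 16;          /* 16-byte semaphore: payload + timestamp */
    if (payload64)
        return 8;           /* PAYLOAD_SIZE_64BIT                     */
    return 4;               /* assumption: not stated in this section */
}

static bool sem_addr_ok(uint64_t addr, bool release_like, bool timestamp_en,
                        bool payload64)
{
    unsigned align = sem_required_alignment(release_like, timestamp_en, payload64);
    return (addr & (align - 1u)) == 0;
}

/* Serialize a 16-byte timestamped semaphore in the little-endian layout:
 * bytes 0-7 hold the payload (bytes 4-7 are zero for a 32-bit payload) and
 * bytes 8-15 hold the 64-bit timer value. */
static void sem16_pack(uint8_t out[16], uint64_t payload, bool payload64,
                       uint64_t timer)
{
    if (!payload64)
        payload &= UINT32_MAX;
    for (int i = 0; i < 8; i++) {
        out[i]     = (uint8_t)(payload >> (8 * i));
        out[8 + i] = (uint8_t)(timer >> (8 * i));
    }
}
```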
-
- Below is the little endian format of 16-byte semaphores in memory:
-
-    ---- ------------------- -------------------
-    byte Data(Little endian) Data(Little endian)
-         PAYLOAD_SIZE=32BIT  PAYLOAD_SIZE=64BIT
-    ---- ------------------- -------------------
-       0 Payload[ 7: 0]      Payload[ 7: 0]
-       1 Payload[15: 8]      Payload[15: 8]
-       2 Payload[23:16]      Payload[23:16]
-       3 Payload[31:24]      Payload[31:24]
-       4 0                   Payload[39:32]
-       5 0                   Payload[47:40]
-       6 0                   Payload[55:48]
-       7 0                   Payload[63:56]
-       8 timer[ 7: 0]        timer[ 7: 0]
-       9 timer[15: 8]        timer[15: 8]
-      10 timer[23:16]        timer[23:16]
-      11 timer[31:24]        timer[31:24]
-      12 timer[39:32]        timer[39:32]
-      13 timer[47:40]        timer[47:40]
-      14 timer[55:48]        timer[55:48]
-      15 timer[63:56]        timer[63:56]
-    ---- ------------------- -------------------
-
-
-Semaphore reduction operations:
-
- The NV_UDMA_SEM_EXECUTE_REDUCTION field specifies the reduction operation
-to perform on the semaphore memory value, using the semaphore payload from
-SEM_PAYLOAD_LO/HI as an operand, when the OPERATION field is
-OPERATION_REDUCTION. Based on the PAYLOAD_SIZE field, the semaphore value and
-the payload are interpreted as 32bit or 64bit integers and the reduction
-operation is performed according to the signedness specified via the
-REDUCTION_FORMAT field described below.
-The reduction operation leaves the modified value in the semaphore memory
-according to the operation as follows:
-
-REDUCTION_IMIN - the minimum of the value and payload
-REDUCTION_IMAX - the maximum of the value and payload
-REDUCTION_IXOR - the bitwise exclusive or (XOR) of the value and payload
-REDUCTION_IAND - the bitwise AND of the value and payload
-REDUCTION_IOR  - the bitwise OR of the value and payload
-REDUCTION_IADD - the sum of the value and payload
-REDUCTION_INC  - the value incremented by 1, or reset to 0 if the incremented
-                 value would exceed the payload
-REDUCTION_DEC  - the value decremented by 1, or reset back to the payload
-                 if the original value is already 0 or exceeds the payload
-
-Note that INC and DEC are somewhat surprising: they can be used to repeatedly
-loop the semaphore value when performed successively with the same payload p.
-INC repeatedly iterates from 0 to p inclusive, resetting to 0 once exceeding p.
-DEC repeatedly iterates down from p to 0 inclusive, resetting back to p once
-the value would otherwise underflow. Therefore, an INC or DEC reduction with
-payload 0 effectively releases a semaphore by setting its value to 0.
-
-The reduction opcode assignment matches the enumeration in the XBAR translator
-(to avoid extra remapping of hardware), but this does not match the graphics FE
-reduction opcodes used by graphics backend semaphores. The reduction operation
-itself is performed by L2.
-
-Semaphore signedness option:
-
- The NV_UDMA_SEM_EXECUTE_REDUCTION_FORMAT field specifies whether the
-values involved in a reduction operation will be interpreted as signed or
-unsigned.
-
-The following table summarizes each reduction operation, and the signedness and
-payload size supported for each operation:
-
-         signedness
-  r op   32b   64b  function (v = memory value, p = semaphore payload)
-  -----+-----+-----+---------------------------------------------------
-  IMIN   U,S   U,S  v = (v < p) ? v : p
-  IMAX   U,S   U,S  v = (v > p) ? v : p
-  IXOR   N/A   N/A  v = v ^ p
-  IAND   N/A   N/A  v = v & p
-  IOR    N/A   N/A  v = v | p
-  IADD   U,S   U    v = v + p
-  INC    U     inv  v = (v >= p) ? 0 : v + 1
-  DEC    U     inv  v = (v == 0 || v > p) ? p : v - 1   (from L2 IAS)
-
-An operation with signedness "N/A" will ignore the value of REDUCTION_FORMAT
-when executing, and either value of REDUCTION_FORMAT is valid. If an operation
-is marked "U", a signed version of this operation is not supported; if it is
-marked "inv", it is unsupported for any signedness. If Host sees an unsupported
-reduction op (in other words, is expected to run a reduction op while
-PAYLOAD_SIZE and REDUCTION_FORMAT are set to unsupported values for that op),
-Host will raise the NV_PPBDMA_INTR_0_SEMAPHORE interrupt.
-
-Example: A signed 32-bit IADD reduction operation is valid. A signed 64-bit
-IADD reduction operation is unsupported and will trigger an interrupt if sent to
-Host. A 64-bit INC (or DEC) operation is not supported and will trigger an
-interrupt if sent to Host.
-
-Legal semaphore operation combinations:
-
- For iGPU cases where a semaphore release can be mapped to an on-chip
-syncpoint, the PAYLOAD_SIZE must be 32BIT (4 bytes) to avoid double
-incrementing the target syncpoint. Timestamping should also be disabled for a
-syncpoint-backed release to avoid unexpected behavior.
-
- The following table diagrams the types of semaphore operations that are
-possible. In the columns, "x" matches any field value. ACQ refers to any of
-the ACQUIRE, ACQ_STRICT_GEQ, ACQ_CIRC_GEQ, ACQ_AND, and ACQ_NOR operations.
-REL refers to either a RELEASE or a REDUCTION operation.
- - OP SWITCH WFI PAYLOAD_SIZE TIMESTAMP Description - --- ------ --- ------------ --------- -------------------------------------------------------------- - ACQ 0 x 0 x acquire; 4B (32 bit comparison); retry on fail - ACQ 0 x 1 x acquire; 8B (64 bit comparison); retry on fail - ACQ 1 x 0 x acquire; 4B (32 bit comparison); switch on fail - ACQ 1 x 1 x acquire; 8B (64 bit comparison); switch on fail - REL x 0 0 1 WFI & release 4B payload + timestamp semaphore - REL x 0 1 1 WFI & release 8B payload + timestamp semaphore - REL x 1 0 1 do not WFI & release 4B payload + timestamp semaphore - REL x 1 1 1 do not WFI & release 8B payload + timestamp semaphore - REL x 0 0 0 WFI & release doubleword (4B) semaphore payload - REL x 0 1 0 WFI & release quadword (8B) semaphore payload - REL x 1 0 0 do not WFI & release doubleword (4B) semaphore payload - REL x 1 1 0 do not WFI & release quadword (8B) semaphore payload - --- ------ --- ------------ --------- -------------------------------------------------------------- - - While the channel is loaded on a PBDMA unit, information from this method -is stored in the NV_PPBDMA_SEM_EXECUTE register. Otherwise, this information -is stored in the NV_RAMFC_SEM_EXECUTE field of the RAMFC part of the channel's -instance block. - -Undefined bits: - - Bits in the NV_UDMA_SEM_EXECUTE method data that are not used by the -specified OPERATION should be set to 0. When non-zero, their behavior is -undefined. 
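The reduction table and the supported-combination rules above can be modeled in software, for example to predict the value a test should read back from a semaphore. The following is a minimal C sketch under stated assumptions: the enum sem_red_op and the functions sem_red_supported() and sem_red32() are hypothetical names (only the opcode values come from NV_UDMA_SEM_EXECUTE_REDUCTION_*), only 32-bit payloads are evaluated, and IADD overflow is assumed to wrap modulo 2^32. The real operation is performed by L2, not by software.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode values from NV_UDMA_SEM_EXECUTE_REDUCTION_*; the enum name is hypothetical. */
enum sem_red_op {
    RED_IMIN = 0, RED_IMAX = 1, RED_IXOR = 2, RED_IAND = 3,
    RED_IOR = 4, RED_IADD = 5, RED_INC = 6, RED_DEC = 7
};

/* Mirrors the signedness/payload-size table: returns false for combinations
 * that Host would reject with NV_PPBDMA_INTR_0_SEMAPHORE. */
static bool sem_red_supported(enum sem_red_op op, bool is64, bool is_signed)
{
    switch (op) {
    case RED_IMIN:
    case RED_IMAX:                   /* U,S at both sizes */
    case RED_IXOR:
    case RED_IAND:
    case RED_IOR:                    /* N/A: REDUCTION_FORMAT is ignored */
        return true;
    case RED_IADD:
        return !(is64 && is_signed); /* 64-bit IADD is unsigned only */
    case RED_INC:
    case RED_DEC:
        return !is64 && !is_signed;  /* 32-bit unsigned only; 64-bit is "inv" */
    }
    return false;
}

/* Value left in a 32-bit semaphore after the reduction (v = memory, p = payload). */
static uint32_t sem_red32(enum sem_red_op op, bool is_signed, uint32_t v, uint32_t p)
{
    switch (op) {
    case RED_IMIN:
        if (is_signed)
            return ((int32_t)v < (int32_t)p) ? v : p;
        return (v < p) ? v : p;
    case RED_IMAX:
        if (is_signed)
            return ((int32_t)v > (int32_t)p) ? v : p;
        return (v > p) ? v : p;
    case RED_IXOR:
        return v ^ p;
    case RED_IAND:
        return v & p;
    case RED_IOR:
        return v | p;
    case RED_IADD:
        return v + p;                /* assumed to wrap modulo 2^32 in this model */
    case RED_INC:
        return (v >= p) ? 0 : v + 1;
    case RED_DEC:
        return (v == 0 || v > p) ? p : v - 1;
    }
    return v;
}
```

For example, applying sem_red32(RED_INC, false, v, 3) repeatedly starting from v = 0 produces 1, 2, 3, 0, 1, ..., which matches the INC looping behavior described above.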
- - - -#define NV_UDMA_SEM_EXECUTE 0x0000006C /* -W-4R */ - -#define NV_UDMA_SEM_EXECUTE_OPERATION 2:0 /* -W-VF */ -#define NV_UDMA_SEM_EXECUTE_OPERATION_ACQUIRE 0x00000000 /* -W--V */ -#define NV_UDMA_SEM_EXECUTE_OPERATION_RELEASE 0x00000001 /* -W--V */ -#define NV_UDMA_SEM_EXECUTE_OPERATION_ACQ_STRICT_GEQ 0x00000002 /* -W--V */ -#define NV_UDMA_SEM_EXECUTE_OPERATION_ACQ_CIRC_GEQ 0x00000003 /* -W--V */ -#define NV_UDMA_SEM_EXECUTE_OPERATION_ACQ_AND 0x00000004 /* -W--V */ -#define NV_UDMA_SEM_EXECUTE_OPERATION_ACQ_NOR 0x00000005 /* -W--V */ -#define NV_UDMA_SEM_EXECUTE_OPERATION_REDUCTION 0x00000006 /* -W--V */ - -#define NV_UDMA_SEM_EXECUTE_ACQUIRE_SWITCH_TSG 12:12 /* -W-VF */ -#define NV_UDMA_SEM_EXECUTE_ACQUIRE_SWITCH_TSG_DIS 0x00000000 /* -W--V */ -#define NV_UDMA_SEM_EXECUTE_ACQUIRE_SWITCH_TSG_EN 0x00000001 /* -W--V */ - -#define NV_UDMA_SEM_EXECUTE_RELEASE_WFI 20:20 /* -W-VF */ -#define NV_UDMA_SEM_EXECUTE_RELEASE_WFI_DIS 0x00000000 /* -W--V */ -#define NV_UDMA_SEM_EXECUTE_RELEASE_WFI_EN 0x00000001 /* -W--V */ - -#define NV_UDMA_SEM_EXECUTE_PAYLOAD_SIZE 24:24 /* -W-VF */ -#define NV_UDMA_SEM_EXECUTE_PAYLOAD_SIZE_32BIT 0x00000000 /* -W--V */ -#define NV_UDMA_SEM_EXECUTE_PAYLOAD_SIZE_64BIT 0x00000001 /* -W--V */ - -#define NV_UDMA_SEM_EXECUTE_RELEASE_TIMESTAMP 25:25 /* -W-VF */ -#define NV_UDMA_SEM_EXECUTE_RELEASE_TIMESTAMP_DIS 0x00000000 /* -W--V */ -#define NV_UDMA_SEM_EXECUTE_RELEASE_TIMESTAMP_EN 0x00000001 /* -W--V */ - -#define NV_UDMA_SEM_EXECUTE_REDUCTION 30:27 /* -W-VF */ -#define NV_UDMA_SEM_EXECUTE_REDUCTION_IMIN 0x00000000 /* -W--V */ -#define NV_UDMA_SEM_EXECUTE_REDUCTION_IMAX 0x00000001 /* -W--V */ -#define NV_UDMA_SEM_EXECUTE_REDUCTION_IXOR 0x00000002 /* -W--V */ -#define NV_UDMA_SEM_EXECUTE_REDUCTION_IAND 0x00000003 /* -W--V */ -#define NV_UDMA_SEM_EXECUTE_REDUCTION_IOR 0x00000004 /* -W--V */ -#define NV_UDMA_SEM_EXECUTE_REDUCTION_IADD 0x00000005 /* -W--V */ -#define NV_UDMA_SEM_EXECUTE_REDUCTION_INC 0x00000006 /* -W--V */ -#define 
NV_UDMA_SEM_EXECUTE_REDUCTION_DEC 0x00000007 /* -W--V */ - -#define NV_UDMA_SEM_EXECUTE_REDUCTION_FORMAT 31:31 /* -W-VF */ -#define NV_UDMA_SEM_EXECUTE_REDUCTION_FORMAT_SIGNED 0x00000000 /* -W--V */ -#define NV_UDMA_SEM_EXECUTE_REDUCTION_FORMAT_UNSIGNED 0x00000001 /* -W--V */ - - -NON_STALL_INT [method] - Non-Stalling Interrupt Method - - The NON_STALL_INT method causes the NV_PFIFO_INTR_0_CHANNEL_INTR field -to be set to PENDING in the channel's interrupt register, as well as -NV_PFIFO_INTR_HIER_* registers. This will cause an interrupt if it is -enabled. Host does not stall the execution of the GPU context's -method, does not switch out the GPU context, and does not disable switching the -GPU context. - A NON_STALL_INT method's data (NV_UDMA_NON_STALL_INT_HANDLE) is ignored. - Software should handle all of a channel's non-stalling interrupts before it -unbinds the channel from the GPU context. - - -#define NV_UDMA_NON_STALL_INT 0x00000020 /* -W-4R */ - -#define NV_UDMA_NON_STALL_INT_HANDLE 31:0 /* -W-VF */ - - - - -MEM_OP methods: membars, and cache and TLB management. - - MEM_OP_A, MEM_OP_B, and MEM_OP_C set up state for performing a memory -operation. MEM_OP_D sets additional state, specifies the type of memory -operation to perform, and triggers sending the mem op to HUB. To avoid -unexpected behavior for future revisions of the MEM_OP methods, all 4 methods -should be sent for each requested mem op, with irrelevant fields set to 0. -Note that hardware does not enforce the requirement that unrelated fields be set -to 0, but ignoring this advice could break forward compatibility. - Host does not wait until an engine is idle before beginning to execute -this method. - While a GPU context is bound to a channel and assigned to a PBDMA unit, -the NV_UDMA_MEM_OP_A-C values are stored in the NV_PPBDMA_MEM_OP_A-C registers -respectively. 
While the GPU context is not assigned to a PBDMA unit, these -values are stored in the respective NV_RAMFC_MEM_OP_A-C fields of the RAMFC part -of the GPU context's instance block in memory. - -Usage, operations, and configuration: - - MEM_OP_D_OPERATION specifies the type of memory operation to perform. This -field determines the value of the opcode on the Host/FB interface. When Host -encounters the MEM_OP_D method, Host sends the specified request to the FB and -waits for an indication that the request has completed before beginning to -process the next method. To issue a memory operation, first issue the 3 -MEM_OP_A-C methods to configure the operation as documented below. Then send -MEM_OP_D to complete the configuration and trigger the operation. The -operations available for MEM_OP_D_OPERATION are as follows: - MEMBAR - perform a memory barrier; see below. - MMU_TLB_INVALIDATE - invalidate page translation and attribute data from -the given page directory that are cached in the Memory-Management Unit TLBs. - MMU_TLB_INVALIDATE_TARGETED - invalidate page translation and attributes -data corresponding to a specific page in a given page directory. - L2_SYSMEM_INVALIDATE - invalidate data from system memory cached in L2. - L2_PEERMEM_INVALIDATE - invalidate peer-to-peer data in the L2 cache. - L2_CLEAN_COMPTAGS - clean the L2 compression tag cache. - L2_FLUSH_DIRTY - flush dirty lines from L2. - L2_WAIT_FOR_SYS_PENDING_READS - ensure all sysmem reads are past the point -of being modified by a write through a reflected mapping. To do this, L2 drains -all sysmem reads to the point where they cannot be modified by future -non-blocking writes to reflected sysmem. L2 will block any new sysmem read -requests and drain out all read responses. Note VC's with sysmem read requests -at the head would stall any request till the flush is complete. The niso-nb vc -does not have sysmem read requests so it would continue to flow. 
L2 will ack -that the sys flush is complete and unblock all VC's. Note this operation is a -NOP on Tegra chips. - ACCESS_COUNTER_CLR - clear page access counters. - - Depending on the operation given in MEM_OP_D_OPERATION, the other fields of -all four MEM_OP methods are interpreted differently: - -MMU_TLB_INVALIDATE* -------------------- - - When the operation is MMU_TLB_INVALIDATE or MMU_TLB_INVALIDATE_TARGETED, -Host will initiate a TLB invalidate as described above. The MEM_OP -configuration fields specify what to invalidate and where to perform the -invalidate, and can optionally trigger a replay or cancel event for replayable -faults buffered within the TLBs as part of UVM page management. - When the operation is MMU_TLB_INVALIDATE_TARGETED, -MEM_OP_C_TLB_INVALIDATE_PDB must be ONE, and the TLB_INVALIDATE_TARGET_ADDR_LO -and HI fields must be filled in to specify the target page. - These operations are privileged and can only be executed from channels -with NV_PPBDMA_CONFIG_AUTH_LEVEL set to PRIVILEGED. This is configured via the -NV_RAMFC_CONFIG dword in the channel's RAMFC during channel setup. - - MEM_OP_A_TLB_INVALIDATE_CANCEL_TARGET_GPC_ID and -MEM_OP_A_TLB_INVALIDATE_CANCEL_TARGET_CLIENT_UNIT_ID identify, respectively, the -GPC and the uTLB within that GPC that should perform the cancel operation when -MEM_OP_C_TLB_INVALIDATE_REPLAY is CANCEL_TARGETED. These field values should be -copied from the GPC_ID and CLIENT fields of the associated -NV_UVM_FAULT_BUF_ENTRY packet or NV_PFIFO_INTR_MMU_FAULT_INFO(i) entry. The -CLIENT_UNIT_ID corresponds to the values specified by NV_PFAULT_CLIENT_GPC_* in -dev_fault.ref. These fields are used with the CANCEL_TARGETED operation. The -fields also overlap with CANCEL_MMU_ENGINE_ID, and are interpreted as -CANCEL_MMU_ENGINE_ID during a replay of type REPLAY_CANCEL_VA_GLOBAL. For other -replay operations, these fields must be 0.
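A minimal C sketch of how a driver might fill the cancel-target bits of MEM_OP_A for the replay cases described above. The function mem_op_a_cancel_bits() and the REPLAY_* macro names are hypothetical; the bit positions come from the NV_UDMA_MEM_OP_A definitions in this manual, the replay values from NV_UDMA_MEM_OP_C_TLB_INVALIDATE_REPLAY, and gpc_id and client_unit_id are assumed to have been copied from the fault record. The INVALIDATION_SIZE alias used by some other replay types is not modeled; this sketch simply leaves those bits 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values of NV_UDMA_MEM_OP_C_TLB_INVALIDATE_REPLAY used below. */
#define REPLAY_CANCEL_TARGETED  3u
#define REPLAY_CANCEL_VA_GLOBAL 5u

/* Computes the low bits of MEM_OP_A that hold the cancel target for the given
 * replay type. Returns false if a value does not fit in its field. */
static bool mem_op_a_cancel_bits(uint32_t replay, uint32_t gpc_id,
                                 uint32_t client_unit_id,
                                 uint32_t mmu_engine_id, uint32_t *out)
{
    switch (replay) {
    case REPLAY_CANCEL_TARGETED:
        /* CANCEL_TARGET_CLIENT_UNIT_ID is 5:0, CANCEL_TARGET_GPC_ID is 10:6. */
        if (client_unit_id > 0x3Fu || gpc_id > 0x1Fu)
            return false;
        *out = (gpc_id << 6) | client_unit_id;
        return true;
    case REPLAY_CANCEL_VA_GLOBAL:
        /* The same bits are interpreted as CANCEL_MMU_ENGINE_ID (6:0). */
        if (mmu_engine_id > 0x7Fu)
            return false;
        *out = mmu_engine_id;
        return true;
    default:
        /* Other replay types: the cancel-target fields must be 0. */
        *out = 0;
        return true;
    }
}
```

The caller would then OR in SYSMEMBAR (bit 11) and, for MMU_TLB_INVALIDATE_TARGETED, TARGET_ADDR_LO (31:12).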
 - - MEM_OP_A_TLB_INVALIDATE_CANCEL_MMU_ENGINE_ID specifies the associated -MMU_ENGINE_ID of the requests targeted by a REPLAY_CANCEL_VA_GLOBAL -operation. The field is ignored if the replay operation is not -REPLAY_CANCEL_VA_GLOBAL. This field overlaps with the CANCEL_TARGET_GPC_ID and -CANCEL_TARGET_CLIENT_UNIT_ID fields. - - MEM_OP_A_TLB_INVALIDATE_INVALIDATION_SIZE is aliased/repurposed - with the MEM_OP_A_TLB_INVALIDATE_CANCEL_TARGET_CLIENT_UNIT_ID field - when MEM_OP_C_TLB_INVALIDATE_REPLAY (below) is anything other - than CANCEL_TARGETED, CANCEL_VA_GLOBAL, or - CANCEL_VA_TARGETED. For the replay types in which the invalidation - size is enabled, the actual region to be invalidated is calculated as - 4K*(2^INVALIDATION_SIZE), i.e., - 4K*(2^CANCEL_TARGET_CLIENT_UNIT_ID); the client unit ID and GPC ID - are not applicable. - - MEM_OP_A_TLB_INVALIDATE_SYSMEMBAR controls whether a Hub SYSMEMBAR -operation is performed after the TLB is invalidated and all outstanding acks -have completed. Note that if ACK_TYPE is ACK_TYPE_NONE, this field is -ignored and no MEMBAR will be performed. This is provided as a SW optimization -so that SW does not need to perform an NV_UDMA_MEM_OP_D_OPERATION_MEMBAR op with -MEMBAR_TYPE SYS_MEMBAR after the TLB_INVALIDATE. This field must be 0 if -TLB_INVALIDATE_GPC is DISABLE. - - MEM_OP_B_TLB_INVALIDATE_TARGET_ADDR_HI:MEM_OP_A_TLB_INVALIDATE_TARGET_ADDR_LO -specifies the 4k aligned virtual address of the page whose translation to -invalidate within the TLBs. These fields are valid only when OPERATION is -MMU_TLB_INVALIDATE_TARGETED; otherwise, they must be set to 0. - - MEM_OP_C_TLB_INVALIDATE_PDB controls whether a TLB invalidate should apply -to a particular page directory or to all of them. If PDB is ALL, then all page -directories are invalidated. If PDB is ONE, then the PDB address and aperture -are specified in the PDB_ADDR_LO:PDB_ADDR_HI and PDB_APERTURE fields.
-Note that ALL does not make sense when OPERATION is MMU_TLB_INVALIDATE_TARGETED; -the behavior in that case is undefined. - - MEM_OP_C_TLB_INVALIDATE_GPC controls whether the GPC-MMU and uTLB entries -should be invalidated in addition to the Hub-MMU TLB (Note: the Hub TLB is -always invalidated). Set it to INVALIDATE_GPC_ENABLE to invalidate the GPC TLBs. -The REPLAY, ACK_TYPE, and SYSMEMBAR fields are only used by the GPC TLB and so -are ignored if INVALIDATE_GPC is DISABLE. - - MEM_OP_C_TLB_INVALIDATE_REPLAY specifies the type of replay to perform in -addition to the invalidate. A replay causes all replayable faults outstanding -in the TLB to attempt their translations again. Once a TLB acks a replay, that -TLB may start accepting new translations again. The replay flavors are as -follows: - NONE - do not replay any replayable faults on invalidate. - START - initiate a replay across all TLBs, but don't wait for completion. - The replay will be acked as soon as the invalidate is processed, but - replays themselves are in flight and not necessarily translated. - START_ACK_ALL - initiate the replay and wait until it completes. - The replay will be acked after all pending transactions in the replay - fifo have been translated. New requests will remain stalled in the - gpcmmu until all transactions in the replay fifo have completed and - there are no pending faults left in the replay fifo. - CANCEL_TARGETED - initiate a cancel-replay on a targeted uTLB, causing any - replayable translations buffered in that uTLB to become non-replayable - if they fault again. In this case, the first faulting translation - will be reported in the NV_PFIFO_INTR_MMU_FAULT registers and will - raise PFIFO_INTR_0_MMU_FAULT. The specific TLB to target for the - cancel is specified in the CANCEL_TARGET fields. Note the TLB - invalidate still applies globally to all TLBs. - CANCEL_GLOBAL - like CANCEL_TARGETED, but all TLBs will cancel-replay. 
 - CANCEL_VA_GLOBAL - initiates a cancel operation that cancels all requests - with the matching mmu_engine_id and access_type that land in the 4KB - page at the specified virtual address within the scope of the specified - PDB. All other requests are replayed. If the specified engine is not - bound, or if the PDB of the specified engine does not match the - specified PDB, all requests will be replayed and none will be canceled. - - MEM_OP_C_TLB_INVALIDATE_ACK_TYPE controls which sort of ACK the uTLBs wait -for after having issued a membar to L2. ACK_TYPE_NONE does not perform any sort -of membar. ACK_TYPE_INTRANODE waits for an ack from the XBAR. -ACK_TYPE_GLOBALLY waits for an L2 ACK. ACK_TYPE_GLOBALLY is equivalent to a -MEMBAR operation from the engine, or a SYS_MEMBAR if -MEM_OP_A_TLB_INVALIDATE_SYSMEMBAR is EN. - - MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL specifies which levels in the page -directory hierarchy of the TLB cache to invalidate. The levels are numbered -from the bottom up, with the PTE being at the bottom with level 1. The -specified level and all those below it in the hierarchy -- that is, all those -with a lower numbered level -- are invalidated. ALL (the 0 default) is -special-cased to indicate the top level; this causes the invalidate to apply to -the entire page mapping structure. The field is ignored if the replay operation -is REPLAY_CANCEL_VA_GLOBAL. - - MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE specifies the associated ACCESS_TYPE of -the requests targeted by a REPLAY_CANCEL_VA_GLOBAL operation. This field -overlaps with the INVALIDATE_PAGE_TABLE_LEVEL field, and is ignored if the -replay operation is not REPLAY_CANCEL_VA_GLOBAL. The ACCESS_TYPE field can take -one of the following values: - READ - the cancel_va_global should be performed on all pending read requests. - WRITE - the cancel_va_global should be performed on all pending write requests. - ATOMIC_STRONG - the cancel_va_global should be performed on all pending - strong atomic requests.
 - ATOMIC_WEAK - the cancel_va_global should be performed on all pending - weak atomic requests. - ATOMIC_ALL - the cancel_va_global should be performed on all pending atomic - requests. - WRITE_AND_ATOMIC - the cancel_va_global should be performed on all pending - write and atomic requests. - ALL - the cancel_va_global should be performed on all pending requests. - - - MEM_OP_C_TLB_INVALIDATE_PDB_APERTURE specifies the target aperture of the -page directory for which TLB entries should be invalidated. This field must be -0 when TLB_INVALIDATE_PDB is ALL. - - MEM_OP_C_TLB_INVALIDATE_PDB_ADDR_LO specifies the low 20 bits of the -4k-block-aligned PDB (base address of the page directory), i.e., byte address -bits 31:12, when TLB_INVALIDATE_PDB is ONE; otherwise this field must be 0. The -PDB byte address should be 4k aligned and right-shifted by 12 before being split -and packed into the ADDR fields. Note that the PDB_ADDR_LO field starts at bit -12, so it is possible to set MEM_OP_C to the low 32 bits of the byte address, -mask off the low 12 bits, and then OR in the rest of the configuration fields. - - MEM_OP_D_TLB_INVALIDATE_PDB_ADDR_HI contains the high bits of the PDB (byte -address bits 58:32) when TLB_INVALIDATE_PDB is ONE. Otherwise this field must -be 0. - -UVM handling of replayable faults: - - The following example illustrates how TLB invalidate may be used by the -UVM driver: - 1. When the TLB invalidate completes, all memory accesses using the old - TLB entries prior to the invalidate will have finished translation (but - not necessarily completed), and any new virtual accesses will trigger new - translations. The outstanding in-flight translations are allowed to - fault but will not indefinitely stall the invalidate. - 2. When the TLB invalidate completes, in-flight memory accesses using the - old physical translations may not yet be visible to other GPU clients - (such as CopyEngine) or to the CPU.
Accesses coming from clients that - support recoverable faults (such as TEX and GCC) can be made visible by - requesting the MMU to perform a membar using the ACK_TYPE and SYSMEMBAR - fields. - a. If ACK_TYPE is NONE the SYSMEMBAR field is ignored and no membar - is performed. - b. If ACK_TYPE is INTRANODE the invalidate will wait until all - in-flight physical accesses using the old translations are visible - to XBAR clients on the blocking VC. - c. If ACK_TYPE is GLOBALLY the invalidate will wait until all - in-flight physical accesses using the old translations are at the - point of coherence in L2, meaning writes will be visible to all - other GPU clients and reads will not be mutable by them. - d. If the SYSMEMBAR field is set to EN then a Hub SYSMEMBAR will also - be performed following the ACK_TYPE membar. This is equivalent to - performing a MEMBAR operation with NV_UDMA_MEM_OP_C_MEMBAR_TYPE_SYS_MEMBAR. - 3. If fault replay was requested then all pending recoverable faults in - the TLB replay list will be retranslated. This includes all faults - discovered while the invalidate was pending. This replay may generate - more recoverable faults. - 4. If fault replay cancel was requested then another replay of all - pending replayable faults on the targeted TLB(s) is attempted. If any of - these re-fault, they are discarded (sticky NACK or ACK/TRAP sent back to - the client depending on the setting of NV_PGPC_PRI_MMU_DEBUG_CTRL). - - - -MEMBAR ------- - - When the operation is MEMBAR, Host will perform a memory barrier operation. -All other fields must be set to 0 except for MEM_OP_C_MEMBAR_TYPE. When -MEMBAR_TYPE is MEMBAR, a memory barrier will be performed with respect to -other clients on the GPU. When it is SYS_MEMBAR, the memory barrier will also be -performed with respect to the CPU and peer GPUs. - - MEMBAR - This issues a MEMBAR operation following all reads, writes, and -atomics currently in flight from the PBDMA.
The MEMBAR operation will push all -such accesses already in flight on the same VC as the PBDMA to a point of GPU -coherence before proceeding. After this operation is complete, reads from any -GPU client will see prior writes from this PBDMA, and writes from any GPU client -cannot modify the return data of earlier reads from this PBDMA. This is true -regardless of whether those accesses target vidmem, sysmem, or peer mem. - WARNING: This only guarantees coherence for accesses that are already in -flight on the same VC as the PBDMA. Accesses from clients such as SM or a -non-PBDMA engine must already be at some point of coherence before this -operation for them to be coherent. - - SYS_MEMBAR - This implies the MEMBAR type above, but in addition to having -accesses reach coherence with all GPU clients, it further waits for accesses -to be coherent with respect to the CPU and peer GPUs as well. After this -operation is complete, reads from the CPU or peer GPUs will see prior writes -from this PBDMA, and writes from the CPU or peer GPUs cannot modify the return -data of earlier reads from this PBDMA (with the exception of CPU reflected -writes, which can modify earlier reads). Note SYS_MEMBAR is really only needed -to guarantee ordering with off-chip clients. For on-chip clients such as the -graphics engine or copy engine, accesses to sysmem will be coherent with just a -MEMBAR operation. SYS_MEMBAR provides the same function as -OPERATION_SYSMEMBAR_FLUSH on previous architectures. - WARNING: As described above, SYS_MEMBAR will not prevent CPU reflected -writes issued after the SYS_MEMBAR from clobbering the return data of reads -issued before the SYS_MEMBAR. To handle this case, the SYS_MEMBAR must be -followed by a separate L2_WAIT_FOR_SYS_PENDING_READS mem op. - - - -L2* ---- - - These values initiate a cache management operation -- see above. All other -fields must be 0; there are no configuration options.
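The MEM_OP rules in this section (send all four methods, leave unrelated fields 0, send MEM_OP_D last because it triggers the operation) can be sketched in C as a list of (method address, data) pairs. This is illustrative only: struct method and the emit_* helpers are hypothetical, encoding the pairs into pushbuffer entries (see "FIFO_DMA" in dev_ram.ref) is not shown, and the TLB invalidate variant assumes a channel whose NV_PPBDMA_CONFIG_AUTH_LEVEL is PRIVILEGED. Method addresses, operation codes, and bit positions are taken from the NV_UDMA_MEM_OP_* definitions in this manual.

```c
#include <stdint.h>

/* Method addresses (NV_UDMA_MEM_OP_A..D). */
#define MEM_OP_A_ADDR 0x28u
#define MEM_OP_B_ADDR 0x2cu
#define MEM_OP_C_ADDR 0x30u
#define MEM_OP_D_ADDR 0x34u

/* NV_UDMA_MEM_OP_D_OPERATION values (field 31:27). */
#define OP_MEMBAR                        0x05u
#define OP_MMU_TLB_INVALIDATE            0x09u
#define OP_L2_WAIT_FOR_SYS_PENDING_READS 0x15u

/* NV_UDMA_MEM_OP_C_MEMBAR_TYPE_SYS_MEMBAR (field 2:0). */
#define MEMBAR_TYPE_SYS_MEMBAR 0x0u

/* One Host method: an address doubleword and a data doubleword. */
struct method {
    uint32_t addr;
    uint32_t data;
};

/* Writes all four MEM_OP methods into m[0..3]; fields not passed stay 0.
 * MEM_OP_D goes last because it triggers the operation. */
static int emit_mem_op(struct method *m, uint32_t a, uint32_t b, uint32_t c,
                       uint32_t d_fields, uint32_t operation)
{
    m[0] = (struct method){ MEM_OP_A_ADDR, a };
    m[1] = (struct method){ MEM_OP_B_ADDR, b };
    m[2] = (struct method){ MEM_OP_C_ADDR, c };
    m[3] = (struct method){ MEM_OP_D_ADDR, (operation << 27) | d_fields };
    return 4;
}

/* SYS_MEMBAR followed by L2_WAIT_FOR_SYS_PENDING_READS, so that CPU reflected
 * writes cannot clobber earlier reads. m must have room for 8 methods. */
static int emit_sys_membar_and_drain(struct method *m)
{
    int n = emit_mem_op(m, 0, 0, MEMBAR_TYPE_SYS_MEMBAR, 0, OP_MEMBAR);
    n += emit_mem_op(m + n, 0, 0, 0, 0, OP_L2_WAIT_FOR_SYS_PENDING_READS);
    return n;
}

/* MMU_TLB_INVALIDATE of ONE page directory with PDB=ONE, GPC=ENABLE,
 * REPLAY=NONE, ACK_TYPE=NONE and PAGE_TABLE_LEVEL=ALL (all encoded as 0).
 * Returns 0 if pdb is not 4K aligned or does not fit in 59 bits. */
static int emit_tlb_invalidate_pdb(struct method *m, uint64_t pdb,
                                   uint32_t aperture)
{
    if ((pdb & 0xFFFu) != 0 || (pdb >> 59) != 0)
        return 0;
    uint32_t c = (uint32_t)pdb & 0xFFFFF000u;  /* PDB_ADDR_LO (31:12) = addr[31:12] */
    c |= (aperture & 0x3u) << 10;              /* PDB_APERTURE (11:10) */
    uint32_t d_fields = (uint32_t)(pdb >> 32); /* PDB_ADDR_HI (26:0) = addr[58:32] */
    return emit_mem_op(m, 0, 0, c, d_fields, OP_MMU_TLB_INVALIDATE);
}
```

Because NV_UDMA_MEM_OP_C_TLB_INVALIDATE_GPC_ENABLE is encoded as 0, a MEM_OP_C value with the GPC field left at 0 invalidates the GPC TLBs in addition to the Hub TLB.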
 - - - -The ACCESS_COUNTER_CLR operation --------------------------------- - When MEM_OP_D_OPERATION is ACCESS_COUNTER_CLR, Host will request that -the page access counters be cleared. There are two types of access counters: -MIMC and MOMC. This operation can be issued to clear all counters of all types, -all counters of a specified type (MIMC or MOMC), or a specific counter indicated -by counter type, bank, and notify tag. - This operation is privileged and can only be executed from channels with -NV_PPBDMA_CONFIG_AUTH_LEVEL set to PRIVILEGED. This is configured via the -NV_RAMFC_CONFIG dword in the channel's RAMFC during channel setup. - -The operation uses the following fields in the MEM_OP_* methods: -ACCESS_COUNTER_CLR_TYPE (TY) : type of the access counter clear - operation -ACCESS_COUNTER_CLR_TARGETED_TYPE (T) : type of the access counter for - targeted operation -ACCESS_COUNTER_CLR_TARGETED_NOTIFY_TAG : 20-bit notify tag of the access - counter for targeted operation -ACCESS_COUNTER_CLR_TARGETED_BANK : 4-bit bank number of the access - counter for targeted operation - - - - - -MEM_OP method field defines: - -MEM_OP_A [method] - Memory Operation Method 1/4 - see above for documentation - -#define NV_UDMA_MEM_OP_A 0x00000028 /* -W-4R */ - -#define NV_UDMA_MEM_OP_A_TLB_INVALIDATE_CANCEL_TARGET_CLIENT_UNIT_ID 5:0 /* -W-VF */ -#define NV_UDMA_MEM_OP_A_TLB_INVALIDATE_INVALIDATION_SIZE 5:0 /* -W-VF */ -#define NV_UDMA_MEM_OP_A_TLB_INVALIDATE_CANCEL_TARGET_GPC_ID 10:6 /* -W-VF */ -#define NV_UDMA_MEM_OP_A_TLB_INVALIDATE_CANCEL_MMU_ENGINE_ID 6:0 /* -W-VF */ -#define NV_UDMA_MEM_OP_A_TLB_INVALIDATE_SYSMEMBAR 11:11 /* -W-VF */ -#define NV_UDMA_MEM_OP_A_TLB_INVALIDATE_SYSMEMBAR_EN 0x00000001 /* -W--V */ -#define NV_UDMA_MEM_OP_A_TLB_INVALIDATE_SYSMEMBAR_DIS 0x00000000 /* -W--V */ -#define NV_UDMA_MEM_OP_A_TLB_INVALIDATE_TARGET_ADDR_LO 31:12 /* -W-VF */ - - -MEM_OP_B [method] - Memory Operation Method 2/4 - see above for documentation - -#define NV_UDMA_MEM_OP_B
0x0000002c /* -W-4R */ - -#define NV_UDMA_MEM_OP_B_TLB_INVALIDATE_TARGET_ADDR_HI 31:0 /* -W-VF */ - - -MEM_OP_C [method] - Memory Operation Method 3/4 - see above for documentation - -#define NV_UDMA_MEM_OP_C 0x00000030 /* -W-4R */ - -Membar configuration field. Note: overlaps MMU_TLB_INVALIDATE* config fields. -#define NV_UDMA_MEM_OP_C_MEMBAR_TYPE 2:0 /* -W-VF */ -#define NV_UDMA_MEM_OP_C_MEMBAR_TYPE_SYS_MEMBAR 0x00000000 /* -W--V */ -#define NV_UDMA_MEM_OP_C_MEMBAR_TYPE_MEMBAR 0x00000001 /* -W--V */ -Invalidate TLB entries for ONE page directory base, or for ALL of them. -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PDB 0:0 /* -W-VF */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PDB_ONE 0x00000000 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PDB_ALL 0x00000001 /* -W--V */ -Invalidate GPC MMU TLB entries or not (Hub-MMU entries are always invalidated). -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_GPC 1:1 /* -W-VF */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_GPC_ENABLE 0x00000000 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_GPC_DISABLE 0x00000001 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_REPLAY 4:2 /* -W-VF */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_REPLAY_NONE 0x00000000 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_REPLAY_START 0x00000001 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_REPLAY_START_ACK_ALL 0x00000002 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_REPLAY_CANCEL_TARGETED 0x00000003 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_REPLAY_CANCEL_GLOBAL 0x00000004 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_REPLAY_CANCEL_VA_GLOBAL 0x00000005 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACK_TYPE 6:5 /* -W-VF */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACK_TYPE_NONE 0x00000000 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACK_TYPE_GLOBALLY 0x00000001 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACK_TYPE_INTRANODE 0x00000002 /* -W--V */ -#define 
NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE 9:7 /* -W-VF */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_READ 0 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_WRITE 1 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_ATOMIC_STRONG 2 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_RSVRVD 3 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_ATOMIC_WEAK 4 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_ATOMIC_ALL 5 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_WRITE_AND_ATOMIC 6 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_ACCESS_TYPE_VIRT_ALL 7 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL 9:7 /* -W-VF */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_ALL 0x00000000 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_PTE_ONLY 0x00000001 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE0 0x00000002 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE1 0x00000003 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE2 0x00000004 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE3 0x00000005 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE4 0x00000006 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PAGE_TABLE_LEVEL_UP_TO_PDE5 0x00000007 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PDB_APERTURE 11:10 /* -W-VF */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PDB_APERTURE_VID_MEM 0x00000000 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PDB_APERTURE_SYS_MEM_COHERENT 0x00000002 /* -W--V */ -#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PDB_APERTURE_SYS_MEM_NONCOHERENT 0x00000003 /* -W--V */ -Address[31:12] of page directory for which TLB entries should be invalidated. 
-#define NV_UDMA_MEM_OP_C_TLB_INVALIDATE_PDB_ADDR_LO 31:12 /* -W-VF */ - -#define NV_UDMA_MEM_OP_C_ACCESS_COUNTER_CLR_TARGETED_NOTIFY_TAG 19:0 /* -W-VF */ - -MEM_OP_D [method] - Memory Operation Method 4/4 - see above for documentation -(Must be preceded by MEM_OP_A-C.) - -#define NV_UDMA_MEM_OP_D 0x00000034 /* -W-4R */ - -Address[58:32] of page directory for which TLB entries should be invalidated. -#define NV_UDMA_MEM_OP_D_TLB_INVALIDATE_PDB_ADDR_HI 26:0 /* -W-VF */ -#define NV_UDMA_MEM_OP_D_OPERATION 31:27 /* -W-VF */ -#define NV_UDMA_MEM_OP_D_OPERATION_MEMBAR 0x00000005 /* -W--V */ -#define NV_UDMA_MEM_OP_D_OPERATION_MMU_TLB_INVALIDATE 0x00000009 /* -W--V */ -#define NV_UDMA_MEM_OP_D_OPERATION_MMU_TLB_INVALIDATE_TARGETED 0x0000000a /* -W--V */ -#define NV_UDMA_MEM_OP_D_OPERATION_L2_PEERMEM_INVALIDATE 0x0000000d /* -W--V */ -#define NV_UDMA_MEM_OP_D_OPERATION_L2_SYSMEM_INVALIDATE 0x0000000e /* -W--V */ -#define NV_UDMA_MEM_OP_D_OPERATION_L2_CLEAN_COMPTAGS 0x0000000f /* -W--V */ -#define NV_UDMA_MEM_OP_D_OPERATION_L2_FLUSH_DIRTY 0x00000010 /* -W--V */ -#define NV_UDMA_MEM_OP_D_OPERATION_L2_WAIT_FOR_SYS_PENDING_READS 0x00000015 /* -W--V */ - -#define NV_UDMA_MEM_OP_D_OPERATION_ACCESS_COUNTER_CLR 0x00000016 /* -W--V */ - -#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TYPE 1:0 /* -W-VF */ -#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TYPE_MIMC 0x00000000 /* -W--V */ -#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TYPE_MOMC 0x00000001 /* -W--V */ -#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TYPE_ALL 0x00000002 /* -W--V */ -#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TYPE_TARGETED 0x00000003 /* -W--V */ - -#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TARGETED_TYPE 2:2 /* -W-VF */ -#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TARGETED_TYPE_MIMC 0x00000000 /* -W--V */ -#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TARGETED_TYPE_MOMC 0x00000001 /* -W--V */ - -#define NV_UDMA_MEM_OP_D_ACCESS_COUNTER_CLR_TARGETED_BANK 6:3 /* -W-VF */ - - -SET_REF [method] - Set Reference Count 
Method - - The SET_REF method allows the user to set the reference count -(NV_PPBDMA_REF_CNT) to a value. The reference count may be monitored to track -Host's progress through the pushbuffer. Instead of monitoring -NV_RAMUSERD_TOP_LEVEL_GET, software may put into the method stream SET_REF -methods that set the reference count to ever-increasing values, and then read -NV_RAMUSERD_REF to determine how far in the stream Host has gone. - Before the reference count value is altered, Host waits for the engine to -be idle (to have completed executing all earlier methods), issues a SysMemBar -flush, and waits for the flush to complete. - While the GPU context is bound to a channel and assigned to a PBDMA unit, -the reference count value is stored in the NV_PPBDMA_REF register. While the -GPU context is not assigned to a PBDMA unit, the reference count value is stored -in the NV_RAMFC_REF field of the RAMFC portion of the GPU context's GPU-instance -block. - - -#define NV_UDMA_SET_REF 0x00000050 /* -W-4R */ - -#define NV_UDMA_SET_REF_CNT 31:0 /* -W-VF */ - - - -CRC_CHECK [method] - Method-CRC Check Method - - When debugging a problem in a real chip, it may be useful to determine -whether a PBDMA unit has sent the proper methods toward the engine. The -CRC_CHECK method checks whether the cyclic redundancy check value -calculated over previous methods has an expected value. If the value in the -NV_PPBDMA_METHOD_CRC register is not equal to NV_UDMA_CRC_CHECK_VALUE, then -Host initiates an interrupt (NV_PPBDMA_INTR_0_METHODCRC) and stalls. After -each comparison, the NV_PPBDMA_METHOD_CRC register is cleared. - The IEEE 802.3 CRC-32 polynomial (0x04c11db7) is used to calculate CRC -values. The CRC is calculated over the method subchannel, method address, and -method data of methods sent to an engine. Host can send both single and dual -methods to engines. The CRC is calculated as if dual methods were sent as -two single methods.
The CRC is calculated on the byte-stream in little-endian -order. - - -C code for the CRC calculation is (NVR_U32 is assumed to be a 32-bit unsigned -integer type): - - #include <stdint.h> - typedef uint32_t NVR_U32; - - static NVR_U32 table[256]; - - // Build the lookup table: table[i] is the CRC remainder for top byte i. - void init(void) { - for (NVR_U32 i = 0; i < 256; i++) { // create crc value for every byte - NVR_U32 crc = i << 24; - for (int j = 0; j < 8; j++) { // for every bit in the byte, MSB first - if (crc & 0x80000000u) crc = (crc << 1) ^ 0x04c11db7u; - else crc = (crc << 1); - } - table[i] = crc; - } - } - - // Fold one byte of the method stream into the running CRC. - // init() must be called once before the first call. - NVR_U32 new_crc(unsigned char byte, NVR_U32 old_crc) { - NVR_U32 crc_top_byte = (old_crc >> 24) ^ byte; - return (old_crc << 8) ^ table[crc_top_byte]; - } - - This method is used for debug. - This method was added in Fermi. - - -#define NV_UDMA_CRC_CHECK 0x0000007c /* -W-4R */ - -#define NV_UDMA_CRC_CHECK_VALUE 31:0 /* -W-VF */ - - -YIELD [method] - Yield Method - - The YIELD method causes a channel to yield the remainder of its timeslice. -The method's OP field specifies whether the channel's PBDMA timeslice, the -channel's runlist timeslice, or no timeslice is yielded. - If OP is YIELD_OP_RUNLIST_TIMESLICE, then Host will act as if the channel's -runlist or TSG timeslice expired. Host will exit the TSG and switch to the next -channel after the TSG on the runlist. If there is no such channel to switch to, -then YIELD_OP_RUNLIST_TIMESLICE will not cause a switch. - When the PBDMA executes a YIELD_OP_RUNLIST_TIMESLICE method, it guarantees -that it will not execute further methods from the same channel or TSG until the -channel is restarted by the scheduler. However, note that this does not yield -the engine timeslice; if the engine is preemptable, the context continues to -run on the engine until its timeslice expires, and only then does Host -attempt to preempt it. Also, if there is an outstanding ctx load, either -due to ctx_reload or from the other PBDMA in the SCG case, then yielding won't -take place until the outstanding ctx load finishes or aborts due to a preempt.
-When the ctx load does complete on the other PBDMA, it is possible for that
-PBDMA to execute a small number of additional methods before the runlist
-yield takes effect and that PBDMA halts work for its channel.
-    If NV_UDMA_YIELD_OP_TSG and the channel is part of a TSG, then Host will
-switch to the next channel in the same TSG; if the channel is not part of a
-TSG, the method is treated like YIELD_OP_NOP. If there is only one channel
-with work in the TSG, Host will simply reschedule the same channel in the TSG.
-YIELD_OP_TSG does not cause the scheduler to leave the TSG. The TSG timeslice
-counter (the TSG timeslice is equivalent to the runlist timeslice for TSGs)
-continues to increment through the channel switch and does not restart after
-the yield method executes. When the PBDMA executes a Yield method, it
-guarantees that it will not execute the method following that Yield until the
-channel is restarted by the scheduler.
-    YIELD_OP_NOP is simply a NOP; neither timeslice is yielded. It is kept
-for compatibility with existing tests. NV_UDMA_NOP is the preferred NOP, but
-also see the universal NOP PB instruction: see the description of
-NV_FIFO_DMA_NOP in the "FIFO_DMA" section of dev_ram.ref.
-
-    If an unknown OP is specified, Host will raise an NV_PPBDMA_INTR_*_METHOD
-interrupt.
-
-
-#define NV_UDMA_YIELD                                     0x00000080 /* -W-4R */
-
-#define NV_UDMA_YIELD_OP                                         1:0 /* -W-VF */
-#define NV_UDMA_YIELD_OP_NOP                              0x00000000 /* -W--V */
-#define NV_UDMA_YIELD_OP_RUNLIST_TIMESLICE                0x00000002 /* -W--V */
-#define NV_UDMA_YIELD_OP_TSG                              0x00000003 /* -W--V */
-
-
-WFI [method] - Wait-for-Idle Method
-
-    The WFI (Wait-For-Idle) method stalls Host from processing any more
-methods on the channel until the engine to which the channel last sent methods
-is idle. Note that the subchannel encoded in the method header is ignored (as
-it is for all Host-only methods) and does NOT specify which engine to idle.
-In Kepler, this is only relevant on runlists that serve multiple engines
-(specifically, the graphics runlist, which also serves GR COPY).
-    The WFI method has a single field, SCOPE, which specifies the level of WFI
-the Host method performs. ALL waits for all work in the engine from the same
-context to be idle across all classes and subchannels. CURRENT_VEID causes the
-WFI to apply only to work from the same VEID as the current channel. Note that
-for engines that do not support VEIDs, CURRENT_VEID works identically to ALL.
-    Note that Host methods ignore the subchannel field in the method. A Host
-WFI method always applies to the engine the channel last sent methods to. If a
-WFI with ALL is specified and the channel last sent work to the GRCE, this will
-only guarantee that GRCE has no work in progress. It is possible that the GR
-context will have work in progress from other VEIDs, or even the current VEID if
-the current channel targets GRCE and has never sent FE methods before. This
-means that if SW wants to idle the graphics pipe for all VEIDs, SW must send a
-method to GR immediately before the WFI method. A GR_NOP is sufficient.
-    Note also that even if the current NV_PPBDMA_TARGET is GRAPHICS and not
-GRCE, there are cases where Host can trivially complete a WFI without sending
-the NV_PMETHOD_HOST_WFI internal method to FE. This can happen when
-
-1. the runlist timeslices to a different TSG just before the WFI method,
-2. the other TSG does a ctxsw request due to methods for FE, and
-3. FECS reports non-preempted in the ctx ack, so CTX_RELOAD doesn't get set.
-
-In that case, when the channel switches back onto the PBDMA, the PBDMA rightly
-concludes that there is no way the context could be non-idle for that channel,
-and therefore filters out the WFI, even if the other PBDMA is sending work to
-other VEIDs.
-As in the subchannel case, a GR_NOP preceding the WFI is
-sufficient to ensure that a SCOPE_ALL_VEID WFI will be sent to FE regardless of
-timeslicing, as long as the NOP and the WFI are submitted as part of the same
-GP_PUT update. This is ensured by the channel state SHOULD_SEND_HOST_TSG_EVENT,
-which behaves like CTX_RELOAD: the GR_NOP causes the PBDMA to set the
-SHOULD_SEND_HOST_TSG_EVENT state, so even after a channel or context switch
-the PBDMA will still have the engine context loaded. Thus the WFI will cause
-the HOST_WFI internal method to be sent to FE.
-
-
-#define NV_UDMA_WFI                                       0x00000078 /* -W-4R */
-
-#define NV_UDMA_WFI_SCOPE                                        0:0 /* -W-VF */
-#define NV_UDMA_WFI_SCOPE_CURRENT_VEID                    0x00000000 /* -W--V */
-#define NV_UDMA_WFI_SCOPE_ALL                             0x00000001 /* -W--V */
-#define NV_UDMA_WFI_SCOPE_ALL_VEID                        0x00000001 /*       */
-
-
-
-CLEAR_FAULTED [method] - Clear Faulted Method
-
-    The CLEAR_FAULTED method clears a channel's PCCSR PBDMA_FAULTED or
-ENG_FAULTED bit. These bits are set by Host in response to a PBDMA fault or
-engine fault, respectively, on the specified channel; see dev_fifo.ref.
-
-    The CHID field specifies the ID of the channel whose FAULTED bit is to be
-cleared.
-
-    The TYPE field specifies which FAULTED bit is to be cleared: either
-PBDMA_FAULTED or ENG_FAULTED.
-
-    When Host receives a CLEAR_FAULTED method for a channel, the corresponding
-PCCSR FAULTED bit for the channel should already be set. However, there is a
-race between SW (seeing the fault message from MMU, handling the fault, and
-sending the CLEAR_FAULTED method) and Host (seeing the fault from CE or MMU
-and setting the FAULTED bit), so the CLEAR_FAULTED method may arrive before
-the FAULTED bit is set. Host handles a CLEAR_FAULTED method according to the
-following cases:
-
-    a. If the FAULTED bit specified by TYPE is set, Host will clear the bit and
-retire the CLEAR_FAULTED method.
-
-    b. If the bit is not set, the PBDMA will keep retrying the CLEAR_FAULTED
-method on every PTIMER microsecond tick (approximately every microsecond) by
-rechecking the FAULTED bit of the target channel. Once the bit is set, the
-PBDMA will clear the bit and retire the method. Until then, execution of the
-fault-handling channel stalls on the CLEAR_FAULTED method.
-
-    c. If the fault-handling channel's timeslice expires while stalled on a
-CLEAR_FAULTED method, the channel will switch out. Once rescheduled, the
-channel will resume retrying the CLEAR_FAULTED method.
-
-    d. To avoid waiting indefinitely for the CLEAR_FAULTED method to retire
-(most likely because a SW bug injected an erroneous CLEAR_FAULTED method), Host
-has a timeout mechanism to inform SW of a potential bug. This timeout is
-controlled by NV_PFIFO_CLEAR_FAULTED_TIMEOUT; see dev_fifo.ref for details.
-
-    e. When a CLEAR_FAULTED timeout is detected, Host will raise a stalling
-interrupt by setting the NV_PPBDMA_INTR_0_CLEAR_FAULTED_ERROR field. The
-address of the invalid CLEAR_FAULTED method will be in NV_PPBDMA_METHOD0, and
-its payload will be in NV_PPBDMA_DATA0.
-
-    Note: setting the timeout value too low could result in false stalling
-interrupts to SW. The timeout should be set equal to NV_PFIFO_FB_TIMEOUT_PERIOD.
-
-    Note that the CLEAR_FAULTED timeout mechanism uses the same PBDMA registers
-and RAMFC fields as the semaphore acquire timeout mechanism:
-NV_PPBDMA_SEM_EXECUTE_ACQUIRE_FAIL is set to TRUE when the first attempt fails,
-and NV_PPBDMA_ACQUIRE_DEADLINE is loaded with the sum of the current PTIMER
-value and NV_PFIFO_CLEAR_FAULTED_TIMEOUT. The ACQUIRE_FAIL bit is reset to
-FALSE when the CLEAR_FAULTED method times out or succeeds.
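Cases a through e, together with the note on shared timeout state, can be summarized as a per-tick decision. The C sketch below is a simplified model under stated assumptions, not Host's implementation. The struct, the enum, and clear_faulted_tick() are hypothetical names, and PTIMER is modeled as a plain microsecond counter. Treating "now >= deadline" as expiry is also an assumption, because the text does not give the exact comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the state behind one pending CLEAR_FAULTED method.
 * The state persists across calls, mirroring case c: it survives a channel
 * switch-out (in hardware it lives in PBDMA registers and RAMFC). */
typedef struct {
    bool     faulted;       /* PCCSR FAULTED bit selected by TYPE, for CHID */
    bool     acquire_fail;  /* models NV_PPBDMA_SEM_EXECUTE_ACQUIRE_FAIL */
    uint64_t deadline_us;   /* models NV_PPBDMA_ACQUIRE_DEADLINE */
} clear_faulted_state;

typedef enum {
    CF_RETIRED,  /* bit was set: cleared it and retired the method (a, b) */
    CF_STALLED,  /* bit not set yet: retry on the next tick (b, c) */
    CF_TIMEOUT   /* deadline reached: raise CLEAR_FAULTED_ERROR (d, e) */
} cf_result;

/* One attempt, evaluated at PTIMER time now_us. timeout_us stands in for
 * NV_PFIFO_CLEAR_FAULTED_TIMEOUT (assumed to be in microseconds here). */
static cf_result clear_faulted_tick(clear_faulted_state *s, uint64_t now_us,
                                    uint64_t timeout_us)
{
    assert(s != NULL);
    if (s->faulted) {                 /* cases a and b: bit is set */
        s->faulted = false;           /* clear the FAULTED bit ... */
        s->acquire_fail = false;      /* ... and reset ACQUIRE_FAIL on success */
        return CF_RETIRED;
    }
    if (!s->acquire_fail) {           /* first failed attempt arms the deadline */
        s->acquire_fail = true;
        s->deadline_us = now_us + timeout_us;
        return CF_STALLED;
    }
    if (now_us >= s->deadline_us) {   /* cases d and e: timed out */
        s->acquire_fail = false;      /* ACQUIRE_FAIL is also reset on timeout */
        return CF_TIMEOUT;
    }
    return CF_STALLED;                /* case b: keep retrying */
}
```

For example, start from a zeroed state with timeout_us = 100. A call at now_us = 0 arms the deadline at 100 and returns CF_STALLED. If the FAULTED bit becomes set before the call at now_us = 5, that call returns CF_RETIRED and clears both the bit and acquire_fail. Because the deadline is armed only on the first failed attempt, a channel that switches out and back in keeps its original deadline in this model.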
-
-
-#define NV_UDMA_CLEAR_FAULTED                             0x00000084 /* -W-4R */
-
-#define NV_UDMA_CLEAR_FAULTED_CHID                              11:0 /* -W-VF */
-#define NV_UDMA_CLEAR_FAULTED_TYPE                             31:31 /* -W-VF */
-#define NV_UDMA_CLEAR_FAULTED_TYPE_PBDMA_FAULTED          0x00000000 /* -W--V */
-#define NV_UDMA_CLEAR_FAULTED_TYPE_ENG_FAULTED            0x00000001 /* -W--V */
-
-
-
-    Addresses that are not defined in this device are reserved. Those below
-0x100 are reserved for future Host methods. Addresses 0x100 and beyond are
-reserved for the engines served by Host.
--
cgit v1.2.3
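The data doublewords for the Host methods in this manual are built from the HI:LO field ranges given in the #define lines. The C sketch below packs CLEAR_FAULTED, WFI, and YIELD data words. FIELD_LO, FIELD_HI, FIELD_MASK, FIELD_SET, and the *_data() functions are hypothetical helpers, not part of this manual. The sketch relies on a preprocessor trick: placing a range such as 11:0 after "0 ?" or "1 ?" forms a conditional expression that evaluates to the LO or HI bit. It builds only the 32-bit data values and does not show how methods are encoded into a pushbuffer (see "FIFO_DMA" in dev_ram.ref).

```c
#include <assert.h>
#include <stdint.h>

/* Field definitions copied from this manual (HI:LO bit ranges and values). */
#define NV_UDMA_CLEAR_FAULTED_CHID                 11:0
#define NV_UDMA_CLEAR_FAULTED_TYPE                 31:31
#define NV_UDMA_CLEAR_FAULTED_TYPE_PBDMA_FAULTED   0x00000000
#define NV_UDMA_CLEAR_FAULTED_TYPE_ENG_FAULTED     0x00000001
#define NV_UDMA_WFI_SCOPE                          0:0
#define NV_UDMA_WFI_SCOPE_CURRENT_VEID             0x00000000
#define NV_UDMA_WFI_SCOPE_ALL                      0x00000001
#define NV_UDMA_YIELD_OP                           1:0
#define NV_UDMA_YIELD_OP_TSG                       0x00000003

/* Hypothetical helpers: "(0 ? 11:0)" evaluates to 0 (the LO bit) and
 * "(1 ? 11:0)" evaluates to 11 (the HI bit). */
#define FIELD_LO(f)      (0 ? f)
#define FIELD_HI(f)      (1 ? f)
#define FIELD_MASK(f)    (0xFFFFFFFFu >> (31 - FIELD_HI(f) + FIELD_LO(f)))
#define FIELD_SET(f, v)  ((((uint32_t)(v)) & FIELD_MASK(f)) << FIELD_LO(f))

/* CLEAR_FAULTED data: channel ID in bits 11:0, FAULTED-bit type in bit 31. */
static uint32_t clear_faulted_data(uint32_t chid, uint32_t type)
{
    assert(chid <= FIELD_MASK(NV_UDMA_CLEAR_FAULTED_CHID)); /* 12-bit CHID */
    return FIELD_SET(NV_UDMA_CLEAR_FAULTED_CHID, chid) |
           FIELD_SET(NV_UDMA_CLEAR_FAULTED_TYPE, type);
}

/* WFI data: SCOPE in bit 0. */
static uint32_t wfi_data(uint32_t scope)
{
    return FIELD_SET(NV_UDMA_WFI_SCOPE, scope);
}

/* YIELD data: OP in bits 1:0. */
static uint32_t yield_data(uint32_t op)
{
    return FIELD_SET(NV_UDMA_YIELD_OP, op);
}
```

For example, clear_faulted_data(5, NV_UDMA_CLEAR_FAULTED_TYPE_ENG_FAULTED) returns 0x80000005: the channel ID occupies bits 11:0 and TYPE occupies bit 31.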