summaryrefslogtreecommitdiff
path: root/manuals/volta/gv100/dev_mmu_fault.ref.txt
blob: e4e62db55c8f342ecd9b251ff9271ec35bd90f74 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------


1  -  INTRODUCTION
==================

This manual contains information definition of replayable (UVM)
and non-replayable fault buffer packet in memory. Both type of faults
use same packet format in memory although separate fault buffers are
used.

The goal of the UVM feature is to have a single unified virtual memory space
for both GPU and CPU memory accesses.  In addition to the unified address
space, UVM allows the GPU driver to support demand paging, and seamlessly
migrate pages from GPU RAM to the primary system memory.

This is done by allowing page faults to be stalling and support replay, and by
reporting page faults to the operating system or GPU driver in an efficient
manner.

Non-replayable faults are various mapping and permission related
faults and are usually fatal.



2  -  GPU FAULT BUFFER
======================================
This chapter describes the format of the GPU replayable and
non-replayable fault reporting buffer used to report page faults.


This fault buffer is written to by GMMU based on buffer location info
set in GMMU registers (NV_PFB_PRI_MMU_REPLAY_FAULT_BUFFER_LO/HI and
NV_PFB_PRI_MMU_NON_REPLAY_FAULT_BUFFER_LO/HI). The replayable fault
buffer is managed by the UVM driver. The non-replayable fault buffer
is managed by RM.


The size of the fault buffer is controlled by SIZE register in GMMU
which can be programmed by SW. If SW does not want to program the SIZE
(because SW does not know/have enough info) then SW can write a SIZE
CTL bit (SET_DEFAULT) in GMMU register to set the size to a HW
recommended value. On that SIZE CTL bit write, GMMU will calculate the
recommended value based on chip size and so and write the recommended
value.


The buffer can overflow. There is status maintained in GMMU register
for that. If the buffer has overflowed the GPU will stop writing out new fault
entries and proceed to drop entries until SW resets the overflow
status (normally after processing the existing fault packets and so
GET PTR is changed). This is done to prevent the GPU from overwriting
unprocessed entries.  When faults are dropped they are not lost for
replayable faults as the requets are buffered in MMU replay buffer;
however, non-replayable faults are lost as those requests are not
buffered for further processing; when SW triggers a replay event the
requests for the dropped replayable faults will be replayed, fault
again, and then be reported in the fault buffer.

Each entry is of size NV_MMU_FAULT_BUFFER_PACKET_SIZE (=32) bytes and
contains the fault information necessary for (1) the UVM driver to
perform necessary page migrations and house keeping in response to a
replayable fault for replayable fault and (2) the RM to perform
graceful exit for the non-replayable fault.

The ENGINE_ID field specifies the faulting MMU engine id.

The APERTURE field specifies the GPU physical APERTURE of the instance block
used for the request.  VID_MEM indicates the instance block was stored in the
GPU devices RAM.  SYS_MEM_COHERENT indicates the instance block was stored in
coherent system memory. SYS_MEM_NONCOHERENT indicates the page table was stored
in non-coherent system memory.


INST_LO is used to specify bits 32:12 of the physical 4KB aligned instance
block associated with the faulting request.  INST_LO is aligned to this 4KB
boundary and so the bottom 12 bits are not reported in this data structure, and
this space is used to specify other fields.  The instance block contains the
pointers to the page table used for the memory request.




INST_HI contains the high order bits of the instance block associated with the
memory request.  Space is reserved to allow INST_HI to eventually expand up to
64 bits.

ADDR_LO and ADDR_HI specify the 4K-aligned address (virtual or
physical based on ACCESS_TYPE) of the faulting request. Up to 64 bits
4K-aligned address can be reported; however,the bit width of the
addresses supported by a given GPU depends on the GPU's family.

PHYS_APERTURE specifies the aperture of the faulting address.


FAULT_TYPE indicates the type of fault which occurred.  For a list of different
fault types please see the NV_PFAULT_FAULT_TYPE_* defines in dev_fault.ref.

REPLAYABLE_FAULT(RF) indicates whether this fault is a replayable fault or
not. This bit is set false when (1) the fault is non-replayable or (2)
fault is replayable but has been cancelled.

CLIENT indicates which MMU client generated the faulting request.

The ACCESS_TYPE field indicates the type of the faulting request.

MMU_CLIENT_TYPE indicates whether the faulting request originated in a GPC, or
if it came from another type of HUB client.  This field determines how the
CLIENT field should be interpreted.


GPC_ID specifies the GPC which generated the faulting request if MMU_CLIENT_TYPE
will be NV_PFAULT_MMU_CLIENT_TYPE_GPC, meaning the request came from a GPC
client.  Otherwise the GPC_ID field should be ignored.


REPLAYABLE_FAULT_EN (R) is set to true if replayable fault is enabled for
any client in the instance block.  It does not indicate whether the fault is replayable.

VALID (V) indicates that this current buffer entry is VALID.

#define NV_MMU_FAULT_BUF                                                    /* ----G */
#define NV_MMU_FAULT_BUF_ENTRY                              0x1F:0x00000000 /* RW--M */

Size of a buffer entry in bytes
#define NV_MMU_FAULT_BUF_SIZE              32 /*       */

#define NV_MMU_FAULT_BUF_ENTRY_INST_APERTURE              (9+0*32):(0*32+8) /* RWXVF */
#define NV_MMU_FAULT_BUF_ENTRY_INST_APERTURE_VID_MEM             0x00000000 /* RW--V */
#define NV_MMU_FAULT_BUF_ENTRY_INST_APERTURE_SYS_MEM_COHERENT    0x00000002 /* RW--V */
#define NV_MMU_FAULT_BUF_ENTRY_INST_APERTURE_SYS_MEM_NONCOHERENT 0x00000003 /* RW--V */

#define NV_MMU_FAULT_BUF_ENTRY_INST_LO                   (31+0*32):(0*32+12) /* RWXVF */
#define NV_MMU_FAULT_BUF_ENTRY_INST_HI                   (31+1*32):(1*32+0) /* RWXVF */
Dword-spanning field define alias
#define NV_MMU_FAULT_BUF_ENTRY_INST                      (31+1*32):(0*32+12) /*       */

#define NV_MMU_FAULT_BUF_ENTRY_ADDR_PHYS_APERTURE         (1+2*32):(2*32+0) /* RWXVF */
#define NV_MMU_FAULT_BUF_ENTRY_ADDR_LO                   (31+2*32):(2*32+12) /* RWXVF */
#define NV_MMU_FAULT_BUF_ENTRY_ADDR_HI                   (31+3*32):(3*32+0) /* RWXVF */
Dword-spanning field define alias
#define NV_MMU_FAULT_BUF_ENTRY_ADDR                      (31+3*32):(2*32+12) /*       */

#define NV_MMU_FAULT_BUF_ENTRY_TIMESTAMP_LO              (31+4*32):(4*32+0) /* RWXVF */
#define NV_MMU_FAULT_BUF_ENTRY_TIMESTAMP_HI              (31+5*32):(5*32+0) /* RWXVF */
Dword-spanning field define alias
#define NV_MMU_FAULT_BUF_ENTRY_TIMESTAMP                 (31+5*32):(4*32+0) /*       */

#define NV_MMU_FAULT_BUF_ENTRY_ENGINE_ID                  (8+6*32):(6*32+0) /* RWXVF */

#define NV_MMU_FAULT_BUF_ENTRY_FAULT_TYPE                 (4+7*32):(7*32+0) /* RWXVF */

#define NV_MMU_FAULT_BUF_ENTRY_REPLAYABLE_FAULT           (7+7*32):(7*32+7) /* RWXVF */
#define NV_MMU_FAULT_BUF_ENTRY_REPLAYABLE_FAULT_FALSE            0x00000000 /* RWX-V */
#define NV_MMU_FAULT_BUF_ENTRY_REPLAYABLE_FAULT_TRUE             0x00000001 /* RWX-V */

#define NV_MMU_FAULT_BUF_ENTRY_CLIENT                    (14+7*32):(7*32+8) /* RWXVF */

#define NV_MMU_FAULT_BUF_ENTRY_ACCESS_TYPE              (19+7*32):(7*32+16) /* RWXVF */

#define NV_MMU_FAULT_BUF_ENTRY_MMU_CLIENT_TYPE          (20+7*32):(7*32+20) /* RWXVF */

#define NV_MMU_FAULT_BUF_ENTRY_GPC_ID                   (28+7*32):(7*32+24) /* RWXVF */


#define NV_MMU_FAULT_BUF_ENTRY_REPLAYABLE_FAULT_EN      (30+7*32):(7*32+30) /* RWXVF */
#define NV_MMU_FAULT_BUF_ENTRY_REPLAYABLE_FAULT_EN_FALSE         0x00000000 /* RWX-V */
#define NV_MMU_FAULT_BUF_ENTRY_REPLAYABLE_FAULT_EN_TRUE          0x00000001 /* RWX-V */

// NOTE: VALID must be in the last byte in the packet for proper write ordering
#define NV_MMU_FAULT_BUF_ENTRY_VALID                    (31+7*32):(7*32+31) /* RWXVF */
#define NV_MMU_FAULT_BUF_ENTRY_VALID_FALSE                       0x00000000 /* RWX-V */
#define NV_MMU_FAULT_BUF_ENTRY_VALID_TRUE                        0x00000001 /* RWX-V */


--------------------------------------------------------------------------------
                         KEY LEGEND
--------------------------------------------------------------------------------

Each define in the .ref file has a 5 field code to say what kind of define it is: i.e. /* RW--R */
The following legend shows accepted values for each of the 5 fields:
  Read, Write, Internal State, Declaration/Size, and Define Indicator.

  Read
    ' ' = Other Information
    '-' = Field is part of a write-only register
    'C' = Value read is always the same, constant value line follows (C)
    'R' = Value is read


  Write
    ' ' = Other Information
    '-' = Must not be written (D), value ignored when written (R,A,F)
    'W' = Can be written


  Internal State
    ' ' = Other Information
    '-' = No internal state
    'X' = Internal state, initial value is unknown
    'I' = Internal state, initial value is known and follows (I), see "Reset Signal" section for signal.
    'E' = Internal state, initial value is known and follows (E), see "Reset Signal" section for signal.
    'B' = Internal state, initial value is known and follows (B), see "Reset Signal" section for signal.
    'C' = Internal state, initial value is known and follows (C), see "Reset Signal" section for signal.

    'V' = (legacy) Internal state, initialize at volatile reset
    'D' = (legacy) Internal state, default initial value at object creation (legacy: Only used in dev_ram.ref)
    'C' = (legacy) Internal state, initial value at object creation
    'C' = (legacy) Internal state, class-based initial value at object creation (legacy: Only used in dev_ram.ref)


  Declaration/Size
    ' ' = Other Information
    '-' = Does Not Apply
    'V' = Type is void
    'U' = Type is unsigned integer
    'S' = Type is signed integer
    'F' = Type is IEEE floating point
    '1' = Byte size (008)
    '2' = Short size (016)
    '3' = Three byte size (024)
    '4' = Word size (032)
    '8' = Double size (064)


  Define Indicator
    ' ' = Other Information
    'C' = Clear value
    'D' = Device
    'L' = Logical device.
    'M' = Memory
    'R' = Register
    'A' = Array of Registers
    'F' = Field
    'V' = Value
    'T' = Task
    'P' = Phantom Register

    'B' = (legacy) Bundle address
    'G' = (legacy) General purpose configuration register
    'C' = (legacy) Class

  Reset signal defaults for graphics engine registers.
    All graphics engine registers use the following defaults for reset signals:
     'E' = initialized with engine_reset_
     'I' = initialized with context_reset_
     'B' = initialized with reset_IB_dly_

  Reset signal
    For units that differ from the graphics engine defaults, the reset signals should be defined here: