GPU/External Registers

From 3dbrew
Revision as of 17:38, 18 April 2021 by MarcusD (Add 2DS info to "scanline double")

This page describes the address range accessible from the ARM11, used to configure the basic GPU functionality. For information about the internal registers used for 3D rendering, see GPU/Internal Registers.

Map

Address mappings for the external registers. GSPGPU:WriteHWRegs takes these addresses relative to 0x1EB00000.

User VA PA Length Name Comments
0x1EF00004 0x10400004 4 ?
0x1EF00010 0x10400010 16 Memory Fill1 "PSC0" GX command 2
0x1EF00020 0x10400020 16 Memory Fill2 "PSC1" GX command 2
0x1EF00030 0x10400030 4 VRAM bank control Bits 8-11 = bank[i] disabled; other bits are unused
0x1EF00034 0x10400034 4 GPU Busy Bit31 = cmd-list busy, bit27 = PSC0 busy, bit26 = PSC1 busy.
0x1EF00050 0x10400050 4 ? Writes 0x22221200 on GPU init.
0x1EF00054 0x10400054 4 ? Writes 0xFF2 on GPU init.
0x1EF000C0 0x104000C0 4 Backlight control Writes 0x0 to allow backlights to turn off, 0x20000000 to force them always on.
0x1EF00400 0x10400400 0x100 Framebuffer Setup "PDC0" (top screen)
0x1EF00500 0x10400500 0x100 Framebuffer Setup "PDC1" (bottom)
0x1EF00C00 0x10400C00 ? Transfer Engine "DMA"
0x1EF01000/0x10401000 - 0x1EF01C00/0x10401C00 map to the GPU internal registers. These registers are usually not read or written directly here; instead they are written through the command list interface below (corresponding to the GPUREG_CMDBUF_* internal registers).
0x1EF01000 0x10401000 0x4 ? Writes 0 on GPU init and before the Command List is used
0x1EF01080 0x10401080 0x4 ? Writes 0x12345678 on GPU init.
0x1EF010C0 0x104010C0 0x4 ? Writes 0xFFFFFFF0 on GPU init.
0x1EF010D0 0x104010D0 0x4 ? Writes 1 on GPU init.
0x1EF014?? 0x104014?? 0x14 "PPF" ?
0x1EF018E0 0x104018E0 0x14 Command List "P3D"
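Every row in the table above keeps the same distance between user VA and PA (0x0EB00000), and WriteHWRegs offsets are taken relative to 0x1EB00000. Here is a minimal sketch of both conversions. It assumes that fixed offset holds for the whole range listed here. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constant distance between user VA and PA for the rows above
   (e.g. 0x1EF00400 <-> 0x10400400). */
#define GPU_VA_PA_DELTA 0x0EB00000u

/* Base that GSPGPU:WriteHWRegs offsets are relative to. */
#define WRITEHWREGS_BASE_VA 0x1EB00000u

/* Hypothetical helper: user VA -> physical address. */
static inline uint32_t gpu_va_to_pa(uint32_t va)
{
    assert(va >= GPU_VA_PA_DELTA);
    return va - GPU_VA_PA_DELTA;
}

/* Hypothetical helper: user VA -> offset passed to WriteHWRegs. */
static inline uint32_t gpu_va_to_hwregs_offset(uint32_t va)
{
    assert(va >= WRITEHWREGS_BASE_VA);
    return va - WRITEHWREGS_BASE_VA;
}
```

For example, the top-screen framebuffer setup block at 0x1EF00400 corresponds to PA 0x10400400 and would be passed to WriteHWRegs as offset 0x400400.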

Memory Fill

User VA Description
0x1EF000X0 Buffer start physaddr >> 3
0x1EF000X4 Buffer end physaddr >> 3
0x1EF000X8 Fill value
0x1EF000XC Control. bit0: start/busy, bit1: finished, bit8-9: fill-width (0 = 16bit, 1 or 3 = 24bit, 2 = 32bit)

Memory fills are used to initialize buffers in memory with a given value, similar to memset. In the table above, X is 1 for the first filling unit (PSC0, 0x1EF00010) and 2 for the second (PSC1, 0x1EF00020). A memory fill is triggered by setting bit0 in the control register; doing so aborts any memory fill already running on that unit. Upon completion, the hardware clears bit0, sets bit1, and fires interrupt PSC0.
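Here is a sketch of programming one filling unit from the register layout above. It assumes the unit's four registers are consecutive 32-bit words starting at 0x1EF00010 (PSC0) or 0x1EF00020 (PSC1), as the map suggests. The control word is written last because that write starts the fill. The struct, enum, and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fill-width codes for control bits 8-9 (3 also selects 24-bit). */
enum fill_width { FILL_16BIT = 0, FILL_24BIT = 1, FILL_32BIT = 2 };

/* Register image for one filling unit, in register order. */
struct memfill_regs {
    uint32_t start;   /* +0x0: buffer start physaddr >> 3 */
    uint32_t end;     /* +0x4: buffer end physaddr >> 3 */
    uint32_t value;   /* +0x8: fill value */
    uint32_t control; /* +0xC: bit0 start/busy, bit1 finished, bits 8-9 width */
};

static inline struct memfill_regs memfill_encode(uint32_t start_pa, uint32_t end_pa,
                                                 uint32_t value, enum fill_width width)
{
    /* Addresses are stored >> 3, so they must be 8-byte aligned. */
    assert((start_pa & 7u) == 0 && (end_pa & 7u) == 0);
    assert(start_pa < end_pa);
    struct memfill_regs r = {
        .start = start_pa >> 3,
        .end = end_pa >> 3,
        .value = value,
        .control = 1u | ((uint32_t)width << 8), /* bit0 = start */
    };
    return r;
}

/* Writes the image to the unit at `base` (a volatile mapping of
   0x1EF00010 or 0x1EF00020). Control goes last: setting bit0 starts
   the fill and aborts any fill already running on that unit. */
static inline void memfill_start(volatile uint32_t *base, struct memfill_regs r)
{
    base[0] = r.start;
    base[1] = r.end;
    base[2] = r.value;
    base[3] = r.control;
}

/* Finished once the hardware has cleared bit0 and set bit1. */
static inline int memfill_done(const volatile uint32_t *base)
{
    return (base[3] & 3u) == 2u;
}
```

Filling 0x46500 bytes (240x400x3) starting at 0x18000000 with a 24-bit fill width would encode start = 0x03000000, end = 0x03008CA0 and control = 0x101. The table does not say whether the end address is inclusive or exclusive, so the end value in this example is an assumption.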

These registers are used by GX SetMemoryFill.

LCD Source Framebuffer Setup

All of these registers must be accessed with 32bit operations regardless of the registers' actual bit size.

Offset Name Comments
0x00 H-total (V-total on screens that are not physically rotated). 12bits.

Setting this value too low leaves the screen unable to sync any pixels other than a single one, fetched from the wrong location. The lowest value the screen can handle is 0x1C2; at 0x1C1 the display loses a few scanlines' worth of pixel clock (though this is not noticeable).

0x04 HBlank timer(?) Seems to determine the horizontal blanking interval.

Setting this lower than HTotal - HDisp leaves the screen unable to keep up with the scanlines: some are skipped and some are misaligned.

Setting this higher than HTotal - HDisp shifts the displayed image to the right.

Setting this higher than HTotal seems to prevent horizontal synchronization from ever happening.

0x08 ? must be >= REG#0x00
0x0C ? must be >= REG#0x08
0x10 Window X start (LgyFb) Offsets the viewing window on the display's physical X co-ordinate relative to its front porch end.

Outside of LgyFb, changing this value only seems to cause odd pixel interpolation and blurriness.

0x14 Window X end (LgyFb) Window X start + window width - 1 is written here to set the offset of the last displayed scan pixel.

Outside of LgyFb, a high enough value seems to shift the screen to the left, but it can break syncing on the bottom screen. Very high values make the screen skip too many "pixels". If this value is greater than or equal to *some value* (i.e. if less than one pixel per line is displayed on the screen), the screen loses synchronization.

0x18 Window Y start (LgyFb) Offsets the viewing window on the display's physical Y co-ordinate relative to the first visible scanline.
0x20 Low: Window Y end (LgyFb)

High: ???

Low: Window Y start + window height - 1 is written here to set the last displayed scanline relative to the first visible scanline.

High: This is cleared to zero when displaying LgyFb. Outside of LgyFb it doesn't seem to do anything useful.

??? extra pixels get inserted into the first displayed scanline, so the image is shifted to the right. This also seems to make horizontal syncing a bit glitchy: if an HSync occurs, the pixel data is suspended until the first pixel is due to be displayed, then the pixel stream continues where it left off until a delayed HSync is processed relative to the pixel data.

0x24 V-total (H-total on screens that are not physically rotated). Total scanlines, including porches/sync timing. Setting this to 494 for the top screen lowers the framerate to about 50.040660858 Hz.
0x28 VBlank timer(?) Seems to determine the vertical blanking interval.

Setting this lower than VTotal - VDisp cuts off the top (VTotal - VDisp - thisvalue) lines.

Setting this higher than VTotal - VDisp pushes the image downwards, with the overscan color visible.

Setting this higher than HTotal makes the GPU skip vertical pixel data synchronization (so the screen is filled with the rest of the pixel data past the given screen framebuffer size). It also skips (thisvalue + somevalue - HTotal) lines into the "global" pixel buffer.

0x30 VTotal Total number of vertical scanlines in the pixel buffer; must be greater than *an unknown blanking-like value*. If this value is less than VDisp, the last two scanlines are repeated, interlaced, until VDisp is reached.
0x34 VDisp(?) Total number of vertical scanlines displayed (seemingly only for the top screen). If this value is less than VTotal, the remaining scanlines are not updated on the screen, so they slowly fade out. Must be greater than *an unknown blanking-like value*, otherwise an underflow occurs.
0x38 Vertical data offset(?) ??? Seems to shift the screen upwards if this value is high enough. If this value is greater than or equal to *some value* (i.e. if less than one scanline is displayed on the screen), the screen loses synchronization.
0x44 ??? similar functionality to 0x10
0x48 ??? bit0 seems to disable HSync, bit8 seems to disable VSync; the remaining bits aren't writable.
0x4C Overscan filler color
0x50 Horizontal position counter read-only
0x54 Horizontal scanline (HBlank) counter read-only
0x5C ??? low u16: framebuffer width

high u16: framebuffer height??? (seems to be unused)

0x60 ??? low u16: timing data(?)

high u16: framebuffer total width (amount of pixels blitted regardless of framebuffer width)

0x64 ??? low u16: unknown

high u16: framebuffer total height (amount of scanlines blitted regardless of framebuffer height)

0x68 Framebuffer A first address For top screen, this is the left eye 3D framebuffer.
0x6C Framebuffer A second address For top screen, this is the left eye 3D framebuffer.
0x70 Framebuffer format Bit0-15: framebuffer format, bit16-31: unknown
0x74 PDC control Bit 0: Enable display controller.

Bit 8: H(Blank?) IRQ mask (0 = enabled). Bit 9: VBlank IRQ mask (0 = enabled). Bit 10: Error IRQ mask? (0 = enabled). Bit 16: Output enable?

0x78 Framebuffer select and status Bit 0: Next framebuffer to display (after VBlank).

Bit 4: Currently displaying framebuffer? Bit 8: Reset FIFO? Bit 16: H(Blank?) IRQ status/ack; write 1 to acknowledge. Bit 17: VBlank IRQ status/ack. Bit 18: Error IRQ status/ack?

0x80 Color lookup table index select 8bits, write-only
0x84 Color lookup table indexed element Contains the value of the color lookup table indexed by the above register, 24bits, RGB8 (0x00BBGGRR)

Each access to this register increments the index register by one.

0x90 Framebuffer stride Distance in bytes between the start of two framebuffer rows (must be a multiple of 8).
0x94 Framebuffer B first address For top screen, this is the right eye 3D framebuffer. Unused for bottom screen.
0x98 Framebuffer B second address For top screen, this is the right eye 3D framebuffer. Unused for bottom screen.
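The 50.040660858 Hz figure given for 0x24 can be reproduced with a simple model. In this model, one frame lasts 24 x (H-total + 1) x (V-total + 1) cycles of a 268,111,856 Hz clock, which is the commonly cited ARM11 base clock. With the top-screen init values (0x1C2 and 0x19D), this works out to 4,481,136 cycles per frame, or about 59.83 Hz. The factor of 24, the +1 terms and the clock source are assumptions fitted to these data points, not documented hardware behavior. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed clock driving the display timing, in Hz. */
#define ARM11_CLOCK_HZ 268111856.0

/* Assumed clock cycles per pixel; fitted, not documented. */
#define CYCLES_PER_DOT 24u

/* Estimated refresh rate from PDC registers 0x00 (H-total) and
   0x24 (V-total), both treated as "count - 1" values. */
static inline double pdc_refresh_hz(uint32_t htotal, uint32_t vtotal)
{
    assert(htotal <= 0xFFFu); /* 0x00 is a 12-bit field */
    uint64_t cycles = (uint64_t)CYCLES_PER_DOT * (htotal + 1u) * (vtotal + 1u);
    return ARM11_CLOCK_HZ / (double)cycles;
}
```

Plugging in 494 for V-total gives 50.0406608... Hz, which matches the value quoted for 0x24 and is the main evidence for this model.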

Framebuffer format

Bit Description
2-0 Color format
3 ?
5-4 Framebuffer scanline output mode (interlace config)
0 - A  (output image as normal)
1 - AA (output a single line twice, aka framebuffer A is interlaced with itself)
2 - AB (interlace framebuffer A and framebuffer B)
3 - BA (same as above, but the line from framebuffer B is outputted first)

0 is used by the bottom screen at all times. 1 is used by the top screen in 2D mode. 2 is used by the top screen in 3D mode. 3 goes unused in userland.

6 Scan doubling enable?* (used by top screen)
7 ?
9-8 Value 1 = unknown, gets rid of the rainbow strip at the top of the screen; 3 = unknown, black screen.
15-10 Unused?
  • The odd thing about scan doubling is that it works differently on the bottom and top LCDs. On the bottom LCD, it doubles the number of output pixels (the same pixel is output twice, effectively doubling columns). On the top screen, however, it doubles scanlines instead. Since the bottom screen's table doesn't work on the top screen, this could hint at how the top screen receives pixel data from the PDC.

On a 2DS, it seems to have no effect on the top part of the display, and on the bottom screen it just shifts the framebuffer two pixels to the right.


The GSP module only allows LCD stereoscopy to be enabled when bit5=1 and bit6=0 here. When the GSP module updates this register, it automatically disables stereoscopy if these bits are not set that way.


When both interlacing and scan doubling are disabled, the full resolution of the top screen (240x800) can be used if the PDC registers are updated to accommodate it. GSP contains tables for this mode (gsp mode == 1) and applies it automatically if both bit5 and bit6 are cleared. This is also the default, and the only valid mode for the bottom screen in userland.

If only AB interlacing is enabled, gsp detects this as a request to switch to 3D mode (gsp mode == 2) and enables the parallax barrier. It's unknown how to control this, but some other PDC registers control whether interlacing is done by true interleaving (both framebuffers are treated as 240x400) or by skipping lines (both framebuffers are treated as 240x800).

If only scan doubling is enabled, gsp detects it as a request to switch back to 2D mode for the top screen (gsp mode == 0). This is also the default mode for the top screen.

Interlacing and scan doubling can't both be enabled in usermode, but the combination works as expected on bare metal.
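The bit5/bit6 rules above can be condensed into a small classifier. This sketch encodes the low bits of the framebuffer format register and maps them to the gsp mode numbers used in the preceding paragraphs. Like those paragraphs, it looks only at bit5 and bit6. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum fb_color { FB_RGBA8 = 0, FB_RGB8 = 1, FB_RGB565 = 2, FB_RGB5A1 = 3, FB_RGBA4 = 4 };
enum fb_scan  { SCAN_A = 0, SCAN_AA = 1, SCAN_AB = 2, SCAN_BA = 3 };

/* Low bits of the framebuffer format register (PDC offset 0x70). */
static inline uint32_t fb_format(enum fb_color color, enum fb_scan scan, int scan_double)
{
    assert(color <= FB_RGBA4);
    return (uint32_t)color                /* bits 2-0: color format */
         | ((uint32_t)scan << 4)          /* bits 5-4: scanline output mode */
         | (scan_double ? 1u << 6 : 0u);  /* bit 6: scan doubling */
}

/* gsp mode implied by bit5 (interlace) and bit6 (scan doubling),
   or -1 for the combination usermode cannot select. */
static inline int gsp_mode_for_format(uint32_t fmt)
{
    int bit5 = (fmt >> 5) & 1;
    int bit6 = (fmt >> 6) & 1;
    if (!bit5 && !bit6) return 1;  /* full-resolution 240x800 */
    if (bit5 && !bit6)  return 2;  /* 3D, parallax barrier enabled */
    if (!bit5 && bit6)  return 0;  /* 2D, top-screen default */
    return -1;                     /* both set: bare metal only */
}
```

The top-screen init value 0x80340 written by nngxInitialize has bit6 set and bit5 clear. It therefore classifies as gsp mode 0, which is consistent with that mode being the top screen's default.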

Framebuffer color formats

Value Description
0 GL_RGBA8_OES
1 GL_RGB8_OES
2 GL_RGB565_OES
3 GL_RGB5_A1_OES
4 GL_RGBA4_OES

Color components are laid out in reverse byte order, with the most significant bits used first (i.e. non-24-bit pixels are stored as little-endian values). For instance, a raw data stream of two GL_RGB565_OES pixels looks like GGGBBBBB RRRRRGGG GGGBBBBB RRRRRGGG.
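Here is a sketch of the byte layout described above. It assumes the channel order follows the format names, with R in the most significant bits, and that each pixel value is stored in reverse (little-endian) byte order. Under those assumptions, GL_RGBA8_OES pixels appear in memory as A, B, G, R bytes and GL_RGB8_OES pixels as B, G, R. The function names are hypothetical.

```c
#include <stdint.h>

/* GL_RGB565_OES: R in bits 15-11, G in bits 10-5, B in bits 4-0. */
static inline uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Low byte first, giving the GGGBBBBB RRRRRGGG byte pattern. */
static inline void store_u16_le(uint8_t *dst, uint16_t px)
{
    dst[0] = (uint8_t)(px & 0xFF);
    dst[1] = (uint8_t)(px >> 8);
}

/* GL_RGBA8_OES: value R<<24 | G<<16 | B<<8 | A, stored little-endian. */
static inline void store_rgba8(uint8_t *dst, uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    dst[0] = a; dst[1] = b; dst[2] = g; dst[3] = r;
}

/* GL_RGB8_OES: three bytes in reverse order. */
static inline void store_rgb8(uint8_t *dst, uint8_t r, uint8_t g, uint8_t b)
{
    dst[0] = b; dst[1] = g; dst[2] = r;
}
```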

Color formats 5, 6, and 7 are blocked by gsp, but they behave as pixel-doubled RGBA8 (not line doubling, but instead the same pixel is output twice) if used outside of userland.

Transfer Engine

Register address Description
0x1EF00C00 Input physical address >> 3
0x1EF00C04 Output physical address >> 3
0x1EF00C08 DisplayTransfer output width (bits 0-15) and height (bits 16-31).
0x1EF00C0C DisplayTransfer input width and height.
0x1EF00C10 Transfer flags. (See below)
0x1EF00C14 The GSP module writes 0 here before writing to 0x1EF00C18 (for cmd3).
0x1EF00C18 Setting bit0 starts the transfer. Upon completion, bit0 is unset and bit8 is set.
0x1EF00C1C ?
0x1EF00C20 TextureCopy total amount of data to copy, in bytes.
0x1EF00C24 TextureCopy input line width (bits 0-15) and gap (bits 16-31), in 16 byte units.
0x1EF00C28 TextureCopy output line width and gap.

These registers are used by GX commands 3 and 4. For cmd4, *0x1EF00C18 |= 1 is used instead of simply writing 1. The DisplayTransfer registers are only used when bit 3 of the flags is unset and are ignored otherwise; likewise, the TextureCopy registers are only used when bit 3 is set and are ignored otherwise.
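Here is a sketch of filling in the DisplayTransfer registers. It follows the order GSP is described as using: the parameters first, then 0 written to 0x1EF00C14, then bit0 of 0x1EF00C18. `base` is assumed to be a volatile mapping of 0x1EF00C00, and the flags value is passed through unchanged. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Register image for 0x1EF00C00-0x1EF00C10 (DisplayTransfer mode). */
struct dt_regs {
    uint32_t in_addr;  /* 0xC00: input physaddr >> 3 */
    uint32_t out_addr; /* 0xC04: output physaddr >> 3 */
    uint32_t out_dim;  /* 0xC08: width bits 0-15, height bits 16-31 */
    uint32_t in_dim;   /* 0xC0C: same layout, input side */
    uint32_t flags;    /* 0xC10 */
};

static inline uint32_t dt_dim(uint16_t width, uint16_t height)
{
    return (uint32_t)width | ((uint32_t)height << 16);
}

static inline struct dt_regs dt_encode(uint32_t in_pa, uint16_t in_w, uint16_t in_h,
                                       uint32_t out_pa, uint16_t out_w, uint16_t out_h,
                                       uint32_t flags)
{
    assert((in_pa & 7u) == 0 && (out_pa & 7u) == 0); /* stored >> 3 */
    assert((flags & (1u << 3)) == 0); /* bit 3 would select TextureCopy */
    struct dt_regs r = { in_pa >> 3, out_pa >> 3, dt_dim(out_w, out_h),
                         dt_dim(in_w, in_h), flags };
    return r;
}

static inline void dt_start(volatile uint32_t *base, struct dt_regs r)
{
    base[0] = r.in_addr;
    base[1] = r.out_addr;
    base[2] = r.out_dim;
    base[3] = r.in_dim;
    base[4] = r.flags;
    base[5] = 0; /* 0xC14: cleared first, as GSP does for cmd3 */
    base[6] = 1; /* 0xC18: bit0 starts the transfer */
}

/* Done once bit0 of 0xC18 is clear and bit8 is set. */
static inline int dt_done(const volatile uint32_t *base)
{
    return (base[6] & 0x101u) == 0x100u;
}
```

For a 240x400 transfer, both dimension words come out as 0x019000F0. This is the same value nngxInitialize writes to 0x1EF0045C.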

Flags Register - 0x1EF00C10

Bit Description
0 When set, the framebuffer data is flipped vertically.
1 When set, the input framebuffer is treated as linear and converted to tiled in the output, converts tiled->linear when unset.
2 This bit is required when the output width is less than the input width for the hardware to properly crop the lines, otherwise the output will be mis-aligned.
3 Uses a TextureCopy mode transfer. See below for details.
4 Not writable
5 Don't perform tiled-linear conversion. Incompatible with bit 1, so only tiled-tiled transfers can be done, not linear-linear.
7-6 Not writable
10-8 Input framebuffer color format; values 0 and 1 are the same as the LCD Source Framebuffer formats (usually zero)
11 Not writable
14-12 Output framebuffer color format
15 Not writable
16 Use 32x32 block tiling mode, instead of the usual 8x8 one. Output dimensions must be multiples of 32, even if cropping with bit 2 set above.
17-23 Not writable
24-25 Scale down the input image using a box filter: 0 = no downscale, 1 = 2x1 downscale, 2 = 2x2 downscale, 3 = invalid
31-26 Not writable
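Here is a sketch of an encoder for the flags register. It places each field at the bit positions in the table and checks the constraints the table lists: bits 1 and 5 are incompatible, and downscale value 3 is invalid. Bits marked "Not writable" are left zero. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct dt_flags {
    unsigned flip_vertical;   /* bit 0 */
    unsigned linear_to_tiled; /* bit 1: 1 = linear->tiled, 0 = tiled->linear */
    unsigned crop;            /* bit 2: needed when output width < input width */
    unsigned texture_copy;    /* bit 3 */
    unsigned raw_tiled;       /* bit 5: no tiled/linear conversion */
    unsigned in_format;       /* bits 10-8 */
    unsigned out_format;      /* bits 14-12 */
    unsigned block32;         /* bit 16: 32x32 block tiling */
    unsigned downscale;       /* bits 25-24: 0 none, 1 2x1, 2 2x2 */
};

static inline uint32_t dt_flags_encode(struct dt_flags f)
{
    assert(!(f.raw_tiled && f.linear_to_tiled)); /* bits 5 and 1 conflict */
    assert(f.in_format <= 7 && f.out_format <= 7);
    assert(f.downscale <= 2);                    /* 3 is invalid */
    return  (f.flip_vertical   & 1u)
         | ((f.linear_to_tiled & 1u) << 1)
         | ((f.crop            & 1u) << 2)
         | ((f.texture_copy    & 1u) << 3)
         | ((f.raw_tiled       & 1u) << 5)
         | ((uint32_t)f.in_format  << 8)
         | ((uint32_t)f.out_format << 12)
         | ((f.block32         & 1u) << 16)
         | ((uint32_t)f.downscale  << 24);
}
```

Converting RGBA8 input to RGB8 output with every other option off gives 0x1000. This assumes the output format field uses the same numbering as the framebuffer color formats.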

TextureCopy

When bit 3 of the flags register (0x1EF00C10) is set, the hardware performs a TextureCopy-mode transfer. In this mode, all other bits of the flags register (except bit 2, which still needs to be set correctly) and the regular dimension registers are ignored, and no format conversions are done. Instead, the hardware performs a raw copy from the source to the destination, with a configurable gap between lines. The total number of bytes to copy is specified in the size register (0x1EF00C20), and the hardware repeatedly reads lines from the input and writes them to the output until this amount has been copied. The "gap" in the input/output dimension registers (0x1EF00C24/0x1EF00C28) is the number of chunks to skip after each "width" chunks of the input/output, and is NOT counted towards the total size of the transfer.

By correctly calculating the input and output gap sizes it is possible to use this functionality to copy arbitrary sub-rectangles between differently-sized framebuffers or textures, which is one of its main uses over a regular no-conversion DisplayTransfer. When copying tiled textures/framebuffers it's important to remember that the contents of a tile are laid out sequentially in memory, and so this should be taken into account when calculating the transfer parameters.
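To copy a sub-rectangle between two linear buffers, set the width to the rectangle's row length and each gap to the rest of the corresponding buffer's row. Here is a sketch that assumes linear (not tiled) data with row lengths that are multiples of 16 bytes. For tiled data, the "rows" would instead be runs of whole tiles, as noted above. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* TextureCopy parameters for 0x1EF00C20-0x1EF00C28. */
struct tc_params {
    uint32_t size;    /* 0xC20: bytes copied, gaps excluded */
    uint32_t in_dim;  /* 0xC24: width bits 0-15, gap bits 16-31, 16-byte units */
    uint32_t out_dim; /* 0xC28: same layout, output side */
};

/* Copy `rows` lines of `row_bytes` each between buffers whose rows are
   `src_stride` and `dst_stride` bytes apart. */
static inline struct tc_params tc_subrect(uint32_t row_bytes, uint32_t rows,
                                          uint32_t src_stride, uint32_t dst_stride)
{
    /* Widths and gaps are counted in 16-byte units. */
    assert(row_bytes % 16 == 0 && src_stride % 16 == 0 && dst_stride % 16 == 0);
    assert(row_bytes <= src_stride && row_bytes <= dst_stride);
    uint32_t w = row_bytes / 16;
    struct tc_params p = {
        .size = row_bytes * rows,
        .in_dim = w | (((src_stride - row_bytes) / 16) << 16),
        .out_dim = w | (((dst_stride - row_bytes) / 16) << 16),
    };
    return p;
}
```

For example, copying a 64-byte by 4-row block out of a buffer with a 256-byte stride into one with a 128-byte stride gives size = 256, input = 0x000C0004 and output = 0x00040004. The input and output addresses (0x1EF00C00/0x1EF00C04) would point at the rectangle's first byte.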

Specifying invalid/junk values for the TextureCopy dimensions can result in the GPU hanging while attempting to process this TextureCopy.

Command List

Register address Description
0x1EF018E0 Buffer size in bytes >> 3
0x1EF018E8 Buffer physical address >> 3
0x1EF018F0 Setting bit0 to 1 starts processing the command list. Upon completion, bit0 seems to be reset to 0.

These three registers are used by GX command 1 to submit GPU command lists.
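Here is a minimal sketch of submitting a command list through these registers. It assumes `p3d` is a volatile mapping of 0x1EF018E0, so the three registers sit at word offsets 0, 2 and 4. The completion check relies on bit0 reading back as 0, which the table only says "seems" to happen. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static inline void cmdlist_submit(volatile uint32_t *p3d, uint32_t buf_pa, uint32_t size_bytes)
{
    /* Both values are stored >> 3, so they must be multiples of 8. */
    assert((buf_pa & 7u) == 0 && (size_bytes & 7u) == 0);
    p3d[0] = size_bytes >> 3; /* 0x1EF018E0: size */
    p3d[2] = buf_pa >> 3;     /* 0x1EF018E8: address */
    p3d[4] = 1;               /* 0x1EF018F0: start */
}

/* Nonzero while bit0 of 0x1EF018F0 is still set. */
static inline int cmdlist_busy(const volatile uint32_t *p3d)
{
    return (int)(p3d[4] & 1u);
}
```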

Framebuffers

These LCD framebuffers normally contain the last frames rendered by the GPU. The framebuffers are drawn from left to right instead of top to bottom, so the beginning of the framebuffer is drawn starting at the left side of the screen.

Both the left and right framebuffers of the 3D screen are displayed regardless of the 3D slider's state; however, when the 3D slider is set to "off", the 3D effect is disabled. Normally, when the slider is set to "off", the left and right framebuffer addresses are set to the same physical address. When the 3D effect is disabled and the left/right framebuffers are set to separate addresses, the LCD seems to alternate between displaying the left and right framebuffer each frame.
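Because the framebuffer is stored column by column, converting a screen coordinate into a byte offset means swapping the axes. Here is a sketch that assumes the common homebrew convention that each stored row of 240 pixels runs from the bottom of the screen to the top. It also assumes the stride comes from register 0x90. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FB_ROW_PIXELS 240u /* pixels per stored row = screen height */

/* Byte offset of screen pixel (x, y), with (0, 0) at the top-left.
   Each stored row is one screen column; within a row, pixels are
   assumed to run from the bottom of the screen upwards. */
static inline uint32_t fb_offset(uint32_t x, uint32_t y, uint32_t stride, uint32_t bpp)
{
    assert(y < FB_ROW_PIXELS);
    assert(stride >= FB_ROW_PIXELS * bpp); /* register 0x90 */
    return x * stride + (FB_ROW_PIXELS - 1u - y) * bpp;
}
```

For an RGB8 framebuffer (3 bytes per pixel, stride 720), the bottom-left screen pixel is at offset 0 and the top-left one is at offset 717.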

Init Values from nngxInitialize for Top Screen

  • 0x1EF00400 = 0x1C2
  • 0x1EF00404 = 0xD1
  • 0x1EF00408 = 0x1C1
  • 0x1EF0040C = 0x1C1
  • 0x1EF00410 = 0
  • 0x1EF00414 = 0xCF
  • 0x1EF00418 = 0xD1
  • 0x1EF0041C = 0x1C501C1
  • 0x1EF00420 = 0x10000
  • 0x1EF00424 = 0x19D
  • 0x1EF00428 = 2
  • 0x1EF0042C = 0x1C2
  • 0x1EF00430 = 0x1C2
  • 0x1EF00434 = 0x1C2
  • 0x1EF00438 = 1
  • 0x1EF0043C = 2
  • 0x1EF00440 = 0x1960192
  • 0x1EF00444 = 0
  • 0x1EF00448 = 0
  • 0x1EF0045C = 0x19000F0
  • 0x1EF00460 = 0x1C100D1
  • 0x1EF00464 = 0x1920002
  • 0x1EF00470 = 0x80340
  • 0x1EF0049C = 0

More Init Values from nngxInitialize for Top Screen

  • 0x1EF00468 = 0x18300000, later changed by GSP module when updating state, framebuffer
  • 0x1EF0046C = 0x18300000, later changed by GSP module when updating state, framebuffer
  • 0x1EF00494 = 0x18300000
  • 0x1EF00498 = 0x18300000
  • 0x1EF00478 = 1, doesn't stay 1, read as 0
  • 0x1EF00474 = 0x10501
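Some of these values can be cross-checked against the register table. Here is a sketch that splits the packed u16 fields and decodes PDC control (offset 0x74). The bit meanings come from the table, including the ones marked with "?". The names are hypothetical.

```c
#include <stdint.h>

/* Split the 32-bit PDC words that pack two u16 fields. */
static inline uint16_t lo16(uint32_t v) { return (uint16_t)(v & 0xFFFFu); }
static inline uint16_t hi16(uint32_t v) { return (uint16_t)(v >> 16); }

/* PDC control (offset 0x74) fields, per the register table. */
struct pdc_control {
    int enabled;       /* bit 0 */
    int hblank_irq_on; /* bit 8 clear (mask: 0 = enabled) */
    int vblank_irq_on; /* bit 9 clear */
    int error_irq_on;  /* bit 10 clear */
    int output_enable; /* bit 16 */
};

static inline struct pdc_control pdc_control_decode(uint32_t v)
{
    struct pdc_control c = {
        .enabled       = (int)(v & 1u),
        .hblank_irq_on = !((v >> 8) & 1u),
        .vblank_irq_on = !((v >> 9) & 1u),
        .error_irq_on  = !((v >> 10) & 1u),
        .output_enable = (int)((v >> 16) & 1u),
    };
    return c;
}
```

Applied to the values above, 0x1EF0045C = 0x19000F0 decodes to a 240x400 framebuffer. Likewise, 0x1EF00474 = 0x10501 decodes to an enabled controller with output enabled and only the VBlank IRQ unmasked.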