GPU/Shader Instruction Set: Difference between revisions

Line 4:

A compiled shader binary comprises two parts : the main instruction sequence and the operand descriptor table. Both are sent to the GPU around the same time, but through separate [[GPU Commands]]. Instructions (such as format 1 instructions) may reference operand descriptors. In that case, the operand descriptor ID is the offset, in words, of the descriptor within the table.

Both instructions and descriptors are encoded as little-endian.
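
As a rough illustration of the two statements above (little-endian encoding, and descriptor IDs being word offsets into the table), here is a minimal Python sketch. The 32-bit word size and the helper names <code>decode_words</code> and <code>lookup_descriptor</code> are assumptions made for the example; they are not taken from this page.

```python
import struct

# Assumption for this sketch: one "word" is 32 bits (4 bytes).
WORD_SIZE = 4


def decode_words(blob: bytes) -> list[int]:
    """Decode a raw byte blob into a list of little-endian unsigned words."""
    if len(blob) % WORD_SIZE != 0:
        raise ValueError("blob length is not a whole number of words")
    count = len(blob) // WORD_SIZE
    # "<" selects little-endian byte order, "I" is an unsigned 32-bit integer.
    return list(struct.unpack(f"<{count}I", blob))


def lookup_descriptor(table: list[int], desc_id: int) -> int:
    """Return the operand descriptor whose ID is a word offset into the table."""
    return table[desc_id]
```

For instance, the bytes <code>78 56 34 12</code> decode to the word 0x12345678, and descriptor ID 1 selects the second word of the decoded table.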

Basic implementations of the following specification can be found at [https://github.com/smealum/aemstro] and [https://github.com/neobrain/nihstro]

Basic implementations of the following specification can be found at [https://github.com/smealum/aemstro] and [https://github.com/neobrain/nihstro].

The instruction set seems to have been heavily inspired by Microsoft's vs_3_0 [http://msdn.microsoft.com/en-us/library/windows/desktop/bb172938%28v=vs.85%29.aspx].

Please note that this page is being written as the instruction set is reverse engineered; as such it may very well contain mistakes.

Line 354:

Line 355:

| 0x12

| 1u

| ~~ARL~~

| MOVA

| Address Register Load; sets (a0, a1, _, _) to SRC1 (cast to integer).

| Address Register Load; sets (a0.x, a0.y, _, _) to SRC1 (cast to integer).

|-

| 0x13

Line 470:

Line 471:

| 3

| FORLOOP

| Loops over the code between itself and DST. First sets ~~lcnt~~ to INT.y, then increments ~~lcnt~~ by INT.z after each loop. Loops until ~~lcnt~~ reaches INT.y+INT.x, inclusive (that is : for(aL=INT.y;aL<=INT.y+INT.x;aL+=INT.z)). (INT is i0-i3, an integer vector uniform)

| Loops over the code between itself and DST. First sets aL to INT.y, then increments aL by INT.z after each loop. Loops until aL reaches INT.y+INT.x, inclusive (that is : for(aL=INT.y;aL<=INT.y+INT.x;aL+=INT.z)). (INT is i0-i3, an integer vector uniform)

|-

| 0x2A

Line 592:

Line 593:

== Relative addressing ==

There are 3 global address registers : a0, a1 and ~~a2 = lcnt~~ (loop counter). For format 1 instructions, when IDX != 0, the value of the corresponding address register is added to SRC1's value.

There are 3 global address registers : a0.x, a0.y and aL (loop counter). For format 1 instructions, when IDX != 0, the value of the corresponding address register is added to SRC1's value.

For example, if IDX = 2, a1 = 3 and SRC1 = c8, then instead SRC1+a1 = c11 will be used for the instruction.

For example, if IDX = 2, a0.y = 3 and SRC1 = c8, then instead SRC1+a0.y = c11 will be used for the instruction.
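
The example above can be sketched in Python as follows. Only the case IDX = 2 selecting a0.y comes from the text; the mapping of IDX = 1 to a0.x and IDX = 3 to aL is an assumption inferred from the order in which the registers are listed, and <code>AddressRegisters</code> and <code>resolve_src1</code> are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AddressRegisters:
    """The three global address registers described above."""
    a0_x: int = 0
    a0_y: int = 0
    aL: int = 0  # loop counter


def resolve_src1(src1: int, idx: int, regs: AddressRegisters) -> int:
    """Return the register index actually used for SRC1.

    IDX == 0 means no relative addressing. IDX == 2 -> a0.y is stated in
    the text; IDX == 1 -> a0.x and IDX == 3 -> aL are assumptions.
    """
    if idx == 0:
        return src1
    offsets = {1: regs.a0_x, 2: regs.a0_y, 3: regs.aL}
    if idx not in offsets:
        raise ValueError(f"unexpected IDX value: {idx}")
    return src1 + offsets[idx]
```

With <code>regs = AddressRegisters(a0_y=3)</code>, calling <code>resolve_src1(8, 2, regs)</code> gives 11, matching the c8 to c11 example.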

a0 and a1 can be set manually through the ~~ARL~~ instruction. ~~lcnt~~ is set automatically by the LOOP instruction. Note that ~~lcnt~~ is still accessible and valid after exiting a LOOP block.

a0.x and a0.y can be set manually through the MOVA instruction. aL is set automatically by the LOOP instruction. Note that aL is still accessible and valid after exiting a LOOP block.
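
To make the values taken by aL concrete, the sketch below enumerates them by following the for-loop form quoted in the FORLOOP description (<code>for(aL=INT.y;aL<=INT.y+INT.x;aL+=INT.z)</code>). The function name is hypothetical, and rejecting a non-positive INT.z is a choice made here to keep the sketch from looping forever; it says nothing about what the hardware does in that case.

```python
from typing import Iterator


def forloop_counter_values(int_x: int, int_y: int, int_z: int) -> Iterator[int]:
    """Yield the successive values of aL inside a FORLOOP block.

    Mirrors: for (aL = INT.y; aL <= INT.y + INT.x; aL += INT.z)
    """
    if int_z <= 0:
        # Assumption of this sketch only: refuse steps that never terminate.
        raise ValueError("INT.z must be positive in this sketch")
    aL = int_y
    while aL <= int_y + int_x:
        yield aL
        aL += int_z
```

For example, with INT = (x=3, y=2, z=1) the loop body sees aL = 2, 3, 4, 5.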

== Comparison operator ==

@@ Line 4: / Line 4: @@
 A compiled shader binary comprises two parts : the main instruction sequence and the operand descriptor table. Both are sent to the GPU around the same time, but through separate [[GPU Commands]]. Instructions (such as format 1 instructions) may reference operand descriptors. In that case, the operand descriptor ID is the offset, in words, of the descriptor within the table.
 Both instructions and descriptors are encoded as little-endian.
-Basic implementations of the following specification can be found at [https://github.com/smealum/aemstro] and [https://github.com/neobrain/nihstro]
+Basic implementations of the following specification can be found at [https://github.com/smealum/aemstro] and [https://github.com/neobrain/nihstro].
+The instruction set seems to have been heavily inspired by Microsoft's vs_3_0 [http://msdn.microsoft.com/en-us/library/windows/desktop/bb172938%28v=vs.85%29.aspx].
 Please note that this page is being written as the instruction set is reverse engineered; as such it may very well contain mistakes.
@@ Line 354: / Line 355: @@
 |  0x12
 |  1u
-|  ARL
+|  MOVA
-|  Address Register Load; sets (a0, a1, _, _) to SRC1 (cast to integer).
+|  Address Register Load; sets (a0.x, a0.y, _, _) to SRC1 (cast to integer).
 |-
 |  0x13
@@ Line 470: / Line 471: @@
 |  3
 |  FORLOOP
-|  Loops over the code between itself and DST. First sets lcnt to INT.y, then increments lcnt by INT.z after each loop. Loops until lcnt reaches INT.y+INT.x, inclusive (that is : for(aL=INT.y;aL<=INT.y+INT.x;aL+=INT.z)). (INT is i0-i3, an integer vector uniform)
+|  Loops over the code between itself and DST. First sets aL to INT.y, then increments aL by INT.z after each loop. Loops until aL reaches INT.y+INT.x, inclusive (that is : for(aL=INT.y;aL<=INT.y+INT.x;aL+=INT.z)). (INT is i0-i3, an integer vector uniform)
 |-
 |  0x2A
@@ Line 592: / Line 593: @@
 == Relative addressing ==
-There are 3 global address registers : a0, a1 and a2 = lcnt (loop counter). For format 1 instructions, when IDX != 0, the value of the corresponding address register is added to SRC1's value.
+There are 3 global address registers : a0.x, a0.y and aL (loop counter). For format 1 instructions, when IDX != 0, the value of the corresponding address register is added to SRC1's value.
-For example, if IDX = 2, a1 = 3 and SRC1 = c8, then instead SRC1+a1 = c11 will be used for the instruction.
+For example, if IDX = 2, a0.y = 3 and SRC1 = c8, then instead SRC1+a0.y = c11 will be used for the instruction.
-a0 and a1 can be set manually through the ARL instruction. lcnt is set automatically by the LOOP instruction. Note that lcnt is still accessible and valid after exiting a LOOP block.
+a0.x and a0.y can be set manually through the MOVA instruction. aL is set automatically by the LOOP instruction. Note that aL is still accessible and valid after exiting a LOOP block.
 == Comparison operator ==