- =====================================
- Syntax of AMDGPU Instruction Operands
- =====================================
- .. contents::
- :local:
- Conventions
- ===========
- The following notation is used throughout this document:
- =================== =============================================================================
- Notation            Description
- =================== =============================================================================
- {0..N}              Any integer value in the range from 0 to N (inclusive).
- <x>                 Syntax and meaning of *x* is explained elsewhere.
- =================== =============================================================================
- .. _amdgpu_syn_operands:
- Operands
- ========
- .. _amdgpu_synid_v:
- v
- -
- Vector registers. There are 256 32-bit vector registers.
- A sequence of *vector* registers may be used to operate with more than 32 bits of data.
- Assembler currently supports sequences of 1, 2, 3, 4, 8 and 16 *vector* registers.
- =================================================== ====================================================================
- Syntax                                              Description
- =================================================== ====================================================================
- **v**\<N>                                           A single 32-bit *vector* register.
-                                                     *N* must be a decimal
-                                                     :ref:`integer number<amdgpu_synid_integer_number>`.
- **v[**\ <N>\ **]**                                  A single 32-bit *vector* register.
-                                                     *N* may be specified as an
-                                                     :ref:`integer number<amdgpu_synid_integer_number>`
-                                                     or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
- **v[**\ <N>:<K>\ **]**                              A sequence of (\ *K-N+1*\ ) *vector* registers.
-                                                     *N* and *K* may be specified as
-                                                     :ref:`integer numbers<amdgpu_synid_integer_number>`
-                                                     or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
- **[v**\ <N>, \ **v**\ <N+1>, ... **v**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *vector* registers.
-                                                     Register indices must be specified as decimal
-                                                     :ref:`integer numbers<amdgpu_synid_integer_number>`.
- =================================================== ====================================================================
- Note: *N* and *K* must satisfy the following conditions:
- * *N* <= *K*.
- * 0 <= *N* <= 255.
- * 0 <= *K* <= 255.
- * *K-N+1* must be equal to 1, 2, 3, 4, 8 or 16.
- Examples:
- .. parsed-literal::
-     v255
-     v[0]
-     v[0:1]
-     v[1:1]
-     v[0:3]
-     v[2*2]
-     v[1-1:2-1]
-     [v252]
-     [v252,v253,v254,v255]
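
The conditions on *N* and *K* listed above can be checked mechanically. The Python sketch below is illustrative only: the helper name ``is_valid_vector_range`` is hypothetical and is not part of the assembler, and the sketch assumes *N* and *K* have already been evaluated to integers.

```python
# Sequence sizes the text lists as supported for vector registers.
VECTOR_SEQUENCE_SIZES = {1, 2, 3, 4, 8, 16}


def is_valid_vector_range(n: int, k: int) -> bool:
    """Return True if v[n:k] satisfies the conditions listed for N and K."""
    # Both indices must name one of the 256 vector registers.
    if not (0 <= n <= 255 and 0 <= k <= 255):
        return False
    # The range must not be reversed.
    if n > k:
        return False
    # The number of registers must be one of the supported sizes.
    return (k - n + 1) in VECTOR_SEQUENCE_SIZES
```

For instance, ``is_valid_vector_range(252, 255)`` corresponds to ``v[252:255]`` and is accepted, while ``is_valid_vector_range(0, 4)`` describes five registers and is rejected.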
- .. _amdgpu_synid_nsa:
- GFX10 *Image* instructions may use special *NSA* (Non-Sequential Address) syntax for *image addresses*:
- ===================================== =================================================
- Syntax                                Description
- ===================================== =================================================
- **[Vm**, \ **Vn**, ... **Vk**\ **]**  A sequence of 32-bit *vector* registers.
-                                       Each register may be specified using a syntax
-                                       defined :ref:`above<amdgpu_synid_v>`.
-                                       In contrast with standard syntax, registers
-                                       in *NSA* sequence are not required to have
-                                       consecutive indices. Moreover, the same register
-                                       may appear in the list more than once.
- ===================================== =================================================
- Examples:
- .. parsed-literal::
-     [v32,v1,v[2]]
-     [v[32],v[1:1],[v2]]
-     [v4,v4,v4,v4]
- .. _amdgpu_synid_s:
- s
- -
- Scalar 32-bit registers. The number of available *scalar* registers depends on GPU:
- ======= ============================
- GPU     Number of *scalar* registers
- ======= ============================
- GFX7    104
- GFX8    102
- GFX9    102
- GFX10   106
- ======= ============================
- A sequence of *scalar* registers may be used to operate with more than 32 bits of data.
- Assembler currently supports sequences of 1, 2, 4, 8 and 16 *scalar* registers.
- Pairs of *scalar* registers must be even-aligned (the first register must be even).
- Sequences of 4 and more *scalar* registers must be quad-aligned.
- ======================================================== ====================================================================
- Syntax                                                   Description
- ======================================================== ====================================================================
- **s**\ <N>                                               A single 32-bit *scalar* register.
-                                                          *N* must be a decimal
-                                                          :ref:`integer number<amdgpu_synid_integer_number>`.
- **s[**\ <N>\ **]**                                       A single 32-bit *scalar* register.
-                                                          *N* may be specified as an
-                                                          :ref:`integer number<amdgpu_synid_integer_number>`
-                                                          or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
- **s[**\ <N>:<K>\ **]**                                   A sequence of (\ *K-N+1*\ ) *scalar* registers.
-                                                          *N* and *K* may be specified as
-                                                          :ref:`integer numbers<amdgpu_synid_integer_number>`
-                                                          or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
- **[s**\ <N>, \ **s**\ <N+1>, ... **s**\ <K>\ **]**       A sequence of (\ *K-N+1*\ ) *scalar* registers.
-                                                          Register indices must be specified as decimal
-                                                          :ref:`integer numbers<amdgpu_synid_integer_number>`.
- ======================================================== ====================================================================
- Note: *N* and *K* must satisfy the following conditions:
- * *N* must be properly aligned based on sequence size.
- * *N* <= *K*.
- * 0 <= *N* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
- * 0 <= *K* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
- * *K-N+1* must be equal to 1, 2, 4, 8 or 16.
- Examples:
- .. parsed-literal::
-     s0
-     s[0]
-     s[0:1]
-     s[1:1]
-     s[0:3]
-     s[2*2]
-     s[1-1:2-1]
-     [s4]
-     [s4,s5,s6,s7]
- Examples of *scalar* registers with an invalid alignment:
- .. parsed-literal::
-     s[1:2]
-     s[2:5]
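
The alignment rule is the part of the scalar register conditions that is easiest to get wrong, so a small Python sketch may help. It is illustrative only: ``is_valid_scalar_range`` and ``required_alignment`` are hypothetical names, the register counts are copied from the table above, and *N* and *K* are assumed to be already evaluated to integers.

```python
# Number of scalar registers per GPU, copied from the table above.
SCALAR_REGISTER_COUNT = {"GFX7": 104, "GFX8": 102, "GFX9": 102, "GFX10": 106}

# Sequence sizes the text lists as supported for scalar registers.
SCALAR_SEQUENCE_SIZES = {1, 2, 4, 8, 16}


def required_alignment(size: int) -> int:
    """Single registers are unaligned, pairs even-aligned, larger quad-aligned."""
    if size == 1:
        return 1
    if size == 2:
        return 2
    return 4


def is_valid_scalar_range(n: int, k: int, gpu: str) -> bool:
    """Return True if s[n:k] satisfies the conditions listed for N and K."""
    smax = SCALAR_REGISTER_COUNT[gpu]
    if not (0 <= n < smax and 0 <= k < smax):
        return False
    if n > k:
        return False
    size = k - n + 1
    if size not in SCALAR_SEQUENCE_SIZES:
        return False
    # The first register of the sequence must be aligned to the sequence size.
    return n % required_alignment(size) == 0
```

With this sketch, ``s[1:2]`` is rejected because a pair must start at an even index, and ``s[2:5]`` is rejected because a sequence of four must start at a multiple of four. The same alignment rule is stated for *ttmp* registers, with different register counts.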
- .. _amdgpu_synid_trap:
- trap
- ----
- A set of trap handler registers:
- * :ref:`ttmp<amdgpu_synid_ttmp>`
- * :ref:`tba<amdgpu_synid_tba>`
- * :ref:`tma<amdgpu_synid_tma>`
- .. _amdgpu_synid_ttmp:
- ttmp
- ----
- Trap handler temporary scalar registers, 32 bits wide.
- The number of available *ttmp* registers depends on GPU:
- ======= ===========================
- GPU     Number of *ttmp* registers
- ======= ===========================
- GFX7    12
- GFX8    12
- GFX9    16
- GFX10   16
- ======= ===========================
- A sequence of *ttmp* registers may be used to operate with more than 32 bits of data.
- Assembler currently supports sequences of 1, 2, 4, 8 and 16 *ttmp* registers.
- Pairs of *ttmp* registers must be even-aligned (the first register must be even).
- Sequences of 4 and more *ttmp* registers must be quad-aligned.
- ============================================================= ====================================================================
- Syntax                                                        Description
- ============================================================= ====================================================================
- **ttmp**\ <N>                                                 A single 32-bit *ttmp* register.
-                                                               *N* must be a decimal
-                                                               :ref:`integer number<amdgpu_synid_integer_number>`.
- **ttmp[**\ <N>\ **]**                                         A single 32-bit *ttmp* register.
-                                                               *N* may be specified as an
-                                                               :ref:`integer number<amdgpu_synid_integer_number>`
-                                                               or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
- **ttmp[**\ <N>:<K>\ **]**                                     A sequence of (\ *K-N+1*\ ) *ttmp* registers.
-                                                               *N* and *K* may be specified as
-                                                               :ref:`integer numbers<amdgpu_synid_integer_number>`
-                                                               or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
- **[ttmp**\ <N>, \ **ttmp**\ <N+1>, ... **ttmp**\ <K>\ **]**   A sequence of (\ *K-N+1*\ ) *ttmp* registers.
-                                                               Register indices must be specified as decimal
-                                                               :ref:`integer numbers<amdgpu_synid_integer_number>`.
- ============================================================= ====================================================================
- Note: *N* and *K* must satisfy the following conditions:
- * *N* must be properly aligned based on sequence size.
- * *N* <= *K*.
- * 0 <= *N* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
- * 0 <= *K* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
- * *K-N+1* must be equal to 1, 2, 4, 8 or 16.
- Examples:
- .. parsed-literal::
-     ttmp0
-     ttmp[0]
-     ttmp[0:1]
-     ttmp[1:1]
-     ttmp[0:3]
-     ttmp[2*2]
-     ttmp[1-1:2-1]
-     [ttmp4]
-     [ttmp4,ttmp5,ttmp6,ttmp7]
- Examples of *ttmp* registers with an invalid alignment:
- .. parsed-literal::
-     ttmp[1:2]
-     ttmp[2:5]
- .. _amdgpu_synid_tba:
- tba
- ---
- Trap base address, 64 bits wide. Holds the pointer to the current trap handler program.
- ================== ======================================================================= =============
- Syntax             Description                                                             Availability
- ================== ======================================================================= =============
- tba                64-bit *trap base address* register.                                    GFX7, GFX8
- [tba]              64-bit *trap base address* register (an SP3 syntax).                    GFX7, GFX8
- [tba_lo,tba_hi]    64-bit *trap base address* register (an SP3 syntax).                    GFX7, GFX8
- ================== ======================================================================= =============
- High and low 32 bits of *trap base address* may be accessed as separate registers:
- ================== ======================================================================= =============
- Syntax             Description                                                             Availability
- ================== ======================================================================= =============
- tba_lo             Low 32 bits of *trap base address* register.                            GFX7, GFX8
- tba_hi             High 32 bits of *trap base address* register.                           GFX7, GFX8
- [tba_lo]           Low 32 bits of *trap base address* register (an SP3 syntax).            GFX7, GFX8
- [tba_hi]           High 32 bits of *trap base address* register (an SP3 syntax).           GFX7, GFX8
- ================== ======================================================================= =============
- Note that *tba*, *tba_lo* and *tba_hi* are not accessible as assembler registers in GFX9 and GFX10,
- but *tba* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.
- .. _amdgpu_synid_tma:
- tma
- ---
- Trap memory address, 64 bits wide.
- ================= ======================================================================= ==================
- Syntax            Description                                                             Availability
- ================= ======================================================================= ==================
- tma               64-bit *trap memory address* register.                                  GFX7, GFX8
- [tma]             64-bit *trap memory address* register (an SP3 syntax).                  GFX7, GFX8
- [tma_lo,tma_hi]   64-bit *trap memory address* register (an SP3 syntax).                  GFX7, GFX8
- ================= ======================================================================= ==================
- High and low 32 bits of *trap memory address* may be accessed as separate registers:
- ================= ======================================================================= ==================
- Syntax            Description                                                             Availability
- ================= ======================================================================= ==================
- tma_lo            Low 32 bits of *trap memory address* register.                          GFX7, GFX8
- tma_hi            High 32 bits of *trap memory address* register.                         GFX7, GFX8
- [tma_lo]          Low 32 bits of *trap memory address* register (an SP3 syntax).          GFX7, GFX8
- [tma_hi]          High 32 bits of *trap memory address* register (an SP3 syntax).         GFX7, GFX8
- ================= ======================================================================= ==================
- Note that *tma*, *tma_lo* and *tma_hi* are not accessible as assembler registers in GFX9 and GFX10,
- but *tma* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.
- .. _amdgpu_synid_flat_scratch:
- flat_scratch
- ------------
- Flat scratch address, 64 bits wide. Holds the base address of scratch memory.
- ================================== ================================================================
- Syntax                             Description
- ================================== ================================================================
- flat_scratch                       64-bit *flat scratch* address register.
- [flat_scratch]                     64-bit *flat scratch* address register (an SP3 syntax).
- [flat_scratch_lo,flat_scratch_hi]  64-bit *flat scratch* address register (an SP3 syntax).
- ================================== ================================================================
- High and low 32 bits of *flat scratch* address may be accessed as separate registers:
- ========================= =========================================================================
- Syntax                    Description
- ========================= =========================================================================
- flat_scratch_lo           Low 32 bits of *flat scratch* address register.
- flat_scratch_hi           High 32 bits of *flat scratch* address register.
- [flat_scratch_lo]         Low 32 bits of *flat scratch* address register (an SP3 syntax).
- [flat_scratch_hi]         High 32 bits of *flat scratch* address register (an SP3 syntax).
- ========================= =========================================================================
- .. _amdgpu_synid_xnack:
- xnack
- -----
- Xnack mask, 64 bits wide. Holds a 64-bit mask indicating which threads
- received an *XNACK* due to a vector memory operation.
- .. WARNING:: GFX7 does not support the *xnack* feature. For availability of this feature in other GPUs, refer to :ref:`this table<amdgpu-processors>`.
- \
- ============================== =====================================================
- Syntax                         Description
- ============================== =====================================================
- xnack_mask                     64-bit *xnack mask* register.
- [xnack_mask]                   64-bit *xnack mask* register (an SP3 syntax).
- [xnack_mask_lo,xnack_mask_hi]  64-bit *xnack mask* register (an SP3 syntax).
- ============================== =====================================================
- High and low 32 bits of *xnack mask* may be accessed as separate registers:
- ===================== ==============================================================
- Syntax                Description
- ===================== ==============================================================
- xnack_mask_lo         Low 32 bits of *xnack mask* register.
- xnack_mask_hi         High 32 bits of *xnack mask* register.
- [xnack_mask_lo]       Low 32 bits of *xnack mask* register (an SP3 syntax).
- [xnack_mask_hi]       High 32 bits of *xnack mask* register (an SP3 syntax).
- ===================== ==============================================================
- .. _amdgpu_synid_vcc:
- .. _amdgpu_synid_vcc_lo:
- vcc
- ---
- Vector condition code, 64 bits wide. A bit mask with one bit per thread;
- it holds the result of a vector compare operation.
- Note that GFX10 H/W does not use the high 32 bits of *vcc* in *wave32* mode.
- ================ =========================================================================
- Syntax           Description
- ================ =========================================================================
- vcc              64-bit *vector condition code* register.
- [vcc]            64-bit *vector condition code* register (an SP3 syntax).
- [vcc_lo,vcc_hi]  64-bit *vector condition code* register (an SP3 syntax).
- ================ =========================================================================
- High and low 32 bits of *vector condition code* may be accessed as separate registers:
- ================ =========================================================================
- Syntax           Description
- ================ =========================================================================
- vcc_lo           Low 32 bits of *vector condition code* register.
- vcc_hi           High 32 bits of *vector condition code* register.
- [vcc_lo]         Low 32 bits of *vector condition code* register (an SP3 syntax).
- [vcc_hi]         High 32 bits of *vector condition code* register (an SP3 syntax).
- ================ =========================================================================
- .. _amdgpu_synid_m0:
- m0
- --
- A 32-bit memory register. It has various uses,
- including register indexing and bounds checking.
- =========== ===================================================
- Syntax      Description
- =========== ===================================================
- m0          A 32-bit *memory* register.
- [m0]        A 32-bit *memory* register (an SP3 syntax).
- =========== ===================================================
- .. _amdgpu_synid_exec:
- exec
- ----
- Execute mask, 64 bits wide. A bit mask with one bit per thread,
- which is applied to vector instructions and controls which threads execute
- and which ignore the instruction.
- Note that GFX10 H/W does not use the high 32 bits of *exec* in *wave32* mode.
- ===================== =================================================================
- Syntax                Description
- ===================== =================================================================
- exec                  64-bit *execute mask* register.
- [exec]                64-bit *execute mask* register (an SP3 syntax).
- [exec_lo,exec_hi]     64-bit *execute mask* register (an SP3 syntax).
- ===================== =================================================================
- High and low 32 bits of *execute mask* may be accessed as separate registers:
- ===================== =================================================================
- Syntax                Description
- ===================== =================================================================
- exec_lo               Low 32 bits of *execute mask* register.
- exec_hi               High 32 bits of *execute mask* register.
- [exec_lo]             Low 32 bits of *execute mask* register (an SP3 syntax).
- [exec_hi]             High 32 bits of *execute mask* register (an SP3 syntax).
- ===================== =================================================================
- .. _amdgpu_synid_vccz:
- vccz
- ----
- A single bit flag indicating that the :ref:`vcc<amdgpu_synid_vcc>` is all zeros.
- Note: when GFX10 operates in *wave32* mode, this register reflects the state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`.
- .. _amdgpu_synid_execz:
- execz
- -----
- A single bit flag indicating that the :ref:`exec<amdgpu_synid_exec>` is all zeros.
- Note: when GFX10 operates in *wave32* mode, this register reflects the state of :ref:`exec_lo<amdgpu_synid_exec>`.
- .. _amdgpu_synid_scc:
- scc
- ---
- A single bit flag indicating the result of a scalar compare operation.
- .. _amdgpu_synid_lds_direct:
- lds_direct
- ----------
- A special operand which supplies a 32-bit value
- fetched from *LDS* memory using :ref:`m0<amdgpu_synid_m0>` as an address.
- .. _amdgpu_synid_null:
- null
- ----
- This is a special operand which may be used as a source or a destination.
- When used as a destination, the result of the operation is discarded.
- When used as a source, it supplies a zero value.
- GFX10 only.
- .. WARNING:: Due to an H/W bug, this operand cannot be used with VALU instructions in the first generation of GFX10.
- .. _amdgpu_synid_constant:
- inline constant
- ---------------
- An *inline constant* is an integer or a floating-point value encoded as a part of an instruction.
- Compare *inline constants* with :ref:`literals<amdgpu_synid_literal>`.
- Inline constants include:
- * :ref:`iconst<amdgpu_synid_iconst>`
- * :ref:`fconst<amdgpu_synid_fconst>`
- * :ref:`ival<amdgpu_synid_ival>`
- If a number may be encoded as either
- a :ref:`literal<amdgpu_synid_literal>` or
- a :ref:`constant<amdgpu_synid_constant>`,
- assembler selects the latter encoding as more efficient.
- .. _amdgpu_synid_iconst:
- iconst
- ~~~~~~
- An :ref:`integer number<amdgpu_synid_integer_number>` or
- an :ref:`absolute expression<amdgpu_synid_absolute_expression>`
- encoded as an *inline constant*.
- Only a small fraction of integer numbers may be encoded as *inline constants*.
- They are enumerated in the table below.
- Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
- ================================== ====================================
- Value                              Note
- ================================== ====================================
- {0..64}                            Positive integer inline constants.
- {-16..-1}                          Negative integer inline constants.
- ================================== ====================================
- .. WARNING:: GFX7 does not support inline constants for *f16* operands.
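
The integer inline constant ranges above, together with the rule that the inline encoding is preferred over a literal, can be summarized in a short Python sketch. The function names are hypothetical and the sketch ignores operand type, so it does not model the *f16* restriction on GFX7.

```python
def is_inline_integer(value: int) -> bool:
    """Return True if value falls in {-16..-1} or {0..64} from the table."""
    return -16 <= value <= 64


def integer_encoding(value: int) -> str:
    """Name the encoding the text says is chosen for an integer operand."""
    # An inline constant is preferred whenever the value allows it;
    # everything else has to be encoded as a literal.
    return "inline constant" if is_inline_integer(value) else "literal"
```

For example, ``integer_encoding(64)`` gives ``"inline constant"``, while ``integer_encoding(65)`` and ``integer_encoding(-17)`` both give ``"literal"``.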
- .. _amdgpu_synid_fconst:
- fconst
- ~~~~~~
- A :ref:`floating-point number<amdgpu_synid_floating-point_number>`
- encoded as an *inline constant*.
- Only a small fraction of floating-point numbers may be encoded as *inline constants*.
- They are enumerated in the table below.
- Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
- ===================== ===================================================== ==================
- Value                 Note                                                  Availability
- ===================== ===================================================== ==================
- 0.0                   The same as integer constant 0.                       All GPUs
- 0.5                   Floating-point constant 0.5                           All GPUs
- 1.0                   Floating-point constant 1.0                           All GPUs
- 2.0                   Floating-point constant 2.0                           All GPUs
- 4.0                   Floating-point constant 4.0                           All GPUs
- -0.5                  Floating-point constant -0.5                          All GPUs
- -1.0                  Floating-point constant -1.0                          All GPUs
- -2.0                  Floating-point constant -2.0                          All GPUs
- -4.0                  Floating-point constant -4.0                          All GPUs
- 0.1592                1.0/(2.0*pi). Use only for 16-bit operands.           GFX8, GFX9, GFX10
- 0.15915494            1.0/(2.0*pi). Use only for 16- and 32-bit operands.   GFX8, GFX9, GFX10
- 0.15915494309189532   1.0/(2.0*pi).                                         GFX8, GFX9, GFX10
- ===================== ===================================================== ==================
- .. WARNING:: GFX7 does not support inline constants for *f16* operands.
- .. _amdgpu_synid_ival:
- ival
- ~~~~
- A symbolic operand encoded as an *inline constant*.
- These operands provide read-only access to H/W registers.
- ======================== ================================================ =============
- Syntax                   Note                                             Availability
- ======================== ================================================ =============
- shared_base              Base address of shared memory region.            GFX9, GFX10
- shared_limit             Address of the end of shared memory region.      GFX9, GFX10
- private_base             Base address of private memory region.           GFX9, GFX10
- private_limit            Address of the end of private memory region.     GFX9, GFX10
- pops_exiting_wave_id     A dedicated counter for POPS.                    GFX9, GFX10
- ======================== ================================================ =============
- .. _amdgpu_synid_literal:
- literal
- -------
- A *literal* is a 64-bit value encoded as a separate 32-bit dword in the instruction stream.
- Compare *literals* with :ref:`inline constants<amdgpu_synid_constant>`.
- If a number may be encoded as either
- a :ref:`literal<amdgpu_synid_literal>` or
- an :ref:`inline constant<amdgpu_synid_constant>`,
- assembler selects the latter encoding as more efficient.
- Literals may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>`,
- :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
- :ref:`absolute expressions<amdgpu_synid_absolute_expression>` or
- :ref:`relocatable expressions<amdgpu_synid_relocatable_expression>`.
- An instruction may use only one literal, but several operands may refer to the same literal.
- .. _amdgpu_synid_uimm8:
- uimm8
- -----
- An 8-bit :ref:`integer number<amdgpu_synid_integer_number>`
- or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
- The value must be in the range 0..0xFF.
- .. _amdgpu_synid_uimm32:
- uimm32
- ------
- A 32-bit :ref:`integer number<amdgpu_synid_integer_number>`
- or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
- The value must be in the range 0..0xFFFFFFFF.
- .. _amdgpu_synid_uimm20:
- uimm20
- ------
- A 20-bit :ref:`integer number<amdgpu_synid_integer_number>`
- or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
- The value must be in the range 0..0xFFFFF.
- .. _amdgpu_synid_uimm21:
- uimm21
- ------
- A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`
- or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
- The value must be in the range 0..0x1FFFFF.
- .. WARNING:: Assembler currently supports 20-bit offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.
- .. _amdgpu_synid_simm21:
- simm21
- ------
- A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`
- or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
- The value must be in the range -0x100000..0x0FFFFF.
- .. WARNING:: Assembler currently supports 20-bit unsigned offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.
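
The immediate operand ranges above differ only in their bounds, so a table-driven check is enough to illustrate them. This Python sketch is an assumption-laden illustration: ``IMMEDIATE_RANGES`` and ``fits_immediate`` are hypothetical names, and the sketch checks only the documented ranges, not the assembler's current 20-bit limitation mentioned in the warnings.

```python
# Inclusive (low, high) bounds copied from the operand descriptions above.
IMMEDIATE_RANGES = {
    "uimm8": (0, 0xFF),
    "uimm20": (0, 0xFFFFF),
    "uimm21": (0, 0x1FFFFF),
    "uimm32": (0, 0xFFFFFFFF),
    "simm21": (-0x100000, 0x0FFFFF),
}


def fits_immediate(kind: str, value: int) -> bool:
    """Return True if value lies in the documented range for this operand kind."""
    low, high = IMMEDIATE_RANGES[kind]
    return low <= value <= high
```

Note that ``simm21`` is the only signed kind listed: ``fits_immediate("simm21", -0x100000)`` holds, whereas ``fits_immediate("simm21", 0x100000)`` does not.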
- .. _amdgpu_synid_off:
- off
- ---
- A special entity which indicates that the value of this operand is not used.
- ================================== ===================================================
- Syntax                             Description
- ================================== ===================================================
- off                                Indicates an unused operand.
- ================================== ===================================================
- .. _amdgpu_synid_number:
- Numbers
- =======
- .. _amdgpu_synid_integer_number:
- Integer Numbers
- ---------------
- Integer numbers are 64 bits wide.
- They are converted to :ref:`expected operand type<amdgpu_syn_instruction_type>`
- as described :ref:`here<amdgpu_synid_int_conv>`.
- Integer numbers may be specified in binary, octal, hexadecimal and decimal formats:
- ============ =============================== ========
- Format       Syntax                          Example
- ============ =============================== ========
- Decimal      [-]?[1-9][0-9]*                 -1234
- Binary       [-]?0b[01]+                     0b1010
- Octal        [-]?0[0-7]+                     010
- Hexadecimal  [-]?0x[0-9a-fA-F]+              0xff
- \            [-]?[0x]?[0-9][0-9a-fA-F]*[hH]  0ffh
- ============ =============================== ========
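
To make the formats in the table concrete, the following Python sketch converts a token into an integer. It is a simplified illustration rather than the assembler's lexer: ``parse_integer`` is a hypothetical name, the ``h``-suffix pattern is simplified to require a leading decimal digit, and a lone ``0`` is accepted as an extra case (an assumption, since none of the patterns in the table matches it literally).

```python
import re

# (pattern, base, characters to drop at the start, characters to drop at the end)
_FORMATS = [
    (re.compile(r"0b[01]+"), 2, 2, 0),                 # binary, 0b prefix
    (re.compile(r"0x[0-9a-fA-F]+"), 16, 2, 0),         # hexadecimal, 0x prefix
    (re.compile(r"[0-9][0-9a-fA-F]*[hH]"), 16, 0, 1),  # hexadecimal, h suffix
    (re.compile(r"0[0-7]+"), 8, 1, 0),                 # octal, leading 0
    (re.compile(r"[1-9][0-9]*"), 10, 0, 0),            # decimal
]


def parse_integer(text: str) -> int:
    """Convert a token in one of the tabulated formats to a Python int."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if body == "0":
        # Assumption: a lone zero is treated as the value 0.
        return 0
    for pattern, base, prefix, suffix in _FORMATS:
        if pattern.fullmatch(body):
            digits = body[prefix:len(body) - suffix]
            value = int(digits, base)
            return -value if negative else value
    raise ValueError(f"not a recognized integer number: {text!r}")
```

Applied to the examples from the table, ``010`` is read as octal and ``0ffh`` as hexadecimal, so the two spellings ``0xff`` and ``0ffh`` denote the same value.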
- .. _amdgpu_synid_floating-point_number:
- Floating-Point Numbers
- ----------------------
- All floating-point numbers are handled as double (64 bits wide).
- They are converted to
- :ref:`expected operand type<amdgpu_syn_instruction_type>`
- as described :ref:`here<amdgpu_synid_fp_conv>`.
- Floating-point numbers may be specified in hexadecimal and decimal formats:
- ============ ======================================================== ====================== ====================
- Format       Syntax                                                   Examples               Note
- ============ ======================================================== ====================== ====================
- Decimal      [-]?[0-9]*[.][0-9]*([eE][+-]?[0-9]*)?                    -1.234, 234e2          Must include either
-                                                                                              a decimal separator
-                                                                                              or an exponent.
- Hexadecimal  [-]?0x[0-9a-fA-F]*(.[0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+  -0x1afp-10, 0x.1afp10
- ============ ======================================================== ====================== ====================
- .. _amdgpu_synid_expression:
- Expressions
- ===========
- An expression is evaluated to a 64-bit integer.
- Note that floating-point expressions are not supported.
- There are two kinds of expressions:
- * :ref:`Absolute<amdgpu_synid_absolute_expression>`.
- * :ref:`Relocatable<amdgpu_synid_relocatable_expression>`.
- .. _amdgpu_synid_absolute_expression:
- Absolute Expressions
- --------------------
- The value of an absolute expression does not change after program relocation.
- Absolute expressions must not include unassigned and relocatable values
- such as labels.
- Absolute expressions are evaluated to 64-bit integer values and converted to
- :ref:`expected operand type<amdgpu_syn_instruction_type>`
- as described :ref:`here<amdgpu_synid_int_conv>`.
- Examples:
- .. parsed-literal::
-     x = -1
-     y = x + 10
- .. _amdgpu_synid_relocatable_expression:
- Relocatable Expressions
- -----------------------
- The value of a relocatable expression depends on program relocation.
- Note that use of relocatable expressions is limited to branch targets
- and 32-bit integer operands.
- A relocatable expression is evaluated to a 64-bit integer value
- which depends on operand kind and :ref:`relocation type<amdgpu-relocation-records>`
- of symbol(s) used in the expression. For example, if an instruction refers to a label,
- this reference is evaluated to an offset from the address after the instruction
- to the label address:
- .. parsed-literal::
-     label:
-     v_add_co_u32_e32 v0, vcc, label, v1 // 'label' operand is evaluated to -4
- Note that values of relocatable expressions are usually unknown at assembly time;
- they are resolved later by a linker and converted to
- :ref:`expected operand type<amdgpu_syn_instruction_type>`
- as described :ref:`here<amdgpu_synid_rl_conv>`.
- Operands and Operations
- -----------------------
- Expressions are composed of 64-bit integer operands and operations.
- Operands include :ref:`integer numbers<amdgpu_synid_integer_number>`
- and :ref:`symbols<amdgpu_synid_symbol>`.
- Expressions may also use "." which is a reference to the current PC (program counter).
- :ref:`Unary<amdgpu_synid_expression_un_op>` and :ref:`binary<amdgpu_synid_expression_bin_op>`
- operations produce 64-bit integer results.
- Syntax of Expressions
- ---------------------
- The syntax of expressions is shown below::
- expr ::= expr binop expr | primaryexpr ;
- primaryexpr ::= '(' expr ')' | symbol | number | '.' | unop primaryexpr ;
- binop ::= '&&'
- | '||'
- | '|'
- | '^'
- | '&'
- | '!'
- | '=='
- | '!='
- | '<>'
- | '<'
- | '<='
- | '>'
- | '>='
- | '<<'
- | '>>'
- | '+'
- | '-'
- | '*'
- | '/'
- | '%' ;
- unop ::= '~'
- | '+'
- | '-'
- | '!' ;
- .. _amdgpu_synid_expression_bin_op:
- Binary Operators
- ----------------
- Binary operators are described in the following table.
- They operate on and produce 64-bit integers.
- Operators with higher priority are performed first.
- ========== ========= ===============================================
- Operator   Priority  Meaning
- ========== ========= ===============================================
- \*         5         Integer multiplication.
- /          5         Integer division.
- %          5         Integer signed remainder.
- \+         4         Integer addition.
- \-         4         Integer subtraction.
- <<         3         Integer shift left.
- >>         3         Logical shift right.
- ==         2         Equality comparison.
- !=         2         Inequality comparison.
- <>         2         Inequality comparison.
- <          2         Signed less than comparison.
- <=         2         Signed less than or equal comparison.
- >          2         Signed greater than comparison.
- >=         2         Signed greater than or equal comparison.
- \|         1         Bitwise or.
- ^          1         Bitwise xor.
- &          1         Bitwise and.
- &&         0         Logical and.
- ||         0         Logical or.
- ========== ========= ===============================================
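The 64-bit behaviour of a few of these operators can be modelled in Python as a rough sketch. The helper names (``to_i64``, ``mul``, ``lshr``, ``srem``) are hypothetical, and the C-style sign rule used for the remainder is an assumption that the table above does not state.

```python
MASK64 = (1 << 64) - 1


def to_i64(value: int) -> int:
    """Wrap an arbitrary Python int to a signed 64-bit value."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def mul(a: int, b: int) -> int:
    """Integer multiplication; the result wraps to 64 bits."""
    return to_i64(a * b)


def lshr(a: int, shift: int) -> int:
    """Logical shift right: vacated high bits are filled with zeros.

    Assumes 0 <= shift <= 63.
    """
    return to_i64((a & MASK64) >> shift)


def srem(a: int, b: int) -> int:
    """Signed remainder.

    Assumption: the result takes the sign of the dividend (C-style),
    which differs from Python's own ``%`` operator.
    """
    a, b = to_i64(a), to_i64(b)
    r = abs(a) % abs(b)
    return -r if a < 0 else r
```

For instance, ``lshr(-1, 60)`` yields 15 under this model because the sign bit is not replicated.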
- .. _amdgpu_synid_expression_un_op:
- Unary Operators
- ---------------
- Unary operators are described in the following table.
- They operate on and produce 64-bit integers.
- ========== ===============================================
- Operator   Meaning
- ========== ===============================================
- !          Logical negation.
- ~          Bitwise negation.
- \+         Integer unary plus.
- \-         Integer unary minus.
- ========== ===============================================
- .. _amdgpu_synid_symbol:
- Symbols
- -------
- A symbol is a named 64-bit integer value, representing a relocatable
- address or an absolute (non-relocatable) number.
- Symbol names have the following syntax:
- ``[a-zA-Z_.][a-zA-Z0-9_$.@]*``
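The naming rule above can be checked with a regular expression. The following Python sketch is illustrative only; ``is_symbol_name`` is a hypothetical helper and is not part of the assembler.

```python
import re

# Pattern copied from the syntax rule above; fullmatch() applies it to the whole name.
SYMBOL_NAME = re.compile(r"[a-zA-Z_.][a-zA-Z0-9_$.@]*")


def is_symbol_name(text: str) -> bool:
    """Return True if text is a well-formed symbol name under the rule above."""
    return SYMBOL_NAME.fullmatch(text) is not None
```

Under this rule a name may start with a letter, ``_`` or ``.``, but not with a digit, ``$`` or ``@``.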
- The table below provides several examples of syntax used for symbol definition.
- ================ ==========================================================
- Syntax           Meaning
- ================ ==========================================================
- .globl <S>       Declares a global symbol S without assigning it a value.
- .set <S>, <E>    Assigns the value of an expression E to a symbol S.
- <S> = <E>        Assigns the value of an expression E to a symbol S.
- <S>:             Declares a label S and assigns it the current PC value.
- ================ ==========================================================
- A symbol may be used before it is declared or assigned;
- unassigned symbols are assumed to be PC-relative.
- Additional information about symbols may be found :ref:`here<amdgpu-symbols>`.
- .. _amdgpu_synid_conv:
- Type and Size Conversion
- ========================
- This section describes what happens when a 64-bit
- :ref:`integer number<amdgpu_synid_integer_number>`, a
- :ref:`floating-point number<amdgpu_synid_floating-point_number>` or an
- :ref:`expression<amdgpu_synid_expression>`
- is used for an operand which has a different type or size.
- .. _amdgpu_synid_int_conv:
- Conversion of Integer Values
- ----------------------------
- Instruction operands may be specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>` or
- :ref:`absolute expressions<amdgpu_synid_absolute_expression>`. These values are converted to
- the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
- 1. *Validation*. The assembler checks whether the input value can be truncated to the required *truncation width*
- (see the table below) without loss. Truncation is allowed in two cases:
- * The truncated bits are all 0.
- * The truncated bits are all 1 and the value after truncation has its MSB set.
- In all other cases the assembler reports an error.
- 2. *Conversion*. The input value is converted to the expected type as described in the table below.
- Depending on operand kind, this conversion is performed by the assembler, by AMDGPU hardware, or by both.
- ============== ================= =============== ====================================================================
- Expected type  Truncation Width  Conversion      Description
- ============== ================= =============== ====================================================================
- i16, u16, b16  16                num.u16         Truncate to 16 bits.
- i32, u32, b32  32                num.u32         Truncate to 32 bits.
- i64            32                {-1,num.i32}    Truncate to 32 bits and then sign-extend the result to 64 bits.
- u64, b64       32                {0,num.u32}     Truncate to 32 bits and then zero-extend the result to 64 bits.
- f16            16                num.u16         Use low 16 bits as an f16 value.
- f32            32                num.u32         Use low 32 bits as an f32 value.
- f64            32                {num.u32,0}     Use low 32 bits of the number as high 32 bits
-                                                  of the result; low 32 bits of the result are zeroed.
- ============== ================= =============== ====================================================================
- Examples of enabled conversions:
- .. parsed-literal::
- // GFX9
- v_add_u16 v0, -1, 0 // src0 = 0xFFFF
- v_add_f16 v0, -1, 0 // src0 = 0xFFFF (NaN)
- //
- v_add_u32 v0, -1, 0 // src0 = 0xFFFFFFFF
- v_add_f32 v0, -1, 0 // src0 = 0xFFFFFFFF (NaN)
- //
- v_add_u16 v0, 0xff00, v0 // src0 = 0xff00
- v_add_u16 v0, 0xffffffffffffff00, v0 // src0 = 0xff00
- v_add_u16 v0, -256, v0 // src0 = 0xff00
- //
- s_bfe_i64 s[0:1], 0xffefffff, s3 // src0 = 0xffffffffffefffff
- s_bfe_u64 s[0:1], 0xffefffff, s3 // src0 = 0x00000000ffefffff
- v_ceil_f64_e32 v[0:1], 0xffefffff // src0 = 0xffefffff00000000 (-1.7976922776554302e308)
- //
- x = 0xffefffff //
- s_bfe_i64 s[0:1], x, s3 // src0 = 0xffffffffffefffff
- s_bfe_u64 s[0:1], x, s3 // src0 = 0x00000000ffefffff
- v_ceil_f64_e32 v[0:1], x // src0 = 0xffefffff00000000 (-1.7976922776554302e308)
- Examples of disabled conversions:
- .. parsed-literal::
- // GFX9
- v_add_u16 v0, 0x1ff00, v0 // truncated bits are not all 0 or 1
- v_add_u16 v0, 0xffffffffffff00ff, v0 // truncated bits do not match MSB of the result
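The validation and conversion steps of this section can be modelled with the following Python sketch. It is an illustrative model of the rules and table above, not the assembler's implementation; the names ``fits`` and ``convert`` are hypothetical.

```python
MASK64 = (1 << 64) - 1


def fits(value: int, width: int) -> bool:
    """Validation: can a 64-bit value be truncated to `width` bits without loss?"""
    v = value & MASK64
    dropped = v >> width                      # the bits that truncation removes
    all_ones = (1 << (64 - width)) - 1
    msb_set = ((v >> (width - 1)) & 1) == 1   # MSB of the truncated value
    return dropped == 0 or (dropped == all_ones and msb_set)


def convert(value: int, expected: str) -> int:
    """Conversion: return the operand bits for the given expected type."""
    width = 16 if expected in ("i16", "u16", "b16", "f16") else 32
    if not fits(value, width):
        raise ValueError(f"{value:#x} cannot be truncated to {width} bits")
    low = value & ((1 << width) - 1)
    if expected == "i64":
        # Sign-extend the 32-bit result to 64 bits.
        return (low | 0xFFFFFFFF00000000) if (low & 0x80000000) else low
    if expected == "f64":
        # The low 32 bits of the number become the high 32 bits of the result.
        return low << 32
    # i16/u16/b16/f16, i32/u32/b32/f32, and zero-extended u64/b64.
    return low
```

With this model, ``convert(-256, "u16")`` gives ``0xff00``, while ``convert(0x1ff00, "u16")`` raises an error, matching the examples above.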
- .. _amdgpu_synid_fp_conv:
- Conversion of Floating-Point Values
- -----------------------------------
- Instruction operands may be specified as 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
- These values are converted to the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
- 1. *Validation*. The assembler checks whether the input f64 number can be converted
- to the *required floating-point type* (see the table below) without overflow or underflow.
- Loss of precision is allowed. If this conversion is not possible, the assembler reports an error.
- 2. *Conversion*. The input value is converted to the expected type as described in the table below.
- Depending on operand kind, this is performed by the assembler, by AMDGPU hardware, or by both.
- ============== ================ ================= =================================================================
- Expected type  Required FP Type Conversion        Description
- ============== ================ ================= =================================================================
- i16, u16, b16  f16              f16(num)          Convert to f16 and use bits of the result as an integer value.
- i32, u32, b32  f32              f32(num)          Convert to f32 and use bits of the result as an integer value.
- i64, u64, b64  \-               \-                Conversion disabled.
- f16            f16              f16(num)          Convert to f16.
- f32            f32              f32(num)          Convert to f32.
- f64            f64              {num.u32.hi,0}    Use high 32 bits of the number as high 32 bits of the result;
-                                                   zero-fill low 32 bits of the result.
-                                                   Note that the result may differ from the original number.
- ============== ================ ================= =================================================================
- Examples of enabled conversions:
- .. parsed-literal::
- // GFX9
- v_add_f16 v0, 1.0, 0 // src0 = 0x3C00 (1.0)
- v_add_u16 v0, 1.0, 0 // src0 = 0x3C00
- //
- v_add_f32 v0, 1.0, 0 // src0 = 0x3F800000 (1.0)
- v_add_u32 v0, 1.0, 0 // src0 = 0x3F800000
- // src0 before conversion:
- // 1.7976931348623157e308 = 0x7fefffffffffffff
- // src0 after conversion:
- // 1.7976922776554302e308 = 0x7fefffff00000000
- v_ceil_f64 v[0:1], 1.7976931348623157e308
- v_add_f16 v1, 65500.0, v2 // ok for f16.
- v_add_f32 v1, 65600.0, v2 // ok for f32, but would result in overflow for f16.
- Examples of disabled conversions:
- .. parsed-literal::
- // GFX9
- v_add_f16 v1, 65600.0, v2 // overflow
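The floating-point rules above can be approximated in Python with the standard ``struct`` module. This is a sketch under stated assumptions, not the assembler's algorithm: ``convert_fp`` is a hypothetical name, and treating a nonzero input that rounds to zero as underflow is a simplification of whatever criterion the assembler applies.

```python
import struct


def f64_bits(num: float) -> int:
    """Return the IEEE 754 binary64 encoding of num as an integer."""
    return struct.unpack("<Q", struct.pack("<d", num))[0]


def convert_fp(num: float, expected: str) -> int:
    """Return the operand bits for an f64 literal used as `expected`."""
    if expected in ("i64", "u64", "b64"):
        raise ValueError("conversion disabled for 64-bit integer operands")
    if expected == "f64":
        # Keep the high 32 bits of the encoding; zero-fill the low 32 bits.
        return f64_bits(num) & 0xFFFFFFFF00000000
    # 16-bit operands require f16, 32-bit operands require f32.
    fp_fmt, int_fmt = ("<e", "<H") if expected.endswith("16") else ("<f", "<I")
    try:
        packed = struct.pack(fp_fmt, num)   # rounds; precision loss is allowed
    except OverflowError as exc:
        raise ValueError(f"{num} overflows the required type") from exc
    # Assumption: a nonzero input that rounds to zero counts as underflow.
    if num != 0.0 and struct.unpack(fp_fmt, packed)[0] == 0.0:
        raise ValueError(f"{num} underflows the required type")
    return struct.unpack(int_fmt, packed)[0]
```

In this model ``convert_fp(1.0, "u16")`` returns ``0x3C00`` and ``convert_fp(65600.0, "f16")`` raises an error, in line with the examples above.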
- .. _amdgpu_synid_rl_conv:
- Conversion of Relocatable Values
- --------------------------------
- :ref:`Relocatable expressions<amdgpu_synid_relocatable_expression>`
- may be used with 32-bit integer operands and jump targets.
- When the value of a relocatable expression is resolved by a linker, it is
- converted as needed and truncated to the operand size. The conversion depends
- on :ref:`relocation type<amdgpu-relocation-records>` and operand kind.
- For example, when a 32-bit operand of an instruction refers to a relocatable expression *expr*,
- this reference is evaluated to a 64-bit offset from the address after the
- instruction to the address being referenced, *counted in bytes*.
- Then the value is truncated to 32 bits and encoded as a literal:
- .. parsed-literal::
- expr = .
- v_add_co_u32_e32 v0, vcc, expr, v1 // 'expr' operand is evaluated to -4
- // and then truncated to 0xFFFFFFFC
- As another example, when a branch instruction refers to a label,
- this reference is evaluated to an offset from the address after the
- instruction to the label address, *counted in dwords*.
- Then the value is truncated to 16 bits:
- .. parsed-literal::
- label:
- s_branch label // 'label' operand is evaluated to -1 and truncated to 0xFFFF
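The arithmetic in the two examples above can be reproduced with a short Python sketch. The function names are hypothetical, and the 4-byte instruction size is an assumption chosen to match the values quoted in the examples; actual resolution is performed by the linker according to the relocation type.

```python
def byte_offset(target: int, insn_addr: int, insn_size: int = 4) -> int:
    """Offset in bytes from the address after the instruction to the target."""
    return target - (insn_addr + insn_size)


def literal32(target: int, insn_addr: int, insn_size: int = 4) -> int:
    """32-bit operand: byte offset truncated to 32 bits."""
    return byte_offset(target, insn_addr, insn_size) & 0xFFFFFFFF


def branch16(target: int, insn_addr: int, insn_size: int = 4) -> int:
    """Branch target: offset counted in dwords, truncated to 16 bits.

    Assumes the byte offset is a multiple of 4.
    """
    return (byte_offset(target, insn_addr, insn_size) // 4) & 0xFFFF
```

With the label and the instruction both at address 0, ``literal32(0, 0)`` gives ``0xFFFFFFFC`` and ``branch16(0, 0)`` gives ``0xFFFF``.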
|