|
@@ -38,7 +38,8 @@ Assembler currently supports sequences of 1, 2, 3, 4, 8 and 16 *vector* register
|
|
|
=================================================== ====================================================================
|
|
|
**v**\<N> A single 32-bit *vector* register.
|
|
|
|
|
|
- *N* must be a decimal integer number.
|
|
|
+ *N* must be a decimal
|
|
|
+ :ref:`integer number<amdgpu_synid_integer_number>`.
|
|
|
**v[**\ <N>\ **]** A single 32-bit *vector* register.
|
|
|
|
|
|
*N* may be specified as an
|
|
@@ -51,10 +52,11 @@ Assembler currently supports sequences of 1, 2, 3, 4, 8 and 16 *vector* register
|
|
|
or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
|
|
|
**[v**\ <N>, \ **v**\ <N+1>, ... **v**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *vector* registers.
|
|
|
|
|
|
- Register indices must be specified as decimal integer numbers.
|
|
|
+ Register indices must be specified as decimal
|
|
|
+ :ref:`integer numbers<amdgpu_synid_integer_number>`.
|
|
|
=================================================== ====================================================================
|
|
|
|
|
|
-Note. *N* and *K* must satisfy the following conditions:
|
|
|
+Note: *N* and *K* must satisfy the following conditions:
|
|
|
|
|
|
* *N* <= *K*.
|
|
|
* 0 <= *N* <= 255.
|
|
@@ -77,26 +79,27 @@ Examples:
|
|
|
|
|
|
.. _amdgpu_synid_nsa:
|
|
|
|
|
|
-*Image* instructions may use special *NSA* (Non-Sequential Address) syntax for *image addresses*:
|
|
|
+GFX10 *Image* instructions may use special *NSA* (Non-Sequential Address) syntax for *image addresses*:
|
|
|
|
|
|
- =================================================== ====================================================================
|
|
|
- Syntax Description
|
|
|
- =================================================== ====================================================================
|
|
|
- **[v**\ <A>, \ **v**\ <B>, ... **v**\ <X>\ **]** A sequence of *vector* registers. At least one register
|
|
|
- must be specified.
|
|
|
+ ===================================== =================================================
|
|
|
+ Syntax Description
|
|
|
+ ===================================== =================================================
|
|
|
+ **[Vm**, \ **Vn**, ... **Vk**\ **]** A sequence of 32-bit *vector* registers.
|
|
|
+ Each register may be specified using a syntax
|
|
|
+ defined :ref:`above<amdgpu_synid_v>`.
|
|
|
|
|
|
- In contrast with standard syntax described above, registers in
|
|
|
- this sequence are not required to have consecutive indices.
|
|
|
- Moreover, the same register may appear in the list more than once.
|
|
|
- =================================================== ====================================================================
|
|
|
-
|
|
|
-Note. Reqister indices must be in the range 0..255. They must be specified as decimal integer numbers.
|
|
|
+ In contrast with standard syntax, registers
|
|
|
+                                          in an *NSA* sequence are not required to have
|
|
|
+ consecutive indices. Moreover, the same register
|
|
|
+ may appear in the list more than once.
|
|
|
+ ===================================== =================================================
|
|
|
|
|
|
Examples:
|
|
|
|
|
|
.. parsed-literal::
|
|
|
|
|
|
- [v32,v1,v2]
|
|
|
+ [v32,v1,v[2]]
|
|
|
+ [v[32],v[1:1],[v2]]
|
|
|
[v4,v4,v4,v4]
|
|
|
|
|
|
.. _amdgpu_synid_s:
|
|
@@ -126,7 +129,9 @@ Sequences of 4 and more *scalar* registers must be quad-aligned.
|
|
|
======================================================== ====================================================================
|
|
|
**s**\ <N> A single 32-bit *scalar* register.
|
|
|
|
|
|
- *N* must be a decimal integer number.
|
|
|
+ *N* must be a decimal
|
|
|
+ :ref:`integer number<amdgpu_synid_integer_number>`.
|
|
|
+
|
|
|
**s[**\ <N>\ **]** A single 32-bit *scalar* register.
|
|
|
|
|
|
*N* may be specified as an
|
|
@@ -137,12 +142,14 @@ Sequences of 4 and more *scalar* registers must be quad-aligned.
|
|
|
*N* and *K* may be specified as
|
|
|
:ref:`integer numbers<amdgpu_synid_integer_number>`
|
|
|
or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
|
|
|
+
|
|
|
**[s**\ <N>, \ **s**\ <N+1>, ... **s**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *scalar* registers.
|
|
|
|
|
|
- Register indices must be specified as decimal integer numbers.
|
|
|
+ Register indices must be specified as decimal
|
|
|
+ :ref:`integer numbers<amdgpu_synid_integer_number>`.
|
|
|
======================================================== ====================================================================
|
|
|
|
|
|
-Note. *N* and *K* must satisfy the following conditions:
|
|
|
+Note: *N* and *K* must satisfy the following conditions:
|
|
|
|
|
|
* *N* must be properly aligned based on sequence size.
|
|
|
* *N* <= *K*.
|
|
@@ -210,7 +217,8 @@ Sequences of 4 and more *ttmp* registers must be quad-aligned.
|
|
|
============================================================= ====================================================================
|
|
|
**ttmp**\ <N> A single 32-bit *ttmp* register.
|
|
|
|
|
|
- *N* must be a decimal integer number.
|
|
|
+ *N* must be a decimal
|
|
|
+ :ref:`integer number<amdgpu_synid_integer_number>`.
|
|
|
**ttmp[**\ <N>\ **]** A single 32-bit *ttmp* register.
|
|
|
|
|
|
*N* may be specified as an
|
|
@@ -223,10 +231,11 @@ Sequences of 4 and more *ttmp* registers must be quad-aligned.
|
|
|
or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
|
|
|
**[ttmp**\ <N>, \ **ttmp**\ <N+1>, ... **ttmp**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *ttmp* registers.
|
|
|
|
|
|
- Register indices must be specified as decimal integer numbers.
|
|
|
+ Register indices must be specified as decimal
|
|
|
+ :ref:`integer numbers<amdgpu_synid_integer_number>`.
|
|
|
============================================================= ====================================================================
|
|
|
|
|
|
-Note. *N* and *K* must satisfy the following conditions:
|
|
|
+Note: *N* and *K* must satisfy the following conditions:
|
|
|
|
|
|
* *N* must be properly aligned based on sequence size.
|
|
|
* *N* <= *K*.
|
|
@@ -266,8 +275,8 @@ Trap base address, 64-bits wide. Holds the pointer to the current trap handler p
|
|
|
Syntax Description Availability
|
|
|
================== ======================================================================= =============
|
|
|
tba 64-bit *trap base address* register. GFX7, GFX8
|
|
|
- [tba] 64-bit *trap base address* register (an alternative syntax). GFX7, GFX8
|
|
|
- [tba_lo,tba_hi] 64-bit *trap base address* register (an alternative syntax). GFX7, GFX8
|
|
|
+ [tba] 64-bit *trap base address* register (an SP3 syntax). GFX7, GFX8
|
|
|
+ [tba_lo,tba_hi] 64-bit *trap base address* register (an SP3 syntax). GFX7, GFX8
|
|
|
================== ======================================================================= =============
|
|
|
|
|
|
High and low 32 bits of *trap base address* may be accessed as separate registers:
|
|
@@ -277,8 +286,8 @@ High and low 32 bits of *trap base address* may be accessed as separate register
|
|
|
================== ======================================================================= =============
|
|
|
tba_lo Low 32 bits of *trap base address* register. GFX7, GFX8
|
|
|
tba_hi High 32 bits of *trap base address* register. GFX7, GFX8
|
|
|
- [tba_lo] Low 32 bits of *trap base address* register (an alternative syntax). GFX7, GFX8
|
|
|
- [tba_hi] High 32 bits of *trap base address* register (an alternative syntax). GFX7, GFX8
|
|
|
+ [tba_lo] Low 32 bits of *trap base address* register (an SP3 syntax). GFX7, GFX8
|
|
|
+ [tba_hi] High 32 bits of *trap base address* register (an SP3 syntax). GFX7, GFX8
|
|
|
================== ======================================================================= =============
|
|
|
|
|
|
Note that *tba*, *tba_lo* and *tba_hi* are not accessible as assembler registers in GFX9 and GFX10,
|
|
@@ -295,8 +304,8 @@ Trap memory address, 64-bits wide.
|
|
|
Syntax Description Availability
|
|
|
================= ======================================================================= ==================
|
|
|
tma 64-bit *trap memory address* register. GFX7, GFX8
|
|
|
- [tma] 64-bit *trap memory address* register (an alternative syntax). GFX7, GFX8
|
|
|
- [tma_lo,tma_hi] 64-bit *trap memory address* register (an alternative syntax). GFX7, GFX8
|
|
|
+ [tma] 64-bit *trap memory address* register (an SP3 syntax). GFX7, GFX8
|
|
|
+ [tma_lo,tma_hi] 64-bit *trap memory address* register (an SP3 syntax). GFX7, GFX8
|
|
|
================= ======================================================================= ==================
|
|
|
|
|
|
High and low 32 bits of *trap memory address* may be accessed as separate registers:
|
|
@@ -306,8 +315,8 @@ High and low 32 bits of *trap memory address* may be accessed as separate regist
|
|
|
================= ======================================================================= ==================
|
|
|
tma_lo Low 32 bits of *trap memory address* register. GFX7, GFX8
|
|
|
tma_hi High 32 bits of *trap memory address* register. GFX7, GFX8
|
|
|
- [tma_lo] Low 32 bits of *trap memory address* register (an alternative syntax). GFX7, GFX8
|
|
|
- [tma_hi] High 32 bits of *trap memory address* register (an alternative syntax). GFX7, GFX8
|
|
|
+ [tma_lo] Low 32 bits of *trap memory address* register (an SP3 syntax). GFX7, GFX8
|
|
|
+ [tma_hi] High 32 bits of *trap memory address* register (an SP3 syntax). GFX7, GFX8
|
|
|
================= ======================================================================= ==================
|
|
|
|
|
|
Note that *tma*, *tma_lo* and *tma_hi* are not accessible as assembler registers in GFX9 and GFX10,
|
|
@@ -324,8 +333,8 @@ Flat scratch address, 64-bits wide. Holds the base address of scratch memory.
|
|
|
Syntax Description
|
|
|
================================== ================================================================
|
|
|
flat_scratch 64-bit *flat scratch* address register.
|
|
|
- [flat_scratch] 64-bit *flat scratch* address register (an alternative syntax).
|
|
|
- [flat_scratch_lo,flat_scratch_hi] 64-bit *flat scratch* address register (an alternative syntax).
|
|
|
+ [flat_scratch] 64-bit *flat scratch* address register (an SP3 syntax).
|
|
|
+ [flat_scratch_lo,flat_scratch_hi] 64-bit *flat scratch* address register (an SP3 syntax).
|
|
|
================================== ================================================================
|
|
|
|
|
|
High and low 32 bits of *flat scratch* address may be accessed as separate registers:
|
|
@@ -335,8 +344,8 @@ High and low 32 bits of *flat scratch* address may be accessed as separate regis
|
|
|
========================= =========================================================================
|
|
|
flat_scratch_lo Low 32 bits of *flat scratch* address register.
|
|
|
flat_scratch_hi High 32 bits of *flat scratch* address register.
|
|
|
- [flat_scratch_lo] Low 32 bits of *flat scratch* address register (an alternative syntax).
|
|
|
- [flat_scratch_hi] High 32 bits of *flat scratch* address register (an alternative syntax).
|
|
|
+ [flat_scratch_lo] Low 32 bits of *flat scratch* address register (an SP3 syntax).
|
|
|
+ [flat_scratch_hi] High 32 bits of *flat scratch* address register (an SP3 syntax).
|
|
|
========================= =========================================================================
|
|
|
|
|
|
.. _amdgpu_synid_xnack:
|
|
@@ -355,8 +364,8 @@ received an *XNACK* due to a vector memory operation.
|
|
|
Syntax Description
|
|
|
============================== =====================================================
|
|
|
xnack_mask 64-bit *xnack mask* register.
|
|
|
- [xnack_mask] 64-bit *xnack mask* register (an alternative syntax).
|
|
|
- [xnack_mask_lo,xnack_mask_hi] 64-bit *xnack mask* register (an alternative syntax).
|
|
|
+ [xnack_mask] 64-bit *xnack mask* register (an SP3 syntax).
|
|
|
+ [xnack_mask_lo,xnack_mask_hi] 64-bit *xnack mask* register (an SP3 syntax).
|
|
|
============================== =====================================================
|
|
|
|
|
|
High and low 32 bits of *xnack mask* may be accessed as separate registers:
|
|
@@ -366,8 +375,8 @@ High and low 32 bits of *xnack mask* may be accessed as separate registers:
|
|
|
===================== ==============================================================
|
|
|
xnack_mask_lo Low 32 bits of *xnack mask* register.
|
|
|
xnack_mask_hi High 32 bits of *xnack mask* register.
|
|
|
- [xnack_mask_lo] Low 32 bits of *xnack mask* register (an alternative syntax).
|
|
|
- [xnack_mask_hi] High 32 bits of *xnack mask* register (an alternative syntax).
|
|
|
+ [xnack_mask_lo] Low 32 bits of *xnack mask* register (an SP3 syntax).
|
|
|
+ [xnack_mask_hi] High 32 bits of *xnack mask* register (an SP3 syntax).
|
|
|
===================== ==============================================================
|
|
|
|
|
|
.. _amdgpu_synid_vcc:
|
|
@@ -385,8 +394,8 @@ Note that GFX10 H/W does not use high 32 bits of *vcc* in *wave32* mode.
|
|
|
Syntax Description
|
|
|
================ =========================================================================
|
|
|
vcc 64-bit *vector condition code* register.
|
|
|
- [vcc] 64-bit *vector condition code* register (an alternative syntax).
|
|
|
- [vcc_lo,vcc_hi] 64-bit *vector condition code* register (an alternative syntax).
|
|
|
+ [vcc] 64-bit *vector condition code* register (an SP3 syntax).
|
|
|
+ [vcc_lo,vcc_hi] 64-bit *vector condition code* register (an SP3 syntax).
|
|
|
================ =========================================================================
|
|
|
|
|
|
High and low 32 bits of *vector condition code* may be accessed as separate registers:
|
|
@@ -396,8 +405,8 @@ High and low 32 bits of *vector condition code* may be accessed as separate regi
|
|
|
================ =========================================================================
|
|
|
vcc_lo Low 32 bits of *vector condition code* register.
|
|
|
vcc_hi High 32 bits of *vector condition code* register.
|
|
|
- [vcc_lo] Low 32 bits of *vector condition code* register (an alternative syntax).
|
|
|
- [vcc_hi] High 32 bits of *vector condition code* register (an alternative syntax).
|
|
|
+ [vcc_lo] Low 32 bits of *vector condition code* register (an SP3 syntax).
|
|
|
+ [vcc_hi] High 32 bits of *vector condition code* register (an SP3 syntax).
|
|
|
================ =========================================================================
|
|
|
|
|
|
.. _amdgpu_synid_m0:
|
|
@@ -412,7 +421,7 @@ including register indexing and bounds checking.
|
|
|
Syntax Description
|
|
|
=========== ===================================================
|
|
|
m0 A 32-bit *memory* register.
|
|
|
- [m0] A 32-bit *memory* register (an alternative syntax).
|
|
|
+ [m0] A 32-bit *memory* register (an SP3 syntax).
|
|
|
=========== ===================================================
|
|
|
|
|
|
.. _amdgpu_synid_exec:
|
|
@@ -430,8 +439,8 @@ Note that GFX10 H/W does not use high 32 bits of *exec* in *wave32* mode.
|
|
|
Syntax Description
|
|
|
===================== =================================================================
|
|
|
exec 64-bit *execute mask* register.
|
|
|
- [exec] 64-bit *execute mask* register (an alternative syntax).
|
|
|
- [exec_lo,exec_hi] 64-bit *execute mask* register (an alternative syntax).
|
|
|
+ [exec] 64-bit *execute mask* register (an SP3 syntax).
|
|
|
+ [exec_lo,exec_hi] 64-bit *execute mask* register (an SP3 syntax).
|
|
|
===================== =================================================================
|
|
|
|
|
|
High and low 32 bits of *execute mask* may be accessed as separate registers:
|
|
@@ -441,8 +450,8 @@ High and low 32 bits of *execute mask* may be accessed as separate registers:
|
|
|
===================== =================================================================
|
|
|
exec_lo Low 32 bits of *execute mask* register.
|
|
|
exec_hi High 32 bits of *execute mask* register.
|
|
|
- [exec_lo] Low 32 bits of *execute mask* register (an alternative syntax).
|
|
|
- [exec_hi] High 32 bits of *execute mask* register (an alternative syntax).
|
|
|
+ [exec_lo] Low 32 bits of *execute mask* register (an SP3 syntax).
|
|
|
+ [exec_hi] High 32 bits of *execute mask* register (an SP3 syntax).
|
|
|
===================== =================================================================
|
|
|
|
|
|
.. _amdgpu_synid_vccz:
|
|
@@ -452,7 +461,7 @@ vccz
|
|
|
|
|
|
A single bit flag indicating that the :ref:`vcc<amdgpu_synid_vcc>` is all zeros.
|
|
|
|
|
|
-Note. When GFX10 operates in *wave32* mode, this register reflects state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`.
|
|
|
+Note: when GFX10 operates in *wave32* mode, this register reflects the state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`.
|
|
|
|
|
|
.. _amdgpu_synid_execz:
|
|
|
|
|
@@ -461,7 +470,7 @@ execz
|
|
|
|
|
|
A single bit flag indicating that the :ref:`exec<amdgpu_synid_exec>` is all zeros.
|
|
|
|
|
|
-Note. When GFX10 operates in *wave32* mode, this register reflects state of :ref:`exec_lo<amdgpu_synid_exec>`.
|
|
|
+Note: when GFX10 operates in *wave32* mode, this register reflects the state of :ref:`exec_lo<amdgpu_synid_exec>`.
|
|
|
|
|
|
.. _amdgpu_synid_scc:
|
|
|
|
|
@@ -495,19 +504,20 @@ GFX10 only.
|
|
|
|
|
|
.. _amdgpu_synid_constant:
|
|
|
|
|
|
-constant
|
|
|
---------
|
|
|
+inline constant
|
|
|
+---------------
|
|
|
+
|
|
|
+An *inline constant* is an integer or a floating-point value encoded as a part of an instruction.
|
|
|
+Compare *inline constants* with :ref:`literals<amdgpu_synid_literal>`.
|
|
|
|
|
|
-A set of integer and floating-point *inline* constants and values:
|
|
|
+Inline constants include:
|
|
|
|
|
|
* :ref:`iconst<amdgpu_synid_iconst>`
|
|
|
* :ref:`fconst<amdgpu_synid_fconst>`
|
|
|
* :ref:`ival<amdgpu_synid_ival>`
|
|
|
|
|
|
-In contrast with :ref:`literals<amdgpu_synid_literal>`, these operands are encoded as a part of instruction.
|
|
|
-
|
|
|
If a number may be encoded as either
|
|
|
-a :ref:`literal<amdgpu_synid_literal>` or
|
|
|
+a :ref:`literal<amdgpu_synid_literal>` or
|
|
|
a :ref:`constant<amdgpu_synid_constant>`,
|
|
|
assembler selects the latter encoding as more efficient.
|
|
|
|
|
@@ -516,17 +526,14 @@ assembler selects the latter encoding as more efficient.
|
|
|
iconst
|
|
|
~~~~~~
|
|
|
|
|
|
-An :ref:`integer number<amdgpu_synid_integer_number>`
|
|
|
+An :ref:`integer number<amdgpu_synid_integer_number>` or
|
|
|
+an :ref:`absolute expression<amdgpu_synid_absolute_expression>`
|
|
|
encoded as an *inline constant*.
|
|
|
|
|
|
Only a small fraction of integer numbers may be encoded as *inline constants*.
|
|
|
They are enumerated in the table below.
|
|
|
Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
|
|
|
|
|
|
-Integer *inline constants* are converted to
|
|
|
-:ref:`expected operand type<amdgpu_syn_instruction_type>`
|
|
|
-as described :ref:`here<amdgpu_synid_int_const_conv>`.
|
|
|
-
|
|
|
================================== ====================================
|
|
|
Value Note
|
|
|
================================== ====================================
|
|
@@ -548,10 +555,6 @@ Only a small fraction of floating-point numbers may be encoded as *inline consta
|
|
|
They are enumerated in the table below.
|
|
|
Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
|
|
|
|
|
|
-Floating-point *inline constants* are converted to
|
|
|
-:ref:`expected operand type<amdgpu_syn_instruction_type>`
|
|
|
-as described :ref:`here<amdgpu_synid_fp_const_conv>`.
|
|
|
-
|
|
|
===================== ===================================================== ==================
|
|
|
Value Note Availability
|
|
|
===================== ===================================================== ==================
|
|
@@ -594,21 +597,18 @@ These operands provide read-only access to H/W registers.
|
|
|
literal
|
|
|
-------
|
|
|
|
|
|
-A literal is a 64-bit value which is encoded as a separate 32-bit dword in the instruction stream.
|
|
|
+A *literal* is a 64-bit value encoded as a separate 32-bit dword in the instruction stream.
|
|
|
+Compare *literals* with :ref:`inline constants<amdgpu_synid_constant>`.
|
|
|
|
|
|
If a number may be encoded as either
|
|
|
-a :ref:`literal<amdgpu_synid_literal>` or
|
|
|
+a :ref:`literal<amdgpu_synid_literal>` or
|
|
|
an :ref:`inline constant<amdgpu_synid_constant>`,
|
|
|
assembler selects the latter encoding as more efficient.
|
|
|
|
|
|
Literals may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>`,
|
|
|
-:ref:`floating-point numbers<amdgpu_synid_floating-point_number>` or
|
|
|
-:ref:`expressions<amdgpu_synid_expression>`
|
|
|
-(expressions are currently supported for 32-bit operands only).
|
|
|
-
|
|
|
-A 64-bit literal value is converted by assembler
|
|
|
-to an :ref:`expected operand type<amdgpu_syn_instruction_type>`
|
|
|
-as described :ref:`here<amdgpu_synid_lit_conv>`.
|
|
|
+:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
|
|
|
+:ref:`absolute expressions<amdgpu_synid_absolute_expression>` or
|
|
|
+:ref:`relocatable expressions<amdgpu_synid_relocatable_expression>`.
|
|
|
|
|
|
An instruction may use only one literal but several operands may refer the same literal.
|
|
|
|
|
@@ -617,30 +617,38 @@ An instruction may use only one literal but several operands may refer the same
|
|
|
uimm8
|
|
|
-----
|
|
|
|
|
|
-A 8-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
|
|
|
-The value is encoded as part of the opcode so it is free to use.
|
|
|
+An 8-bit :ref:`integer number<amdgpu_synid_integer_number>`
|
|
|
+or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
|
|
|
+The value must be in the range 0..0xFF.
|
|
|
|
|
|
.. _amdgpu_synid_uimm32:
|
|
|
|
|
|
uimm32
|
|
|
------
|
|
|
|
|
|
-A 32-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
|
|
|
-The value is stored as a separate 32-bit dword in the instruction stream.
|
|
|
+A 32-bit :ref:`integer number<amdgpu_synid_integer_number>`
|
|
|
+or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
|
|
|
+The value must be in the range 0..0xFFFFFFFF.
|
|
|
|
|
|
.. _amdgpu_synid_uimm20:
|
|
|
|
|
|
uimm20
|
|
|
------
|
|
|
|
|
|
-A 20-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
|
|
|
+A 20-bit :ref:`integer number<amdgpu_synid_integer_number>`
|
|
|
+or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
|
|
|
+
|
|
|
+The value must be in the range 0..0xFFFFF.
|
|
|
|
|
|
.. _amdgpu_synid_uimm21:
|
|
|
|
|
|
uimm21
|
|
|
------
|
|
|
|
|
|
-A 21-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
|
|
|
+A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`
|
|
|
+or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
|
|
|
+
|
|
|
+The value must be in the range 0..0x1FFFFF.
|
|
|
|
|
|
.. WARNING:: Assembler currently supports 20-bit offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.
|
|
|
|
|
@@ -649,7 +657,10 @@ A 21-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
|
|
|
simm21
|
|
|
------
|
|
|
|
|
|
-A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`.
|
|
|
+A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`
|
|
|
+or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
|
|
|
+
|
|
|
+The value must be in the range -0x100000..0x0FFFFF.
|
|
|
|
|
|
.. WARNING:: Assembler currently supports 20-bit unsigned offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.
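
The value ranges stated for *uimm8*, *uimm32*, *uimm20*, *uimm21* and *simm21* can be checked with a small script. This is a minimal sketch, not the assembler's implementation: the names ``IMMEDIATE_RANGES`` and ``fits`` are hypothetical, and the bounds are transcribed from the descriptions above. It does not model the 20-bit limitation mentioned in the warnings.

```python
# Inclusive (low, high) bounds copied from the operand descriptions.
# IMMEDIATE_RANGES and fits() are illustrative names, not assembler APIs.
IMMEDIATE_RANGES = {
    "uimm8": (0, 0xFF),
    "uimm20": (0, 0xFFFFF),
    "uimm21": (0, 0x1FFFFF),
    "uimm32": (0, 0xFFFFFFFF),
    "simm21": (-0x100000, 0x0FFFFF),
}


def fits(kind, value):
    """Return True if value lies in the documented range for the operand kind."""
    low, high = IMMEDIATE_RANGES[kind]
    return low <= value <= high
```

For instance, ``fits("uimm8", 0x100)`` is false because 0x100 exceeds 0xFF, while ``fits("simm21", -0x100000)`` is true because the lower bound is inclusive.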
|
|
|
|
|
@@ -678,27 +689,20 @@ Integer Numbers
|
|
|
---------------
|
|
|
|
|
|
Integer numbers are 64 bits wide.
|
|
|
-They may be specified in binary, octal, hexadecimal and decimal formats:
|
|
|
-
|
|
|
- ============== ====================================
|
|
|
- Format Syntax
|
|
|
- ============== ====================================
|
|
|
- Decimal [-]?[1-9][0-9]*
|
|
|
- Binary [-]?0b[01]+
|
|
|
- Octal [-]?0[0-7]+
|
|
|
- Hexadecimal [-]?0x[0-9a-fA-F]+
|
|
|
- \ [-]?[0x]?[0-9][0-9a-fA-F]*[hH]
|
|
|
- ============== ====================================
|
|
|
+They are converted to :ref:`expected operand type<amdgpu_syn_instruction_type>`
|
|
|
+as described :ref:`here<amdgpu_synid_int_conv>`.
|
|
|
|
|
|
-Examples:
|
|
|
+Integer numbers may be specified in binary, octal, hexadecimal and decimal formats:
|
|
|
|
|
|
-.. parsed-literal::
|
|
|
-
|
|
|
- -1234
|
|
|
- 0b1010
|
|
|
- 010
|
|
|
- 0xff
|
|
|
- 0ffh
|
|
|
+ ============ =============================== ========
|
|
|
+ Format Syntax Example
|
|
|
+ ============ =============================== ========
|
|
|
+ Decimal [-]?[1-9][0-9]* -1234
|
|
|
+ Binary [-]?0b[01]+ 0b1010
|
|
|
+ Octal [-]?0[0-7]+ 010
|
|
|
+ Hexadecimal [-]?0x[0-9a-fA-F]+ 0xff
|
|
|
+ \ [-]?[0x]?[0-9][0-9a-fA-F]*[hH] 0ffh
|
|
|
+ ============ =============================== ========
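
The patterns in the table can be tried out directly as regular expressions. The sketch below is an illustration only: it transcribes the "Syntax" column as written and assumes each pattern must match the whole token; the actual assembler lexer may behave differently. The names ``INTEGER_FORMATS`` and ``classify_integer`` are hypothetical.

```python
import re

# Patterns transcribed from the "Syntax" column of the integer format table.
# Hexadecimal has two accepted spellings: 0x-prefixed and h-suffixed.
INTEGER_FORMATS = [
    ("decimal", re.compile(r"[-]?[1-9][0-9]*")),
    ("binary", re.compile(r"[-]?0b[01]+")),
    ("octal", re.compile(r"[-]?0[0-7]+")),
    ("hexadecimal", re.compile(r"[-]?0x[0-9a-fA-F]+")),
    ("hexadecimal", re.compile(r"[-]?[0x]?[0-9][0-9a-fA-F]*[hH]")),
]


def classify_integer(text):
    """Return the name of the first format that matches the whole text, or None."""
    for name, pattern in INTEGER_FORMATS:
        if pattern.fullmatch(text):
            return name
    return None
```

Applied to the "Example" column, ``classify_integer`` reports ``-1234`` as decimal, ``0b1010`` as binary, ``010`` as octal, and both ``0xff`` and ``0ffh`` as hexadecimal.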
|
|
|
|
|
|
.. _amdgpu_synid_floating-point_number:
|
|
|
|
|
@@ -706,31 +710,29 @@ Floating-Point Numbers
|
|
|
----------------------
|
|
|
|
|
|
All floating-point numbers are handled as double (64 bits wide).
|
|
|
+They are converted to
|
|
|
+:ref:`expected operand type<amdgpu_syn_instruction_type>`
|
|
|
+as described :ref:`here<amdgpu_synid_fp_conv>`.
|
|
|
|
|
|
Floating-point numbers may be specified in hexadecimal and decimal formats:
|
|
|
|
|
|
- ============== ======================================================== ========================================================
|
|
|
- Format Syntax Note
|
|
|
- ============== ======================================================== ========================================================
|
|
|
- Decimal [-]?[0-9]*[.][0-9]*([eE][+-]?[0-9]*)? Must include either a decimal separator or an exponent.
|
|
|
- Hexadecimal [-]0x[0-9a-fA-F]*(.[0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+
|
|
|
- ============== ======================================================== ========================================================
|
|
|
-
|
|
|
-Examples:
|
|
|
-
|
|
|
-.. parsed-literal::
|
|
|
-
|
|
|
- -1.234
|
|
|
- 234e2
|
|
|
- -0x1afp-10
|
|
|
- 0x.1afp10
|
|
|
+ ============ ======================================================== ====================== ====================
|
|
|
+ Format Syntax Examples Note
|
|
|
+ ============ ======================================================== ====================== ====================
|
|
|
+ Decimal [-]?[0-9]*[.][0-9]*([eE][+-]?[0-9]*)? -1.234, 234e2 Must include either
|
|
|
+ a decimal separator
|
|
|
+ or an exponent.
|
|
|
+ Hexadecimal [-]0x[0-9a-fA-F]*(.[0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+ -0x1afp-10, 0x.1afp10
|
|
|
+ ============ ======================================================== ====================== ====================
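
To see what "handled as double" means for the examples in the table, the sketch below parses a number and returns its 64-bit IEEE 754 bit pattern. It assumes the hexadecimal format follows C99 hexadecimal floating-point semantics (which Python's ``float.fromhex`` implements); that is an assumption about the assembler, not something this document states. The function name ``double_bits`` is hypothetical.

```python
import struct


def double_bits(text):
    """Parse a decimal or hexadecimal floating-point string and return
    the 64-bit pattern of the resulting double as an integer."""
    if "0x" in text.lower():
        # Assumption: C99-style hex float, as accepted by float.fromhex.
        value = float.fromhex(text)
    else:
        value = float(text)
    # Pack as a little-endian double, then reinterpret as an unsigned 64-bit integer.
    return struct.unpack("<Q", struct.pack("<d", value))[0]
```

Under that assumption, ``0x.1afp10`` denotes 107.75, so ``hex(double_bits("0x.1afp10"))`` gives ``0x405af00000000000``.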
|
|
|
|
|
|
.. _amdgpu_synid_expression:
|
|
|
|
|
|
Expressions
|
|
|
===========
|
|
|
|
|
|
-An expression specifies an address or a numeric value.
|
|
|
+An expression is evaluated to a 64-bit integer.
|
|
|
+Note that floating-point expressions are not supported.
|
|
|
+
|
|
|
There are two kinds of expressions:
|
|
|
|
|
|
* :ref:`Absolute<amdgpu_synid_absolute_expression>`.
|
|
@@ -741,10 +743,14 @@ There are two kinds of expressions:
|
|
|
Absolute Expressions
|
|
|
--------------------
|
|
|
|
|
|
-The value of an absolute expression remains the same after program relocation.
|
|
|
+The value of an absolute expression does not change after program relocation.
|
|
|
Absolute expressions must not include unassigned and relocatable values
|
|
|
such as labels.
|
|
|
|
|
|
+Absolute expressions are evaluated to 64-bit integer values and converted to
|
|
|
+:ref:`expected operand type<amdgpu_syn_instruction_type>`
|
|
|
+as described :ref:`here<amdgpu_synid_int_conv>`.
|
|
|
+
|
|
|
Examples:
|
|
|
|
|
|
.. parsed-literal::
|
|
@@ -760,45 +766,38 @@ Relocatable Expressions
|
|
|
The value of a relocatable expression depends on program relocation.
|
|
|
|
|
|
Note that use of relocatable expressions is limited with branch targets
|
|
|
-and 32-bit :ref:`literals<amdgpu_synid_literal>`.
|
|
|
+and 32-bit integer operands.
|
|
|
|
|
|
-Addition information about relocation may be found :ref:`here<amdgpu-relocation-records>`.
|
|
|
-
|
|
|
-Examples:
|
|
|
+A relocatable expression is evaluated to a 64-bit integer value
|
|
|
+which depends on operand kind and :ref:`relocation type<amdgpu-relocation-records>`
|
|
|
+of symbol(s) used in the expression. For example, if an instruction refers to a label,
|
|
|
+this reference is evaluated to an offset from the address after the instruction
|
|
|
+to the label address:
|
|
|
|
|
|
.. parsed-literal::
|
|
|
|
|
|
- y = x + 10 // x is not yet defined. Undefined symbols are assumed to be PC-relative.
|
|
|
- z = .
|
|
|
-
|
|
|
-Expression Data Type
|
|
|
---------------------
|
|
|
-
|
|
|
-Expressions and operands of expressions are interpreted as 64-bit integers.
|
|
|
+ label:
|
|
|
+ v_add_co_u32_e32 v0, vcc, label, v1 // 'label' operand is evaluated to -4
|
|
|
|
|
|
-Expressions may include 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>` (double).
|
|
|
-However these operands are also handled as 64-bit integers
|
|
|
-using binary representation of specified floating-point numbers.
|
|
|
-No conversion from floating-point to integer is performed.
|
|
|
-
|
|
|
-Examples:
|
|
|
+Note that values of relocatable expressions are usually unknown at assembly time;
|
|
|
+they are resolved later by a linker and converted to
|
|
|
+:ref:`expected operand type<amdgpu_syn_instruction_type>`
|
|
|
+as described :ref:`here<amdgpu_synid_rl_conv>`.
|
|
|
|
|
|
-.. parsed-literal::
|
|
|
+Operands and Operations
|
|
|
+-----------------------
|
|
|
|
|
|
- x = 0.1 // x is assigned an integer 4591870180066957722 which is a binary representation of 0.1.
|
|
|
- y = x + x // y is a sum of two integer values; it is not equal to 0.2!
|
|
|
+Expressions are composed of 64-bit integer operands and operations.
|
|
|
+Operands include :ref:`integer numbers<amdgpu_synid_integer_number>`
|
|
|
+and :ref:`symbols<amdgpu_synid_symbol>`.
|
|
|
|
|
|
-Syntax
|
|
|
-------
|
|
|
+Expressions may also use "." which is a reference to the current PC (program counter).
|
|
|
|
|
|
-Expressions are composed of
|
|
|
-:ref:`symbols<amdgpu_synid_symbol>`,
|
|
|
-:ref:`integer numbers<amdgpu_synid_integer_number>`,
|
|
|
-:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
|
|
|
-:ref:`binary operators<amdgpu_synid_expression_bin_op>`,
|
|
|
-:ref:`unary operators<amdgpu_synid_expression_un_op>` and subexpressions.
|
|
|
+:ref:`Unary<amdgpu_synid_expression_un_op>` and :ref:`binary<amdgpu_synid_expression_bin_op>`
|
|
|
+operations produce 64-bit integer results.
|
|
|
|
|
|
-Expressions may also use "." which is a reference to the current PC (program counter).
|
|
|
+Syntax of Expressions
|
|
|
+---------------------
|
|
|
|
|
|
The syntax of expressions is shown below::
|
|
|
|
|
@@ -887,7 +886,7 @@ They operate on and produce 64-bit integers.
|
|
|
Symbols
|
|
|
-------
|
|
|
|
|
|
-A symbol is a named 64-bit value, representing a relocatable
|
|
|
+A symbol is a named 64-bit integer value, representing a relocatable
|
|
|
address or an absolute (non-relocatable) number.
|
|
|
|
|
|
Symbol names have the following syntax:
|
|
@@ -907,128 +906,78 @@ The table below provides several examples of syntax used for symbol definition.
|
|
|
A symbol may be used before it is declared or assigned;
|
|
|
unassigned symbols are assumed to be PC-relative.
|
|
|
|
|
|
-Addition information about symbols may be found :ref:`here<amdgpu-symbols>`.
|
|
|
+Additional information about symbols may be found :ref:`here<amdgpu-symbols>`.
|
|
|
|
|
|
.. _amdgpu_synid_conv:
|
|
|
|
|
|
-Conversions
|
|
|
-===========
|
|
|
+Type and Size Conversion
|
|
|
+========================
|
|
|
|
|
|
This section describes what happens when a 64-bit
|
|
|
:ref:`integer number<amdgpu_synid_integer_number>`, a
|
|
|
-:ref:`floating-point numbers<amdgpu_synid_floating-point_number>` or a
|
|
|
-:ref:`symbol<amdgpu_synid_symbol>`
|
|
|
+:ref:`floating-point number<amdgpu_synid_floating-point_number>` or an
|
|
|
+:ref:`expression<amdgpu_synid_expression>`
|
|
|
is used for an operand which has a different type or size.
|
|
|
|
|
|
-Depending on operand kind, this conversion is performed by either assembler or AMDGPU H/W:
|
|
|
-
|
|
|
-* Values encoded as :ref:`inline constants<amdgpu_synid_constant>` are handled by H/W.
|
|
|
-* Values encoded as :ref:`literals<amdgpu_synid_literal>` are converted by assembler.
|
|
|
-
|
|
|
-.. _amdgpu_synid_const_conv:
|
|
|
-
|
|
|
-Inline Constants
|
|
|
-----------------
|
|
|
-
|
|
|
-.. _amdgpu_synid_int_const_conv:
|
|
|
-
|
|
|
-Integer Inline Constants
|
|
|
-~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
-
|
|
|
-Integer :ref:`inline constants<amdgpu_synid_constant>`
|
|
|
-may be thought of as 64-bit
|
|
|
-:ref:`integer numbers<amdgpu_synid_integer_number>`;
|
|
|
-when used as operands they are truncated to the size of
|
|
|
-:ref:`expected operand type<amdgpu_syn_instruction_type>`.
|
|
|
-No data type conversions are performed.
|
|
|
-
|
|
|
-Examples:
|
|
|
-
|
|
|
-.. parsed-literal::
|
|
|
-
|
|
|
- // GFX9
|
|
|
-
|
|
|
- v_add_u16 v0, -1, 0 // v0 = 0xFFFF
|
|
|
- v_add_f16 v0, -1, 0 // v0 = 0xFFFF (NaN)
|
|
|
-
|
|
|
- v_add_u32 v0, -1, 0 // v0 = 0xFFFFFFFF
|
|
|
- v_add_f32 v0, -1, 0 // v0 = 0xFFFFFFFF (NaN)
|
|
|
+.. _amdgpu_synid_int_conv:
|
|
|
|
|
|
-.. _amdgpu_synid_fp_const_conv:
|
|
|
+Conversion of Integer Values
|
|
|
+----------------------------
|
|
|
|
|
|
-Floating-Point Inline Constants
|
|
|
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
+Instruction operands may be specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>` or
|
|
|
+:ref:`absolute expressions<amdgpu_synid_absolute_expression>`. These values are converted to
|
|
|
+the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
|
|
|
|
|
|
-Floating-point :ref:`inline constants<amdgpu_synid_constant>`
|
|
|
-may be thought of as 64-bit
|
|
|
-:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`;
|
|
|
-when used as operands they are converted to a floating-point number of
|
|
|
-:ref:`expected operand size<amdgpu_syn_instruction_type>`.
|
|
|
+1. *Validation*. Assembler checks if the input value may be truncated without loss to the required *truncation width*
|
|
|
+(see the table below). There are two cases when this operation is enabled:
|
|
|
|
|
|
-Examples:
|
|
|
-
|
|
|
-.. parsed-literal::
|
|
|
-
|
|
|
- // GFX9
|
|
|
-
|
|
|
- v_add_f16 v0, 1.0, 0 // v0 = 0x3C00 (1.0)
|
|
|
- v_add_u16 v0, 1.0, 0 // v0 = 0x3C00
|
|
|
-
|
|
|
- v_add_f32 v0, 1.0, 0 // v0 = 0x3F800000 (1.0)
|
|
|
- v_add_u32 v0, 1.0, 0 // v0 = 0x3F800000
|
|
|
-
|
|
|
-
|
|
|
-.. _amdgpu_synid_lit_conv:
|
|
|
-
|
|
|
-Literals
|
|
|
---------
|
|
|
+ * The truncated bits are all 0.
|
|
|
+ * The truncated bits are all 1 and the value after truncation has its MSB set.
|
|
|
|
|
|
-.. _amdgpu_synid_int_lit_conv:
|
|
|
+In all other cases assembler triggers an error.
|
|
|
|
|
|
-Integer Literals
|
|
|
-~~~~~~~~~~~~~~~~
|
|
|
+2. *Conversion*. The input value is converted to the expected type as described in the table below.
|
|
|
+Depending on operand kind, this conversion is performed by either assembler or AMDGPU H/W (or both).
|
|
|
|
|
|
-Integer :ref:`literals<amdgpu_synid_literal>`
|
|
|
-are specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>`.
|
|
|
+ ============== ================= =============== ====================================================================
|
|
|
+ Expected type Truncation Width Conversion Description
|
|
|
+ ============== ================= =============== ====================================================================
|
|
|
+ i16, u16, b16 16 num.u16 Truncate to 16 bits.
|
|
|
+ i32, u32, b32 32 num.u32 Truncate to 32 bits.
|
|
|
+ i64 32 {-1,num.i32} Truncate to 32 bits and then sign-extend the result to 64 bits.
|
|
|
+ u64, b64 32 {0,num.u32} Truncate to 32 bits and then zero-extend the result to 64 bits.
|
|
|
+ f16 16 num.u16 Use low 16 bits as an f16 value.
|
|
|
+ f32 32 num.u32 Use low 32 bits as an f32 value.
|
|
|
+ f64 32 {num.u32,0} Use low 32 bits of the number as high 32 bits
|
|
|
+ of the result; low 32 bits of the result are zeroed.
|
|
|
+ ============== ================= =============== ====================================================================
|
|
|
|
|
|
-When used as operands they are converted to
|
|
|
-:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below.
|
|
|
-
|
|
|
- ============== ============== =============== ====================================================================
|
|
|
- Expected type Condition Result Note
|
|
|
- ============== ============== =============== ====================================================================
|
|
|
- i16, u16, b16 cond(num,16) num.u16 Truncate to 16 bits.
|
|
|
- i32, u32, b32 cond(num,32) num.u32 Truncate to 32 bits.
|
|
|
- i64 cond(num,32) {-1,num.i32} Truncate to 32 bits and then sign-extend the result to 64 bits.
|
|
|
- u64, b64 cond(num,32) { 0,num.u32} Truncate to 32 bits and then zero-extend the result to 64 bits.
|
|
|
- f16 cond(num,16) num.u16 Use low 16 bits as an f16 value.
|
|
|
- f32 cond(num,32) num.u32 Use low 32 bits as an f32 value.
|
|
|
- f64 cond(num,32) {num.u32,0} Use low 32 bits of the number as high 32 bits
|
|
|
- of the result; low 32 bits of the result are zeroed.
|
|
|
- ============== ============== =============== ====================================================================
|
|
|
-
|
|
|
-The condition *cond(X,S)* indicates if a 64-bit number *X*
|
|
|
-can be converted to a smaller size *S* by truncation of upper bits.
|
|
|
-There are two cases when the conversion is possible:
|
|
|
-
|
|
|
-* The truncated bits are all 0.
|
|
|
-* The truncated bits are all 1 and the value after truncation has its MSB bit set.
|
|
|
-
|
|
|
-Examples of valid literals:
|
|
|
+Examples of enabled conversions:
|
|
|
|
|
|
.. parsed-literal::
|
|
|
|
|
|
// GFX9
|
|
|
- // Literal value after conversion:
|
|
|
- v_add_u16 v0, 0xff00, v0 // 0xff00
|
|
|
- v_add_u16 v0, 0xffffffffffffff00, v0 // 0xff00
|
|
|
- v_add_u16 v0, -256, v0 // 0xff00
|
|
|
- // Literal value after conversion:
|
|
|
- s_bfe_i64 s[0:1], 0xffefffff, s3 // 0xffffffffffefffff
|
|
|
- s_bfe_u64 s[0:1], 0xffefffff, s3 // 0x00000000ffefffff
|
|
|
- v_ceil_f64_e32 v[0:1], 0xffefffff // 0xffefffff00000000 (-1.7976922776554302e308)
|
|
|
|
|
|
-Examples of invalid literals:
|
|
|
+ v_add_u16 v0, -1, 0 // src0 = 0xFFFF
|
|
|
+ v_add_f16 v0, -1, 0 // src0 = 0xFFFF (NaN)
|
|
|
+ //
|
|
|
+ v_add_u32 v0, -1, 0 // src0 = 0xFFFFFFFF
|
|
|
+ v_add_f32 v0, -1, 0 // src0 = 0xFFFFFFFF (NaN)
|
|
|
+ //
|
|
|
+ v_add_u16 v0, 0xff00, v0 // src0 = 0xff00
|
|
|
+ v_add_u16 v0, 0xffffffffffffff00, v0 // src0 = 0xff00
|
|
|
+ v_add_u16 v0, -256, v0 // src0 = 0xff00
|
|
|
+ //
|
|
|
+ s_bfe_i64 s[0:1], 0xffefffff, s3 // src0 = 0xffffffffffefffff
|
|
|
+ s_bfe_u64 s[0:1], 0xffefffff, s3 // src0 = 0x00000000ffefffff
|
|
|
+ v_ceil_f64_e32 v[0:1], 0xffefffff // src0 = 0xffefffff00000000 (-1.7976922776554302e308)
|
|
|
+ //
|
|
|
+ x = 0xffefffff //
|
|
|
+ s_bfe_i64 s[0:1], x, s3 // src0 = 0xffffffffffefffff
|
|
|
+ s_bfe_u64 s[0:1], x, s3 // src0 = 0x00000000ffefffff
|
|
|
+ v_ceil_f64_e32 v[0:1], x // src0 = 0xffefffff00000000 (-1.7976922776554302e308)
|
|
|
+
|
|
|
+Examples of disabled conversions:
|
|
|
|
|
|
.. parsed-literal::
|
|
|
|
|
@@ -1037,49 +986,57 @@ Examples of invalid literals:
|
|
|
v_add_u16 v0, 0x1ff00, v0 // truncated bits are not all 0 or 1
|
|
|
v_add_u16 v0, 0xffffffffffff00ff, v0 // truncated bits do not match MSB of the result
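
The validation and conversion steps for integer values can be sketched in Python as follows. This is only an illustrative model of the rules in the table above, not the assembler's implementation; the names ``can_truncate``, ``convert_integer`` and ``TRUNCATION_WIDTH`` are hypothetical and the result is returned as an unsigned bit pattern.

```python
MASK64 = (1 << 64) - 1

# Truncation width per expected operand type, copied from the table above.
TRUNCATION_WIDTH = {
    "i16": 16, "u16": 16, "b16": 16, "f16": 16,
    "i32": 32, "u32": 32, "b32": 32, "f32": 32,
    "i64": 32, "u64": 32, "b64": 32, "f64": 32,
}


def can_truncate(value, width):
    """Validation step: may the 64-bit value be truncated to `width` bits?"""
    bits = value & MASK64              # view the input as a 64-bit pattern
    upper = bits >> width              # the bits that truncation would drop
    if upper == 0:
        return True                    # case 1: truncated bits are all 0
    all_ones = (1 << (64 - width)) - 1
    msb_set = ((bits >> (width - 1)) & 1) == 1
    return upper == all_ones and msb_set   # case 2: all 1 and MSB set


def convert_integer(value, expected_type):
    """Conversion step: return the operand bits for the expected type."""
    width = TRUNCATION_WIDTH[expected_type]
    if not can_truncate(value, width):
        raise ValueError("value cannot be truncated to %d bits" % width)
    low = value & ((1 << width) - 1)
    if expected_type == "i64":
        # Truncate to 32 bits, then sign-extend to 64 bits.
        return low | 0xFFFFFFFF00000000 if low & 0x80000000 else low
    if expected_type == "f64":
        # Low 32 bits become the high 32 bits; low half is zeroed.
        return low << 32
    # 16-bit and 32-bit types, and u64/b64 (zero-extension is implicit).
    return low
```

With this model, ``convert_integer(-256, "u16")`` yields ``0xff00`` and ``convert_integer(0x1ff00, "u16")`` raises an error, mirroring the enabled and disabled examples above.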
|
|
|
|
|
|
-.. _amdgpu_synid_fp_lit_conv:
|
|
|
+.. _amdgpu_synid_fp_conv:
|
|
|
|
|
|
-Floating-Point Literals
|
|
|
-~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
+Conversion of Floating-Point Values
|
|
|
+-----------------------------------
|
|
|
|
|
|
-Floating-point :ref:`literals<amdgpu_synid_literal>` are specified as 64-bit
|
|
|
-:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
|
|
|
+Instruction operands may be specified as 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
|
|
|
+These values are converted to the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
|
|
|
|
|
|
-When used as operands they are converted to
|
|
|
-:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below.
|
|
|
+1. *Validation*. Assembler checks if the input f64 number can be converted
|
|
|
+to the *required floating-point type* (see the table below) without overflow or underflow.
|
|
|
+Precision loss is allowed. If this conversion is not possible, assembler triggers an error.
|
|
|
|
|
|
- ============== ============== ================= =================================================================
|
|
|
- Expected type Condition Result Note
|
|
|
- ============== ============== ================= =================================================================
|
|
|
- i16, u16, b16 cond(num,16) f16(num) Convert to f16 and use bits of the result as an integer value.
|
|
|
- i32, u32, b32 cond(num,32) f32(num) Convert to f32 and use bits of the result as an integer value.
|
|
|
- i64, u64, b64 false \- Conversion disabled because of an unclear semantics.
|
|
|
- f16 cond(num,16) f16(num) Convert to f16.
|
|
|
- f32 cond(num,32) f32(num) Convert to f32.
|
|
|
- f64 true {num.u32.hi,0} Use high 32 bits of the number as high 32 bits of the result;
|
|
|
- zero-fill low 32 bits of the result.
|
|
|
+2. *Conversion*. The input value is converted to the expected type as described in the table below.
|
|
|
+Depending on operand kind, this is performed by either assembler or AMDGPU H/W (or both).
|
|
|
|
|
|
- Note that the result may differ from the original number.
|
|
|
- ============== ============== ================= =================================================================
|
|
|
+ ============== ================ ================= =================================================================
|
|
|
+ Expected type Required FP Type Conversion Description
|
|
|
+ ============== ================ ================= =================================================================
|
|
|
+ i16, u16, b16 f16 f16(num) Convert to f16 and use bits of the result as an integer value.
|
|
|
+ i32, u32, b32 f32 f32(num) Convert to f32 and use bits of the result as an integer value.
|
|
|
+ i64, u64, b64 \- \- Conversion disabled.
|
|
|
+ f16 f16 f16(num) Convert to f16.
|
|
|
+ f32 f32 f32(num) Convert to f32.
|
|
|
+ f64 f64 {num.u32.hi,0} Use high 32 bits of the number as high 32 bits of the result;
|
|
|
+ zero-fill low 32 bits of the result.
|
|
|
|
|
|
-The condition *cond(X,S)* indicates if an f64 number *X* can be converted
|
|
|
-to a smaller *S*-bit floating-point type without overflow or underflow.
|
|
|
-Precision lost is allowed.
|
|
|
+ Note that the result may differ from the original number.
|
|
|
+ ============== ================ ================= =================================================================
|
|
|
|
|
|
-Examples of valid literals:
|
|
|
+Examples of enabled conversions:
|
|
|
|
|
|
.. parsed-literal::
|
|
|
|
|
|
// GFX9
|
|
|
|
|
|
- v_add_f16 v1, 65500.0, v2
|
|
|
- v_add_f32 v1, 65600.0, v2
|
|
|
+ v_add_f16 v0, 1.0, 0 // src0 = 0x3C00 (1.0)
|
|
|
+ v_add_u16 v0, 1.0, 0 // src0 = 0x3C00
|
|
|
+ //
|
|
|
+ v_add_f32 v0, 1.0, 0 // src0 = 0x3F800000 (1.0)
|
|
|
+ v_add_u32 v0, 1.0, 0 // src0 = 0x3F800000
|
|
|
|
|
|
- // Literal value before conversion: 1.7976931348623157e308 (0x7fefffffffffffff)
|
|
|
- // Literal value after conversion: 1.7976922776554302e308 (0x7fefffff00000000)
|
|
|
+ // src0 before conversion:
|
|
|
+ // 1.7976931348623157e308 = 0x7fefffffffffffff
|
|
|
+ // src0 after conversion:
|
|
|
+ // 1.7976922776554302e308 = 0x7fefffff00000000
|
|
|
v_ceil_f64 v[0:1], 1.7976931348623157e308
|
|
|
|
|
|
-Examples of invalid literals:
|
|
|
+ v_add_f16 v1, 65500.0, v2 // ok for f16.
|
|
|
+ v_add_f32 v1, 65600.0, v2 // ok for f32, but would result in overflow for f16.
|
|
|
+
|
|
|
+Examples of disabled conversions:
|
|
|
|
|
|
.. parsed-literal::
|
|
|
|
|
@@ -1087,25 +1044,35 @@ Examples of invalid literals:
|
|
|
|
|
|
v_add_f16 v1, 65600.0, v2 // overflow
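
The floating-point rules can be modelled in a similar way. The sketch below uses Python's ``struct`` module to round an f64 input to f16 or f32; the function name ``convert_float`` is hypothetical, and treating "a non-zero input that rounds to zero" as underflow is a simplifying assumption that may not match the assembler's exact check.

```python
import struct


def convert_float(num, expected_type):
    """Return the operand bits for an f64 input, following the table above."""
    if expected_type in ("i64", "u64", "b64"):
        raise ValueError("conversion disabled for 64-bit integer operands")
    if expected_type == "f64":
        # Keep the high 32 bits of the f64 pattern; zero-fill the low 32 bits.
        bits = struct.unpack("<Q", struct.pack("<d", num))[0]
        return bits & 0xFFFFFFFF00000000
    if expected_type in ("i16", "u16", "b16", "f16"):
        fp_fmt, int_fmt = "<e", "<H"   # required FP type is f16
    else:
        fp_fmt, int_fmt = "<f", "<I"   # required FP type is f32
    try:
        packed = struct.pack(fp_fmt, num)   # rounds; precision loss is allowed
    except OverflowError as exc:
        raise ValueError("overflow") from exc
    # Assumption: a non-zero input that rounds to zero counts as underflow.
    if num != 0.0 and struct.unpack(fp_fmt, packed)[0] == 0.0:
        raise ValueError("underflow")
    return struct.unpack(int_fmt, packed)[0]
```

Under these assumptions ``convert_float(1.0, "u16")`` returns ``0x3C00``, the same bits as for an f16 operand, while ``convert_float(65600.0, "f16")`` is rejected because of overflow.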
|
|
|
|
|
|
-.. _amdgpu_synid_exp_conv:
|
|
|
+.. _amdgpu_synid_rl_conv:
|
|
|
|
|
|
-Expressions
|
|
|
-~~~~~~~~~~~
|
|
|
+Conversion of Relocatable Values
|
|
|
+--------------------------------
|
|
|
|
|
|
-Expressions operate with and result in 64-bit integers.
|
|
|
+:ref:`Relocatable expressions<amdgpu_synid_relocatable_expression>`
|
|
|
+may be used with 32-bit integer operands and jump targets.
|
|
|
|
|
|
-When used as operands they are truncated to
|
|
|
-:ref:`expected operand size<amdgpu_syn_instruction_type>`.
|
|
|
-No data type conversions are performed.
|
|
|
+When the value of a relocatable expression is resolved by a linker, it is
|
|
|
+converted as needed and truncated to the operand size. The conversion depends
|
|
|
+on :ref:`relocation type<amdgpu-relocation-records>` and operand kind.
|
|
|
|
|
|
-Examples:
|
|
|
+For example, when a 32-bit operand of an instruction refers to a relocatable expression *expr*,
|
|
|
+this reference is evaluated to a 64-bit offset from the address after the
|
|
|
+instruction to the address being referenced, *counted in bytes*.
|
|
|
+Then the value is truncated to 32 bits and encoded as a literal:
|
|
|
|
|
|
.. parsed-literal::
|
|
|
|
|
|
- // GFX9
|
|
|
+ expr = .
|
|
|
+ v_add_co_u32_e32 v0, vcc, expr, v1 // 'expr' operand is evaluated to -4
|
|
|
+ // and then truncated to 0xFFFFFFFC
|
|
|
|
|
|
- x = 0.1
|
|
|
- v_sqrt_f32 v0, x // v0 = [low 32 bits of 0.1 (double)]
|
|
|
- v_sqrt_f32 v0, (0.1 + 0) // the same as above
|
|
|
- v_sqrt_f32 v0, 0.1 // v0 = [0.1 (double) converted to float]
|
|
|
+As another example, when a branch instruction refers to a label,
|
|
|
+this reference is evaluated to an offset from the address after the
|
|
|
+instruction to the label address, *counted in dwords*.
|
|
|
+Then the value is truncated to 16 bits:
|
|
|
+
|
|
|
+.. parsed-literal::
|
|
|
|
|
|
+ label:
|
|
|
+ s_branch label // 'label' operand is evaluated to -1 and truncated to 0xFFFF
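
The arithmetic behind both examples can be reproduced with a few lines of Python. This is a rough sketch that assumes a 4-byte branch instruction; the helper names ``truncate`` and ``branch_offset`` are hypothetical, and the real value depends on the relocation type applied by the linker.

```python
def truncate(value, width):
    """Keep the low `width` bits of a (possibly negative) offset."""
    return value & ((1 << width) - 1)


def branch_offset(label_address, branch_address, insn_size=4):
    """Offset from the address after the branch to the label, in dwords.

    Assumes byte addresses that are multiples of 4 and a branch
    instruction that occupies `insn_size` bytes.
    """
    byte_offset = label_address - (branch_address + insn_size)
    return truncate(byte_offset // 4, 16)


# A label placed directly before the branch that refers to it:
# the byte offset is -4, which is -1 dword, encoded as 0xFFFF.
encoded = branch_offset(label_address=0x100, branch_address=0x100)
```

Likewise, ``truncate(-4, 32)`` gives ``0xFFFFFFFC``, the literal value shown in the first example.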
|