- ======================================
- Syntax of AMDGPU Instruction Modifiers
- ======================================
- .. contents::
- :local:
- Conventions
- ===========
- The following notation is used throughout this document:
- =================== =============================================================
- Notation Description
- =================== =============================================================
- {0..N} Any integer value in the range from 0 to N (inclusive).
- <x> Syntax and meaning of *x* is explained elsewhere.
- =================== =============================================================
- .. _amdgpu_syn_modifiers:
- Modifiers
- =========
- DS Modifiers
- ------------
- .. _amdgpu_synid_ds_offset8:
- offset8
- ~~~~~~~
- Specifies an immediate unsigned 8-bit offset, in bytes. The default value is 0.
- Used with DS instructions which have 2 addresses.
- =================== ====================================================================
- Syntax Description
- =================== ====================================================================
- offset:{0..0xFF} Specifies an unsigned 8-bit offset as a positive
- :ref:`integer number <amdgpu_synid_integer_number>`
- or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
- =================== ====================================================================
- Examples:
- .. parsed-literal::
- offset:0xff
- offset:2-x
- offset:-x-y
- .. _amdgpu_synid_ds_offset16:
- offset16
- ~~~~~~~~
- Specifies an immediate unsigned 16-bit offset, in bytes. The default value is 0.
- Used with DS instructions which have 1 address.
- ==================== ====================================================================
- Syntax Description
- ==================== ====================================================================
- offset:{0..0xFFFF} Specifies an unsigned 16-bit offset as a positive
- :ref:`integer number <amdgpu_synid_integer_number>`
- or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
- ==================== ====================================================================
- Examples:
- .. parsed-literal::
- offset:65535
- offset:0xffff
- offset:-x-y
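The range rule shared by *offset8* and *offset16* can be sketched in Python. This is an illustrative check only, not the assembler's implementation: `fits_unsigned` is a hypothetical helper, and it assumes any absolute expression has already been evaluated to a plain integer.

```python
def fits_unsigned(value: int, bits: int) -> bool:
    """Return True if value fits in an unsigned field of the given width."""
    return 0 <= value < (1 << bits)


# offset8 accepts {0..0xFF}; offset16 accepts {0..0xFFFF}.
OFFSET8_BITS = 8
OFFSET16_BITS = 16

if __name__ == "__main__":
    for value in (0xFF, 0x100, 0xFFFF):
        print(hex(value),
              fits_unsigned(value, OFFSET8_BITS),
              fits_unsigned(value, OFFSET16_BITS))
```

The check applies to the evaluated value, so an expression written with negated symbols is in range only if its result lands inside the documented interval.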
- .. _amdgpu_synid_sw_offset16:
- swizzle pattern
- ~~~~~~~~~~~~~~~
- This is a special modifier which may be used with the *ds_swizzle_b32* instruction only.
- It specifies a swizzle pattern in numeric or symbolic form. The default value is 0.
- See AMD documentation for more information.
- ======================================================= ===========================================================
- Syntax Description
- ======================================================= ===========================================================
- offset:{0..0xFFFF} Specifies a 16-bit swizzle pattern.
- offset:swizzle(QUAD_PERM,{0..3},{0..3},{0..3},{0..3}) Specifies a quad permute mode pattern.
- Each number is a lane *id*.
- offset:swizzle(BITMASK_PERM, "<mask>") Specifies a bitmask permute mode pattern.
- The pattern converts a 5-bit lane *id* to another
- lane *id* with which the lane interacts.
- *mask* is a 5 character sequence which
- specifies how to transform the bits of the
- lane *id*.
- The following characters are allowed:
- * "0" - set bit to 0.
- * "1" - set bit to 1.
- * "p" - preserve bit.
- * "i" - invert bit.
- offset:swizzle(BROADCAST,{2..32},{0..N}) Specifies a broadcast mode.
- Broadcasts the value of any particular lane to
- all lanes in its group.
- The first numeric parameter is a group
- size and must be equal to 2, 4, 8, 16 or 32.
- The second numeric parameter is an index of the
- lane being broadcast.
- The index must be less than the group size.
- offset:swizzle(SWAP,{1..16}) Specifies a swap mode.
- Swaps the neighboring groups of
- 1, 2, 4, 8 or 16 lanes.
- offset:swizzle(REVERSE,{2..32}) Specifies a reverse mode.
- Reverses the lanes for groups of 2, 4, 8, 16 or 32 lanes.
- ======================================================= ===========================================================
- Note: numeric values may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
- :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
- Examples:
- .. parsed-literal::
- offset:255
- offset:0xffff
- offset:swizzle(QUAD_PERM, 0, 1, 2, 3)
- offset:swizzle(BITMASK_PERM, "01pi0")
- offset:swizzle(BROADCAST, 2, 0)
- offset:swizzle(SWAP, 8)
- offset:swizzle(REVERSE, 30 + 2)
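The symbolic swizzle modes can be modelled as functions from a lane *id* to the lane *id* it interacts with. The Python sketch below follows the descriptions in the table above and is not taken from the assembler or the hardware documentation. All function names are hypothetical, and the bit order of the mask string (first character controls the most significant bit) is an assumption.

```python
LANE_BITS = 5  # the table describes a 5-bit lane id


def bitmask_perm(lane: int, mask: str) -> int:
    """Apply a BITMASK_PERM mask string to a 5-bit lane id.

    Assumption: mask[0] controls the most significant bit of the lane id.
    """
    if len(mask) != LANE_BITS:
        raise ValueError("mask must have exactly 5 characters")
    result = 0
    for i, ch in enumerate(mask):
        pos = LANE_BITS - 1 - i
        bit = (lane >> pos) & 1
        if ch == "0":
            new_bit = 0          # set bit to 0
        elif ch == "1":
            new_bit = 1          # set bit to 1
        elif ch == "p":
            new_bit = bit        # preserve bit
        elif ch == "i":
            new_bit = bit ^ 1    # invert bit
        else:
            raise ValueError(f"unexpected mask character {ch!r}")
        result |= new_bit << pos
    return result


def broadcast(lane: int, group_size: int, index: int) -> int:
    """Every lane of a group interacts with lane `index` of that group."""
    return lane - lane % group_size + index


def swap(lane: int, group_size: int) -> int:
    """Neighboring groups of `group_size` lanes trade places."""
    return lane ^ group_size


def reverse(lane: int, group_size: int) -> int:
    """Lanes are reversed within each group of `group_size` lanes."""
    return lane ^ (group_size - 1)
```

The XOR forms for `swap` and `reverse` rely on the group size being a power of two, which matches the sizes the table allows.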
- .. _amdgpu_synid_gds:
- gds
- ~~~
- Specifies whether to use GDS or LDS memory (LDS is the default).
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- gds Use GDS memory.
- ======================================== ================================================
- EXP Modifiers
- -------------
- .. _amdgpu_synid_done:
- done
- ~~~~
- Specifies if this is the last export from the shader to the target. By default,
- the *exp* instruction does not finish an export sequence.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- done Indicates the last export operation.
- ======================================== ================================================
- .. _amdgpu_synid_compr:
- compr
- ~~~~~
- Indicates if the data are compressed (data are not compressed by default).
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- compr Data are compressed.
- ======================================== ================================================
- .. _amdgpu_synid_vm:
- vm
- ~~
- Specifies valid mask flag state (off by default).
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- vm Set valid mask flag.
- ======================================== ================================================
- FLAT Modifiers
- --------------
- .. _amdgpu_synid_flat_offset12:
- offset12
- ~~~~~~~~
- Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
- Cannot be used with *global/scratch* opcodes. GFX9 only.
- ================= ====================================================================
- Syntax Description
- ================= ====================================================================
- offset:{0..4095} Specifies a 12-bit unsigned offset as a positive
- :ref:`integer number <amdgpu_synid_integer_number>`
- or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
- ================= ====================================================================
- Examples:
- .. parsed-literal::
- offset:4095
- offset:x-0xff
- .. _amdgpu_synid_flat_offset13s:
- offset13s
- ~~~~~~~~~
- Specifies an immediate signed 13-bit offset, in bytes. The default value is 0.
- Can be used with *global/scratch* opcodes only. GFX9 only.
- ===================== ====================================================================
- Syntax Description
- ===================== ====================================================================
- offset:{-4096..4095} Specifies a 13-bit signed offset as an
- :ref:`integer number <amdgpu_synid_integer_number>`
- or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
- ===================== ====================================================================
- Examples:
- .. parsed-literal::
- offset:-4000
- offset:0x10
- offset:-x
- .. _amdgpu_synid_flat_offset12s:
- offset12s
- ~~~~~~~~~
- Specifies an immediate signed 12-bit offset, in bytes. The default value is 0.
- Can be used with *global/scratch* opcodes only.
- GFX10 only.
- ===================== ====================================================================
- Syntax Description
- ===================== ====================================================================
- offset:{-2048..2047} Specifies a 12-bit signed offset as an
- :ref:`integer number <amdgpu_synid_integer_number>`
- or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
- ===================== ====================================================================
- Examples:
- .. parsed-literal::
- offset:-2000
- offset:0x10
- offset:-x+y
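The signed ranges quoted for *offset13s* and *offset12s* follow from two's-complement field widths. A small Python sketch shows the arithmetic; `fits_signed` is a hypothetical helper, not part of any toolchain, and it takes an already evaluated integer.

```python
def signed_range(bits: int) -> tuple[int, int]:
    """Return the (min, max) values of a two's-complement field."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits_signed(value: int, bits: int) -> bool:
    """Return True if value is representable in a signed field of `bits` bits."""
    lo, hi = signed_range(bits)
    return lo <= value <= hi


if __name__ == "__main__":
    print(signed_range(13))  # GFX9 global/scratch offset
    print(signed_range(12))  # GFX10 global/scratch offset
```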
- .. _amdgpu_synid_flat_offset11:
- offset11
- ~~~~~~~~
- Specifies an immediate unsigned 11-bit offset, in bytes. The default value is 0.
- Cannot be used with *global/scratch* opcodes.
- GFX10 only.
- ================= ====================================================================
- Syntax Description
- ================= ====================================================================
- offset:{0..2047} Specifies an 11-bit unsigned offset as a positive
- :ref:`integer number <amdgpu_synid_integer_number>`
- or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
- ================= ====================================================================
- Examples:
- .. parsed-literal::
- offset:2047
- offset:x+0xff
- dlc
- ~~~
- See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.
- glc
- ~~~
- See a description :ref:`here<amdgpu_synid_glc>`.
- lds
- ~~~
- See a description :ref:`here<amdgpu_synid_lds>`. GFX10 only.
- slc
- ~~~
- See a description :ref:`here<amdgpu_synid_slc>`.
- tfe
- ~~~
- See a description :ref:`here<amdgpu_synid_tfe>`.
- nv
- ~~
- See a description :ref:`here<amdgpu_synid_nv>`.
- MIMG Modifiers
- --------------
- .. _amdgpu_synid_dmask:
- dmask
- ~~~~~
- Specifies which channels (image components) are used by the operation. By default, no channels
- are used.
- =============== ====================================================================
- Syntax Description
- =============== ====================================================================
- dmask:{0..15} Specifies image channels as a positive
- :ref:`integer number <amdgpu_synid_integer_number>`
- or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
- Each bit corresponds to one of 4 image components (RGBA).
- If the specified bit value is 0, the component is not used,
- value 1 means that the component is used.
- =============== ====================================================================
- This modifier has some limitations depending on instruction kind:
- =================================================== ========================
- Instruction Kind Valid dmask Values
- =================================================== ========================
- 32-bit atomic *cmpswap* 0x3
- 32-bit atomic instructions except for *cmpswap* 0x1
- 64-bit atomic *cmpswap* 0xF
- 64-bit atomic instructions except for *cmpswap* 0x3
- *gather4* 0x1, 0x2, 0x4, 0x8
- Other instructions any value
- =================================================== ========================
- Examples:
- .. parsed-literal::
- dmask:0xf
- dmask:0b1111
- dmask:x|y|z
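To make the bit layout and the per-instruction limits concrete, the Python sketch below decodes a *dmask* value into channel letters and checks it against the table of valid values. The names are hypothetical, and the mapping of bit 0 to R through bit 3 to A is an assumption: the text states only that each bit corresponds to one RGBA component.

```python
CHANNELS = "RGBA"  # assumption: bit 0 -> R, bit 1 -> G, bit 2 -> B, bit 3 -> A

# Valid dmask values per instruction kind, copied from the table above.
# The keys are hypothetical labels chosen for this sketch.
VALID_DMASK = {
    "atomic32_cmpswap": {0x3},
    "atomic32": {0x1},
    "atomic64_cmpswap": {0xF},
    "atomic64": {0x3},
    "gather4": {0x1, 0x2, 0x4, 0x8},
}


def dmask_channels(dmask: int) -> str:
    """Return the letters of the channels enabled by dmask."""
    if not 0 <= dmask <= 15:
        raise ValueError("dmask must be in the range 0..15")
    return "".join(ch for i, ch in enumerate(CHANNELS) if dmask & (1 << i))


def dmask_is_valid(kind: str, dmask: int) -> bool:
    """Check dmask against the limits for the given instruction kind."""
    allowed = VALID_DMASK.get(kind)
    if allowed is None:  # "Other instructions": any 4-bit value
        return 0 <= dmask <= 15
    return dmask in allowed
```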
- .. _amdgpu_synid_unorm:
- unorm
- ~~~~~
- Specifies whether the address is normalized or not (the address is normalized by default).
- ======================== ========================================
- Syntax Description
- ======================== ========================================
- unorm Force the address to be unnormalized.
- ======================== ========================================
- glc
- ~~~
- See a description :ref:`here<amdgpu_synid_glc>`.
- slc
- ~~~
- See a description :ref:`here<amdgpu_synid_slc>`.
- .. _amdgpu_synid_r128:
- r128
- ~~~~
- Specifies texture resource size. The default size is 256 bits.
- GFX7, GFX8 and GFX10 only.
- =================== ================================================
- Syntax Description
- =================== ================================================
- r128 Specifies a 128-bit texture resource size.
- =================== ================================================
- .. WARNING:: Using this modifier should decrease *rsrc* operand size from 8 to 4 dwords, but the assembler does not currently support this feature.
- tfe
- ~~~
- See a description :ref:`here<amdgpu_synid_tfe>`.
- .. _amdgpu_synid_lwe:
- lwe
- ~~~
- Specifies LOD warning status (LOD warning is disabled by default).
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- lwe Enables LOD warning.
- ======================================== ================================================
- .. _amdgpu_synid_da:
- da
- ~~
- Specifies if an array index must be sent to TA. By default, array index is not sent.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- da Send an array-index to TA.
- ======================================== ================================================
- .. _amdgpu_synid_d16:
- d16
- ~~~
- Specifies data size: 16 or 32 bits (32 bits by default). Not supported by GFX7.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- d16 Enables 16-bit data mode.
- On loads, convert data in memory to 16-bit
- format before storing it in VGPRs.
- For stores, convert 16-bit data in VGPRs to
- 32 bits before going to memory.
- Note that GFX8.0 does not support data packing.
- Each 16-bit data element occupies 1 VGPR.
- GFX8.1, GFX9 and GFX10 support data packing.
- Each pair of 16-bit data elements
- occupies 1 VGPR.
- ======================================== ================================================
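The effect of data packing on register usage can be sketched as a count of VGPRs. This Python helper is hypothetical; it assumes that an odd number of packed elements rounds up to a whole VGPR, which the text above does not state explicitly.

```python
def d16_vgpr_count(elements: int, packed: bool) -> int:
    """Number of VGPRs occupied by `elements` 16-bit data elements.

    packed=False models GFX8.0 (one element per VGPR).
    packed=True models GFX8.1, GFX9 and GFX10 (two elements per VGPR).
    """
    if elements < 0:
        raise ValueError("element count must be non-negative")
    if packed:
        return (elements + 1) // 2  # assumption: odd counts round up
    return elements
```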
- .. _amdgpu_synid_a16:
- a16
- ~~~
- Specifies size of image address components: 16 or 32 bits (32 bits by default).
- GFX9 and GFX10 only.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- a16 Enables 16-bit image address components.
- ======================================== ================================================
- .. _amdgpu_synid_dim:
- dim
- ~~~
- Specifies surface dimension. This is a mandatory modifier. There is no default value.
- GFX10 only.
- =============================== =========================================================
- Syntax Description
- =============================== =========================================================
- dim:1D One-dimensional image.
- dim:2D Two-dimensional image.
- dim:3D Three-dimensional image.
- dim:CUBE Cubemap array.
- dim:1D_ARRAY One-dimensional image array.
- dim:2D_ARRAY Two-dimensional image array.
- dim:2D_MSAA Two-dimensional multi-sample anti-aliasing image.
- dim:2D_MSAA_ARRAY Two-dimensional multi-sample anti-aliasing image array.
- =============================== =========================================================
- The following table defines an alternative syntax which is supported
- for compatibility with SP3 assembler:
- =============================== =========================================================
- Syntax Description
- =============================== =========================================================
- dim:SQ_RSRC_IMG_1D One-dimensional image.
- dim:SQ_RSRC_IMG_2D Two-dimensional image.
- dim:SQ_RSRC_IMG_3D Three-dimensional image.
- dim:SQ_RSRC_IMG_CUBE Cubemap array.
- dim:SQ_RSRC_IMG_1D_ARRAY One-dimensional image array.
- dim:SQ_RSRC_IMG_2D_ARRAY Two-dimensional image array.
- dim:SQ_RSRC_IMG_2D_MSAA Two-dimensional multi-sample anti-aliasing image.
- dim:SQ_RSRC_IMG_2D_MSAA_ARRAY Two-dimensional multi-sample anti-aliasing image array.
- =============================== =========================================================
- dlc
- ~~~
- See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.
- Miscellaneous Modifiers
- -----------------------
- .. _amdgpu_synid_dlc:
- dlc
- ~~~
- Controls device level cache policy for memory operations. Used for synchronization.
- When specified, forces operation to bypass device level cache making the operation device
- level coherent. By default, instructions use device level cache.
- GFX10 only.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- dlc Bypass device level cache.
- ======================================== ================================================
- .. _amdgpu_synid_glc:
- glc
- ~~~
- This modifier has different meaning for loads, stores, and atomic operations.
- The default value is off (0).
- See AMD documentation for details.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- glc Set glc bit to 1.
- ======================================== ================================================
- .. _amdgpu_synid_lds:
- lds
- ~~~
- Specifies where to store the result: VGPRs or LDS (VGPRs by default).
- ======================================== ===========================
- Syntax Description
- ======================================== ===========================
- lds Store result in LDS.
- ======================================== ===========================
- .. _amdgpu_synid_nv:
- nv
- ~~
- Specifies if instruction is operating on non-volatile memory. By default, memory is volatile.
- GFX9 only.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- nv Indicates that instruction operates on
- non-volatile memory.
- ======================================== ================================================
- .. _amdgpu_synid_slc:
- slc
- ~~~
- Specifies cache policy. The default value is off (0).
- See AMD documentation for details.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- slc Set slc bit to 1.
- ======================================== ================================================
- .. _amdgpu_synid_tfe:
- tfe
- ~~~
- Controls access to partially resident textures. The default value is off (0).
- See AMD documentation for details.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- tfe Set tfe bit to 1.
- ======================================== ================================================
- MUBUF/MTBUF Modifiers
- ---------------------
- .. _amdgpu_synid_idxen:
- idxen
- ~~~~~
- Specifies whether address components include an index. By default, no components are used.
- Can be used together with :ref:`offen<amdgpu_synid_offen>`.
- Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- idxen Address components include an index.
- ======================================== ================================================
- .. _amdgpu_synid_offen:
- offen
- ~~~~~
- Specifies whether address components include an offset. By default, no components are used.
- Can be used together with :ref:`idxen<amdgpu_synid_idxen>`.
- Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- offen Address components include an offset.
- ======================================== ================================================
- .. _amdgpu_synid_addr64:
- addr64
- ~~~~~~
- Specifies whether a 64-bit address is used. By default, no address is used.
- GFX7 only. Cannot be used with :ref:`offen<amdgpu_synid_offen>` and
- :ref:`idxen<amdgpu_synid_idxen>` modifiers.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- addr64 A 64-bit address is used.
- ======================================== ================================================
- .. _amdgpu_synid_buf_offset12:
- offset12
- ~~~~~~~~
- Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
- ================== ====================================================================
- Syntax Description
- ================== ====================================================================
- offset:{0..0xFFF} Specifies a 12-bit unsigned offset as a positive
- :ref:`integer number <amdgpu_synid_integer_number>`
- or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
- ================== ====================================================================
- Examples:
- .. parsed-literal::
- offset:x+y
- offset:0x10
- glc
- ~~~
- See a description :ref:`here<amdgpu_synid_glc>`.
- slc
- ~~~
- See a description :ref:`here<amdgpu_synid_slc>`.
- lds
- ~~~
- See a description :ref:`here<amdgpu_synid_lds>`.
- dlc
- ~~~
- See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.
- tfe
- ~~~
- See a description :ref:`here<amdgpu_synid_tfe>`.
- .. _amdgpu_synid_dfmt:
- dfmt
- ~~~~
- TBD
- .. _amdgpu_synid_nfmt:
- nfmt
- ~~~~
- TBD
- SMRD/SMEM Modifiers
- -------------------
- glc
- ~~~
- See a description :ref:`here<amdgpu_synid_glc>`.
- nv
- ~~
- See a description :ref:`here<amdgpu_synid_nv>`. GFX9 only.
- dlc
- ~~~
- See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.
- VINTRP Modifiers
- ----------------
- .. _amdgpu_synid_high:
- high
- ~~~~
- Specifies which half of the LDS word to use. Low half of LDS word is used by default.
- GFX9 and GFX10 only.
- ======================================== ================================
- Syntax Description
- ======================================== ================================
- high Use high half of LDS word.
- ======================================== ================================
- DPP8 Modifiers
- --------------
- GFX10 only.
- .. _amdgpu_synid_dpp8_sel:
- dpp8_sel
- ~~~~~~~~
- Selects which lanes to pull data from, within a group of 8 lanes. This is a mandatory modifier.
- There is no default value.
- GFX10 only.
- The *dpp8_sel* modifier must specify exactly 8 values.
- First value selects which lane to read from to supply data into lane 0.
- Second value controls lane 1 and so on.
- Each value may be specified as either
- an :ref:`integer number<amdgpu_synid_integer_number>` or
- an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
- =============================================================== ===========================
- Syntax Description
- =============================================================== ===========================
- dpp8:[{0..7},{0..7},{0..7},{0..7},{0..7},{0..7},{0..7},{0..7}] Select lanes to read from.
- =============================================================== ===========================
- Examples:
- .. parsed-literal::
- dpp8:[7,6,5,4,3,2,1,0]
- dpp8:[0,1,0,1,0,1,0,1]
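The selection rule can be illustrated by applying a *dpp8* list to per-lane values. The Python sketch below is a behavioral model written from the description above, not hardware or assembler code; `apply_dpp8` is a hypothetical name.

```python
def apply_dpp8(values: list, sel: list) -> list:
    """Return the per-lane data after a dpp8 selection.

    Within each group of 8 lanes, lane i receives the value of lane sel[i]
    of the same group.
    """
    if len(sel) != 8 or any(not 0 <= s <= 7 for s in sel):
        raise ValueError("sel must hold exactly 8 values in the range 0..7")
    if len(values) % 8 != 0:
        raise ValueError("lane count must be a multiple of 8")
    out = []
    for base in range(0, len(values), 8):
        group = values[base:base + 8]
        out.extend(group[s] for s in sel)
    return out
```

With this model, `dpp8:[7,6,5,4,3,2,1,0]` reverses each group of 8 lanes, and `dpp8:[0,1,0,1,0,1,0,1]` replicates lanes 0 and 1 across the group.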
- .. _amdgpu_synid_fi8:
- fi
- ~~
- Controls interaction with inactive lanes for *dpp8* instructions. The default value is zero.
- Note: *inactive* lanes are those whose :ref:`exec<amdgpu_synid_exec>` mask bit is zero.
- GFX10 only.
- ==================================== =====================================================
- Syntax Description
- ==================================== =====================================================
- fi:0 Fetch zero when accessing data from inactive lanes.
- fi:1 Fetch pre-existing values from inactive lanes.
- ==================================== =====================================================
- Note: numeric values may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
- :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
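The two *fi* settings can be modelled as a single fetch function. This Python sketch is illustrative only; `fetch` and its parameters are hypothetical, and the *exec* mask is represented as an integer with one bit per lane.

```python
def fetch(values: list, exec_mask: int, src_lane: int, fi: int) -> int:
    """Return the value read from src_lane under the given fi setting.

    A lane is inactive when its exec mask bit is zero.
    """
    active = (exec_mask >> src_lane) & 1
    if active or fi == 1:
        return values[src_lane]  # active lane, or fi:1 keeps the existing value
    return 0                     # fi:0 reads zero from an inactive lane
```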
- DPP/DPP16 Modifiers
- -------------------
- GFX8, GFX9 and GFX10 only.
- .. _amdgpu_synid_dpp_ctrl:
- dpp_ctrl
- ~~~~~~~~
- Specifies how data are shared between threads. This is a mandatory modifier.
- There is no default value.
- GFX8 and GFX9 only. Use :ref:`dpp16_ctrl<amdgpu_synid_dpp16_ctrl>` for GFX10.
- Note: the lanes of a wavefront are organized in four *rows* and four *banks*.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- quad_perm:[{0..3},{0..3},{0..3},{0..3}] Full permute of 4 threads.
- row_mirror Mirror threads within row.
- row_half_mirror Mirror threads within 1/2 row (8 threads).
- row_bcast:15 Broadcast 15th thread of each row to next row.
- row_bcast:31 Broadcast thread 31 to rows 2 and 3.
- wave_shl:1 Wavefront left shift by 1 thread.
- wave_rol:1 Wavefront left rotate by 1 thread.
- wave_shr:1 Wavefront right shift by 1 thread.
- wave_ror:1 Wavefront right rotate by 1 thread.
- row_shl:{1..15} Row shift left by 1-15 threads.
- row_shr:{1..15} Row shift right by 1-15 threads.
- row_ror:{1..15} Row rotate right by 1-15 threads.
- ======================================== ================================================
- Note: numeric values may be specified as either
- :ref:`integer numbers<amdgpu_synid_integer_number>` or
- :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
- Examples:
- .. parsed-literal::
- quad_perm:[0, 1, 2, 3]
- row_shl:3
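Three of the controls above can be written as simple source-lane functions. The Python sketch below is a behavioral reading of the table, not a hardware specification. The function names are hypothetical, and the row size of 16 assumes a 64-lane wavefront split into four rows.

```python
ROW_SIZE = 16  # assumption: 64 lanes organized in four rows


def quad_perm_src(lane: int, sel: list) -> int:
    """Lane i of each group of 4 threads reads from lane sel[i] of that group."""
    if len(sel) != 4 or any(not 0 <= s <= 3 for s in sel):
        raise ValueError("sel must hold exactly 4 values in the range 0..3")
    return (lane & ~3) | sel[lane & 3]


def row_mirror_src(lane: int) -> int:
    """Mirror threads within a row of 16."""
    return lane ^ (ROW_SIZE - 1)


def row_half_mirror_src(lane: int) -> int:
    """Mirror threads within half a row (8 threads)."""
    return lane ^ (ROW_SIZE // 2 - 1)
```

Under this model, `quad_perm:[0, 1, 2, 3]` is the identity permutation. The shift and rotate controls are omitted because their direction conventions are not spelled out in this table.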
- .. _amdgpu_synid_dpp16_ctrl:
- dpp16_ctrl
- ~~~~~~~~~~
- Specifies how data are shared between threads. This is a mandatory modifier.
- There is no default value.
- GFX10 only. Use :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` for GFX8 and GFX9.
- Note: the lanes of a wavefront are organized in four *rows* and four *banks*.
- (There are only two rows in *wave32* mode.)
- ======================================== ====================================================
- Syntax                                   Description
- ======================================== ====================================================
- quad_perm:[{0..3},{0..3},{0..3},{0..3}]  Full permute of 4 threads.
- row_mirror                               Mirror threads within row.
- row_half_mirror                          Mirror threads within 1/2 row (8 threads).
- row_share:{0..15}                        Share the value from the specified lane with other
-                                          lanes in the row.
- row_xmask:{0..15}                        Fetch from XOR(current lane id, specified lane id).
- row_shl:{1..15}                          Row shift left by 1-15 threads.
- row_shr:{1..15}                          Row shift right by 1-15 threads.
- row_ror:{1..15}                          Row rotate right by 1-15 threads.
- ======================================== ====================================================
- Note: numeric values may be specified as either
- :ref:`integer numbers<amdgpu_synid_integer_number>` or
- :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
- Examples:
- .. parsed-literal::
- quad_perm:[0, 1, 2, 3]
- row_shl:3
- .. _amdgpu_synid_row_mask:
- row_mask
- ~~~~~~~~
- Controls which rows are enabled for data sharing. By default, all rows are enabled.
- Note: the lanes of a wavefront are organized in four *rows* and four *banks*.
- (There are only two rows in *wave32* mode.)
- ================= ====================================================================
- Syntax            Description
- ================= ====================================================================
- row_mask:{0..15}  Specifies a *row mask* as a positive
-                   :ref:`integer number <amdgpu_synid_integer_number>`
-                   or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
-                   Each of 4 bits in the mask controls one row
-                   (0 - disabled, 1 - enabled).
-                   In *wave32* mode the values should be limited to 0..7.
- ================= ====================================================================
- Examples:
- .. parsed-literal::
- row_mask:0xf
- row_mask:0b1010
- row_mask:x|y
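To make the mask encoding concrete, here is a small Python sketch that decodes a mask into the list of enabled rows. The helper name ``enabled_rows`` is hypothetical, and the mapping of bit *n* to row *n* is an assumption; the text above only states that each of the 4 bits controls one row.

```python
def enabled_rows(row_mask):
    """Return the rows enabled by a 4-bit row mask.

    Assumption: bit n of the mask controls row n.
    """
    if not 0 <= row_mask <= 15:
        raise ValueError("row_mask must be in the range 0..15")
    return [row for row in range(4) if row_mask & (1 << row)]


all_rows = enabled_rows(0xf)      # every bit set: all four rows enabled
odd_rows = enabled_rows(0b1010)   # bits 1 and 3 set
```

The same decoding would apply to :ref:`bank_mask<amdgpu_synid_bank_mask>`, with banks in place of rows.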
- .. _amdgpu_synid_bank_mask:
- bank_mask
- ~~~~~~~~~
- Controls which banks are enabled for data sharing. By default, all banks are enabled.
- Note: the lanes of a wavefront are organized in four *rows* and four *banks*.
- (There are only two rows in *wave32* mode.)
- ================== ====================================================================
- Syntax             Description
- ================== ====================================================================
- bank_mask:{0..15}  Specifies a *bank mask* as a positive
-                    :ref:`integer number <amdgpu_synid_integer_number>`
-                    or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
-                    Each of 4 bits in the mask controls one bank
-                    (0 - disabled, 1 - enabled).
- ================== ====================================================================
- Examples:
- .. parsed-literal::
- bank_mask:0x3
- bank_mask:0b0011
- bank_mask:x&y
- .. _amdgpu_synid_bound_ctrl:
- bound_ctrl
- ~~~~~~~~~~
- Controls data sharing when accessing an invalid lane. By default, data sharing with
- invalid lanes is disabled.
- ======================================== ================================================
- Syntax                                   Description
- ======================================== ================================================
- bound_ctrl:0                             Enables data sharing with invalid lanes.
-                                          Accessing data from an invalid lane will
-                                          return zero.
- ======================================== ================================================
- .. _amdgpu_synid_fi16:
- fi
- ~~
- Controls interaction with *inactive* lanes for *dpp16* instructions. The default value is zero.
- Note: *inactive* lanes are those whose :ref:`exec<amdgpu_synid_exec>` mask bit is zero.
- GFX10 only.
- ======================================== ==================================================
- Syntax                                   Description
- ======================================== ==================================================
- fi:0                                     Interaction with inactive lanes is controlled by
-                                          :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`.
- fi:1                                     Fetch pre-existing values from inactive lanes.
- ======================================== ==================================================
- Note: numeric values may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
- :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
- SDWA Modifiers
- --------------
- GFX8, GFX9 and GFX10 only.
- clamp
- ~~~~~
- See a description :ref:`here<amdgpu_synid_clamp>`.
- omod
- ~~~~
- See a description :ref:`here<amdgpu_synid_omod>`.
- GFX9 and GFX10 only.
- .. _amdgpu_synid_dst_sel:
- dst_sel
- ~~~~~~~
- Selects which bits in the destination are affected. By default, all bits are affected.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- dst_sel:DWORD Use bits 31:0.
- dst_sel:BYTE_0 Use bits 7:0.
- dst_sel:BYTE_1 Use bits 15:8.
- dst_sel:BYTE_2 Use bits 23:16.
- dst_sel:BYTE_3 Use bits 31:24.
- dst_sel:WORD_0 Use bits 15:0.
- dst_sel:WORD_1 Use bits 31:16.
- ======================================== ================================================
- .. _amdgpu_synid_dst_unused:
- dst_unused
- ~~~~~~~~~~
- Controls what to do with the bits in the destination which are not selected
- by :ref:`dst_sel<amdgpu_synid_dst_sel>`.
- By default, unused bits are preserved.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- dst_unused:UNUSED_PAD Pad with zeros.
- dst_unused:UNUSED_SEXT Sign-extend upper bits, zero lower bits.
- dst_unused:UNUSED_PRESERVE Preserve bits.
- ======================================== ================================================
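The interaction of *dst_sel* and *dst_unused* can be sketched in Python as follows. This is only an illustrative model: the function name ``sdwa_write`` and the dictionary ``DST_SEL`` are hypothetical, and the sketch assumes that the low bits of the result are placed into the selected field of a 32-bit destination.

```python
# (lowest bit, width) of each dst_sel value, taken from the dst_sel table
DST_SEL = {
    "DWORD": (0, 32),
    "BYTE_0": (0, 8), "BYTE_1": (8, 8), "BYTE_2": (16, 8), "BYTE_3": (24, 8),
    "WORD_0": (0, 16), "WORD_1": (16, 16),
}


def sdwa_write(old_dst, result, dst_sel="DWORD", dst_unused="UNUSED_PRESERVE"):
    """Return the new 32-bit destination value (illustrative model only)."""
    low, width = DST_SEL[dst_sel]
    field_mask = ((1 << width) - 1) << low
    field = (result << low) & field_mask
    if dst_unused == "UNUSED_PRESERVE":
        rest = old_dst & ~field_mask            # keep the unselected bits
    elif dst_unused == "UNUSED_PAD":
        rest = 0                                # pad with zeros
    elif dst_unused == "UNUSED_SEXT":
        sign = (field >> (low + width - 1)) & 1
        upper = 0xFFFFFFFF & ~((1 << (low + width)) - 1)
        rest = upper if sign else 0             # sign into upper bits, zero lower bits
    else:
        raise ValueError("unknown dst_unused value: " + dst_unused)
    return (field | rest) & 0xFFFFFFFF


preserved = sdwa_write(0xAABBCCDD, 0x12, "BYTE_1", "UNUSED_PRESERVE")
padded = sdwa_write(0xAABBCCDD, 0x12, "BYTE_1", "UNUSED_PAD")
extended = sdwa_write(0xAABBCCDD, 0x80, "BYTE_1", "UNUSED_SEXT")
```

With these inputs the model yields ``0xAABB12DD``, ``0x00001200`` and ``0xFFFF8000`` respectively.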
- .. _amdgpu_synid_src0_sel:
- src0_sel
- ~~~~~~~~
- Controls which bits in the src0 are used. By default, all bits are used.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- src0_sel:DWORD Use bits 31:0.
- src0_sel:BYTE_0 Use bits 7:0.
- src0_sel:BYTE_1 Use bits 15:8.
- src0_sel:BYTE_2 Use bits 23:16.
- src0_sel:BYTE_3 Use bits 31:24.
- src0_sel:WORD_0 Use bits 15:0.
- src0_sel:WORD_1 Use bits 31:16.
- ======================================== ================================================
- .. _amdgpu_synid_src1_sel:
- src1_sel
- ~~~~~~~~
- Controls which bits in the src1 are used. By default, all bits are used.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- src1_sel:DWORD Use bits 31:0.
- src1_sel:BYTE_0 Use bits 7:0.
- src1_sel:BYTE_1 Use bits 15:8.
- src1_sel:BYTE_2 Use bits 23:16.
- src1_sel:BYTE_3 Use bits 31:24.
- src1_sel:WORD_0 Use bits 15:0.
- src1_sel:WORD_1 Use bits 31:16.
- ======================================== ================================================
- .. _amdgpu_synid_sdwa_operand_modifiers:
- SDWA Operand Modifiers
- ----------------------
- Operand modifiers are not used separately. They are applied to source operands.
- GFX8, GFX9 and GFX10 only.
- abs
- ~~~
- See a description :ref:`here<amdgpu_synid_abs>`.
- neg
- ~~~
- See a description :ref:`here<amdgpu_synid_neg>`.
- .. _amdgpu_synid_sext:
- sext
- ~~~~
- Sign-extends the value of a sub-dword operand to fill all 32 bits.
- Has no effect on 32-bit operands.
- Valid for integer operands only.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- sext(<operand>) Sign-extend operand value.
- ======================================== ================================================
- Examples:
- .. parsed-literal::
- sext(v4)
- sext(v255)
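The effect of *sext* on a sub-dword value can be sketched in Python. The function below is a hypothetical helper that shares only its name with the modifier; it takes the operand width explicitly, whereas in the assembler the width follows from the operand selection.

```python
def sext(value, width):
    """Sign-extend the low `width` bits of value to a 32-bit result."""
    value &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    # Flipping and then subtracting the sign bit propagates it upwards.
    return ((value ^ sign_bit) - sign_bit) & 0xFFFFFFFF


negative_byte = sext(0x80, 8)         # top bit of the byte is set
positive_byte = sext(0x7F, 8)         # top bit of the byte is clear
full_dword = sext(0x12345678, 32)     # 32-bit operand: unchanged
```

Here ``negative_byte`` is ``0xFFFFFF80`` and ``positive_byte`` stays ``0x7F``, while the 32-bit value is returned unchanged.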
- VOP3 Modifiers
- --------------
- .. _amdgpu_synid_vop3_op_sel:
- op_sel
- ~~~~~~
- Selects the low [15:0] or high [31:16] operand bits for source and destination operands.
- By default, low bits are used for all operands.
- The number of values specified with the op_sel modifier must match the number of instruction
- operands (both source and destination). First value controls src0, second value controls src1
- and so on, except that the last value controls destination.
- The value 0 selects the low bits, while 1 selects the high bits.
- Note: op_sel modifier affects 16-bit operands only. For 32-bit operands the value specified
- by op_sel must be 0.
- GFX9 and GFX10 only.
- ======================================== ============================================================
- Syntax Description
- ======================================== ============================================================
- op_sel:[{0..1},{0..1}] Select operand bits for instructions with 1 source operand.
- op_sel:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 2 source operands.
- op_sel:[{0..1},{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands.
- ======================================== ============================================================
- Note: numeric values may be specified as either
- :ref:`integer numbers<amdgpu_synid_integer_number>` or
- :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
- Examples:
- .. parsed-literal::
- op_sel:[0,0]
- op_sel:[0,1]
- .. _amdgpu_synid_clamp:
- clamp
- ~~~~~
- The meaning of clamp depends on the instruction.
- For *v_cmp* instructions, clamp modifier indicates that the compare signals
- if a floating point exception occurs. By default, signaling is disabled.
- Not supported by GFX7.
- For integer operations, clamp modifier indicates that the result must be clamped
- to the largest and smallest representable value. By default, there is no clamping.
- Integer clamping is not supported by GFX7.
- For floating point operations, clamp modifier indicates that the result must be clamped
- to the range [0.0, 1.0]. By default, there is no clamping.
- Note: clamp modifier is applied after :ref:`output modifiers<amdgpu_synid_omod>` (if any).
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- clamp Enables clamping (or signaling).
- ======================================== ================================================
- .. _amdgpu_synid_omod:
- omod
- ~~~~
- Specifies if an output modifier must be applied to the result.
- By default, no output modifiers are applied.
- Note: output modifiers are applied before :ref:`clamping<amdgpu_synid_clamp>` (if any).
- Output modifiers are valid for f32 and f64 floating point results only.
- They must not be used with f16.
- Note: *v_cvt_f16_f32* is an exception. This instruction produces f16 result
- but accepts output modifiers.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- mul:2 Multiply the result by 2.
- mul:4 Multiply the result by 4.
- div:2 Multiply the result by 0.5.
- ======================================== ================================================
- Note: numeric values may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
- :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
- Examples:
- .. parsed-literal::
- mul:2
- mul:x // x must be equal to 2 or 4
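The ordering rule (output modifier first, then clamping) can be illustrated with a short Python sketch for floating-point results. The names ``OMOD`` and ``finish_float_result`` are hypothetical and only model the arithmetic described above.

```python
# Scale factor for each output modifier listed in the table above
OMOD = {None: 1.0, "mul:2": 2.0, "mul:4": 4.0, "div:2": 0.5}


def finish_float_result(result, omod=None, clamp=False):
    """Apply the output modifier first, then clamp to [0.0, 1.0] if requested."""
    result *= OMOD[omod]
    if clamp:
        result = min(max(result, 0.0), 1.0)
    return result


scaled = finish_float_result(0.75, "div:2")              # 0.375
scaled_and_clamped = finish_float_result(0.5, "mul:4", clamp=True)
```

In the second call the result is scaled to 2.0 and then clamped to 1.0. Clamping before scaling would have produced 2.0 instead, which is why the order matters.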
- .. _amdgpu_synid_vop3_operand_modifiers:
- VOP3 Operand Modifiers
- ----------------------
- Operand modifiers are not used separately. They are applied to source operands.
- .. _amdgpu_synid_abs:
- abs
- ~~~
- Computes the absolute value of its operand. Must be applied before :ref:`neg<amdgpu_synid_neg>`
- (if any). Valid for floating point operands only.
- ======================================== ====================================================
- Syntax Description
- ======================================== ====================================================
- abs(<operand>) Get the absolute value of a floating-point operand.
- \|<operand>| The same as above (an SP3 syntax).
- ======================================== ====================================================
- Note: avoid using SP3 syntax with operands specified as expressions because the trailing '|'
- may be misinterpreted. Such operands should be enclosed in additional parentheses, as shown
- in the examples below.
- Examples:
- .. parsed-literal::
- abs(v36)
- \|v36|
- abs(x|y) // ok
- \|(x|y)| // additional parentheses are required
- .. _amdgpu_synid_neg:
- neg
- ~~~
- Computes the negative value of its operand. Must be applied after :ref:`abs<amdgpu_synid_abs>`
- (if any). Valid for floating point operands only.
- ================== ====================================================
- Syntax             Description
- ================== ====================================================
- neg(<operand>)     Get the negative value of a floating-point operand.
-                    The operand may include an optional
-                    :ref:`abs<amdgpu_synid_abs>` modifier.
- -<operand>         The same as above (an SP3 syntax).
- ================== ====================================================
- Note: SP3 syntax is supported with limitations because of a potential ambiguity.
- Currently it is allowed in the following cases:
- * Before a register.
- * Before an :ref:`abs<amdgpu_synid_abs>` modifier.
- * Before an SP3 :ref:`abs<amdgpu_synid_abs>` modifier.
- In all other cases "-" is handled as a part of an expression that follows the sign.
- Examples:
- .. parsed-literal::
- // Operands with negate modifiers
- neg(v[0])
- neg(1.0)
- neg(abs(v0))
- -v5
- -abs(v5)
- -\|v5|
- // Operands without negate modifiers
- -1
- -x+y
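The required order of the two operand modifiers (*abs* first, then *neg*) can be shown with a minimal Python sketch. The function name and its keyword arguments are hypothetical and only model the arithmetic.

```python
def apply_source_modifiers(value, use_abs=False, use_neg=False):
    """Apply abs before neg, as the modifier descriptions require."""
    if use_abs:
        value = abs(value)
    if use_neg:
        value = -value
    return value


neg_of_abs = apply_source_modifiers(-2.5, use_abs=True, use_neg=True)  # models neg(abs(v0))
plain_neg = apply_source_modifiers(-2.5, use_neg=True)                 # models neg(v0)
```

Under this model ``neg(abs(v0))`` always yields a non-positive value: the input -2.5 becomes 2.5 and is then negated back to -2.5, whereas a plain ``neg`` gives 2.5.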
- VOP3P Modifiers
- ---------------
- This section describes modifiers of *regular* VOP3P instructions.
- *v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16*
- instructions use these modifiers :ref:`in a special manner<amdgpu_synid_mad_mix>`.
- GFX9 and GFX10 only.
- .. _amdgpu_synid_op_sel:
- op_sel
- ~~~~~~
- Selects the low [15:0] or high [31:16] operand bits as input to the operation
- which results in the lower-half of the destination.
- By default, low bits are used for all operands.
- The number of values specified by the *op_sel* modifier must match the number of source
- operands. First value controls src0, second value controls src1 and so on.
- The value 0 selects the low bits, while 1 selects the high bits.
- ================================= =============================================================
- Syntax Description
- ================================= =============================================================
- op_sel:[{0..1}] Select operand bits for instructions with 1 source operand.
- op_sel:[{0..1},{0..1}] Select operand bits for instructions with 2 source operands.
- op_sel:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands.
- ================================= =============================================================
- Note: numeric values may be specified as either
- :ref:`integer numbers<amdgpu_synid_integer_number>` or
- :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
- Examples:
- .. parsed-literal::
- op_sel:[0,0]
- op_sel:[0,1,0]
- .. _amdgpu_synid_op_sel_hi:
- op_sel_hi
- ~~~~~~~~~
- Selects the low [15:0] or high [31:16] operand bits as input to the operation
- which results in the upper-half of the destination.
- By default, high bits are used for all operands.
- The number of values specified by the *op_sel_hi* modifier must match the number of source
- operands. First value controls src0, second value controls src1 and so on.
- The value 0 selects the low bits, while 1 selects the high bits.
- =================================== =============================================================
- Syntax Description
- =================================== =============================================================
- op_sel_hi:[{0..1}] Select operand bits for instructions with 1 source operand.
- op_sel_hi:[{0..1},{0..1}] Select operand bits for instructions with 2 source operands.
- op_sel_hi:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands.
- =================================== =============================================================
- Note: numeric values may be specified as either
- :ref:`integer numbers<amdgpu_synid_integer_number>` or
- :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
- Examples:
- .. parsed-literal::
- op_sel_hi:[0,0]
- op_sel_hi:[0,0,1]
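How *op_sel* and *op_sel_hi* together steer a packed operation can be sketched in Python. The example uses a packed unsigned 16-bit add purely as a stand-in operation; the function names are hypothetical, and the defaults follow the text above (low bits for *op_sel*, high bits for *op_sel_hi*).

```python
def half(reg, sel):
    """Return bits [31:16] of reg if sel is 1, otherwise bits [15:0]."""
    return (reg >> 16) & 0xFFFF if sel else reg & 0xFFFF


def packed_add_u16(src0, src1, op_sel=(0, 0), op_sel_hi=(1, 1)):
    """Illustrative packed add: op_sel feeds the low result, op_sel_hi the high."""
    lo = (half(src0, op_sel[0]) + half(src1, op_sel[1])) & 0xFFFF
    hi = (half(src0, op_sel_hi[0]) + half(src1, op_sel_hi[1])) & 0xFFFF
    return (hi << 16) | lo


a, b = 0x00030001, 0x00400020
default = packed_add_u16(a, b)                       # lo = 1 + 0x20, hi = 3 + 0x40
swapped_src0 = packed_add_u16(a, b, op_sel=(1, 0))   # lo = 3 + 0x20
both_low = packed_add_u16(a, b, op_sel_hi=(0, 0))    # hi = 1 + 0x20
```

The three calls give ``0x00430021``, ``0x00430023`` and ``0x00210021`` in this model.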
- .. _amdgpu_synid_neg_lo:
- neg_lo
- ~~~~~~
- Specifies whether to change sign of operand values selected by
- :ref:`op_sel<amdgpu_synid_op_sel>`. These values are then used
- as input to the operation which results in the lower-half of the destination.
- The number of values specified by this modifier must match the number of source
- operands. First value controls src0, second value controls src1 and so on.
- The value 0 indicates that the corresponding operand value is used unmodified,
- while the value 1 indicates that the negated value of the operand must be used.
- By default, operand values are used unmodified.
- This modifier is valid for floating point operands only.
- ================================ ==================================================================
- Syntax Description
- ================================ ==================================================================
- neg_lo:[{0..1}] Select affected operands for instructions with 1 source operand.
- neg_lo:[{0..1},{0..1}] Select affected operands for instructions with 2 source operands.
- neg_lo:[{0..1},{0..1},{0..1}] Select affected operands for instructions with 3 source operands.
- ================================ ==================================================================
- Note: numeric values may be specified as either
- :ref:`integer numbers<amdgpu_synid_integer_number>` or
- :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
- Examples:
- .. parsed-literal::
- neg_lo:[0]
- neg_lo:[0,1]
- .. _amdgpu_synid_neg_hi:
- neg_hi
- ~~~~~~
- Specifies whether to change sign of operand values selected by
- :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`. These values are then used
- as input to the operation which results in the upper-half of the destination.
- The number of values specified by this modifier must match the number of source
- operands. First value controls src0, second value controls src1 and so on.
- The value 0 indicates that the corresponding operand value is used unmodified,
- while the value 1 indicates that the negated value of the operand must be used.
- By default, operand values are used unmodified.
- This modifier is valid for floating point operands only.
- =============================== ==================================================================
- Syntax Description
- =============================== ==================================================================
- neg_hi:[{0..1}] Select affected operands for instructions with 1 source operand.
- neg_hi:[{0..1},{0..1}] Select affected operands for instructions with 2 source operands.
- neg_hi:[{0..1},{0..1},{0..1}] Select affected operands for instructions with 3 source operands.
- =============================== ==================================================================
- Note: numeric values may be specified as either
- :ref:`integer numbers<amdgpu_synid_integer_number>` or
- :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
- Examples:
- .. parsed-literal::
- neg_hi:[1,0]
- neg_hi:[0,1,1]
- clamp
- ~~~~~
- See a description :ref:`here<amdgpu_synid_clamp>`.
- .. _amdgpu_synid_mad_mix:
- VOP3P V_MAD_MIX Modifiers
- -------------------------
- *v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16* instructions
- use *op_sel* and *op_sel_hi* modifiers
- in a manner different from *regular* VOP3P instructions.
- See a description below.
- GFX9 and GFX10 only.
- .. _amdgpu_synid_mad_mix_op_sel:
- m_op_sel
- ~~~~~~~~
- This modifier has meaning only for 16-bit source operands, as indicated by
- :ref:`m_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>`.
- It selects either the low [15:0] or high [31:16] operand bits
- as input to the operation.
- The number of values specified by the *op_sel* modifier must match the number of source
- operands. The first value controls src0, the second value controls src1, and so on.
- The value 0 selects the low 16 bits, while 1 selects the high 16 bits.
- By default, low bits are used for all operands.
- =============================== ================================================
- Syntax Description
- =============================== ================================================
- op_sel:[{0..1},{0..1},{0..1}] Select location of each 16-bit source operand.
- =============================== ================================================
- Note: numeric values may be specified as either
- :ref:`integer numbers<amdgpu_synid_integer_number>` or
- :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
- Examples:
- .. parsed-literal::
- op_sel:[0,0,1]
- .. _amdgpu_synid_mad_mix_op_sel_hi:
- m_op_sel_hi
- ~~~~~~~~~~~
- Selects the size of source operands: either 32 bits or 16 bits.
- By default, 32 bits are used for all source operands.
- The number of values specified by the *op_sel_hi* modifier must match the number of source
- operands. First value controls src0, second value controls src1 and so on.
- The value 0 selects 32 bits, while 1 selects 16 bits.
- The location of 16 bits in the operand may be specified by
- :ref:`m_op_sel<amdgpu_synid_mad_mix_op_sel>`.
- ======================================== ====================================
- Syntax Description
- ======================================== ====================================
- op_sel_hi:[{0..1},{0..1},{0..1}] Select size of each source operand.
- ======================================== ====================================
- Note: numeric values may be specified as either
- :ref:`integer numbers<amdgpu_synid_integer_number>` or
- :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
- Examples:
- .. parsed-literal::
- op_sel_hi:[1,1,1]
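The way *op_sel_hi* chooses the operand size and *op_sel* chooses the 16-bit location can be modelled in Python for a single source operand. The function ``mix_operand`` is a hypothetical helper; it assumes that a 32-bit operand holds an IEEE single-precision value and that a 16-bit operand holds an IEEE half-precision value.

```python
import struct


def mix_operand(reg, op_sel=0, op_sel_hi=0):
    """Decode one source operand of a v_mad_mix-style instruction.

    op_sel_hi == 0: treat all 32 bits of reg as an f32 value.
    op_sel_hi == 1: treat 16 bits as an f16 value; op_sel picks low or high.
    """
    if op_sel_hi == 0:
        return struct.unpack("<f", struct.pack("<I", reg))[0]
    bits = (reg >> 16) & 0xFFFF if op_sel else reg & 0xFFFF
    return struct.unpack("<e", struct.pack("<H", bits))[0]


as_f32 = mix_operand(0x3F800000)                           # f32 1.0
low_f16 = mix_operand(0x40003C00, op_sel=0, op_sel_hi=1)   # low half 0x3C00
high_f16 = mix_operand(0x40003C00, op_sel=1, op_sel_hi=1)  # high half 0x4000
```

In this model the register ``0x40003C00`` decodes to 1.0 from its low half and 2.0 from its high half, and *op_sel* is ignored whenever *op_sel_hi* is 0.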
- abs
- ~~~
- See a description :ref:`here<amdgpu_synid_abs>`.
- neg
- ~~~
- See a description :ref:`here<amdgpu_synid_neg>`.
- clamp
- ~~~~~
- See a description :ref:`here<amdgpu_synid_clamp>`.