.. _decodetree:

========================
Decodetree Specification
========================

A *decodetree* is built from instruction *patterns*. A pattern may
represent a single architectural instruction or a group of such
instructions, depending on what is convenient for further processing.

Each pattern has both *fixedbits* and *fixedmask*, the combination of which
describes the condition under which the pattern is matched::

  (insn & fixedmask) == fixedbits

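
As a minimal sketch of the match condition, the following Python models the
test for one pattern. The helper name ``matches`` and the sample mask and
bits are assumptions made for illustration; they are not taken from any real
decoder.

```python
def matches(insn: int, fixedmask: int, fixedbits: int) -> bool:
    """Return True when every fixed bit of the pattern agrees with insn."""
    return (insn & fixedmask) == fixedbits


# Hypothetical 32-bit pattern: the top six bits must be 010000,
# all remaining bits are left free for fields.
FIXEDMASK = 0xFC000000
FIXEDBITS = 0x40000000

print(matches(0x40221003, FIXEDMASK, FIXEDBITS))  # fixed bits agree
print(matches(0x44000000, FIXEDMASK, FIXEDBITS))  # one fixed bit differs
```

Bits outside *fixedmask* never influence the result, which is what leaves
them available for fields.
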
Each pattern may have *fields*, which are extracted from the insn and
passed along to the translator. Examples of such are registers,
immediates, and sub-opcodes.

In support of patterns, one may declare *fields*, *argument sets*, and
*formats*, each of which may be re-used to simplify further definitions.

Fields
======

Syntax::

  field_def     := '%' identifier ( field )* ( !function=identifier )?
  field         := unnamed_field | named_field
  unnamed_field := number ':' ( 's' )? number
  named_field   := identifier ':' ( 's' )? number

For *unnamed_field*, the first number is the least-significant bit position
of the field and the second number is the length of the field. If the 's' is
present, the field is considered signed.

A *named_field* refers to some other field in the instruction pattern
or format. Regardless of the length of the other field where it is
defined, it will be inserted into this field with the specified
signedness and bit width.

Field definitions that involve loops (i.e. where a field is defined
directly or indirectly in terms of itself) are errors.

A format can include fields that refer to named fields that are
defined in the instruction pattern(s) that use the format.
Conversely, an instruction pattern can include fields that refer to
named fields that are defined in the format it uses. However, you
cannot currently do both at once (i.e. pattern P uses format F; F has
a field A that refers to a named field B that is defined in P, and P
has a field C that refers to a named field D that is defined in F).

If multiple ``fields`` are present, they are concatenated.
In this way one can define disjoint fields.

If ``!function`` is specified, the concatenated result is passed through the
named function, taking and returning an integral value.

One may use ``!function`` with zero ``fields``. This case is called
a *parameter*, and the named function is only passed the ``DisasContext``
and returns an integral value extracted from there.

A field with no ``fields`` and no ``!function`` is an error.

Field examples:

+---------------------------+---------------------------------------------+
| Input                     | Generated code                              |
+===========================+=============================================+
| %disp   0:s16             | sextract(i, 0, 16)                          |
+---------------------------+---------------------------------------------+
| %imm9   16:6 10:3         | extract(i, 16, 6) << 3 | extract(i, 10, 3)  |
+---------------------------+---------------------------------------------+
| %disp12 0:s1 1:1 2:10     | sextract(i, 0, 1) << 11 |                   |
|                           |    extract(i, 1, 1) << 10 |                 |
|                           |    extract(i, 2, 10)                        |
+---------------------------+---------------------------------------------+
| %shimm8 5:s8 13:1         | expand_shimm8(sextract(i, 5, 8) << 1 |      |
|   !function=expand_shimm8 |               extract(i, 13, 1))            |
+---------------------------+---------------------------------------------+
| %sz_imm 10:2 sz:3         | expand_sz_imm(extract(i, 10, 2) << 3 |      |
|   !function=expand_sz_imm |               extract(a->sz, 0, 3))         |
+---------------------------+---------------------------------------------+

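
To make the concatenation rule concrete, here is a small Python model of the
first three rows of the table. The ``extract`` and ``sextract`` helpers are
written here on the assumption that they return an unsigned and a
sign-extended bitfield respectively; the wrapper names ``disp``, ``imm9`` and
``disp12`` are illustrative only.

```python
def extract(value: int, pos: int, length: int) -> int:
    """Unsigned bitfield: `length` bits of `value` starting at bit `pos`."""
    return (value >> pos) & ((1 << length) - 1)


def sextract(value: int, pos: int, length: int) -> int:
    """Signed bitfield: as extract(), then sign-extend from the top bit."""
    raw = extract(value, pos, length)
    sign = 1 << (length - 1)
    return (raw ^ sign) - sign


def disp(i: int) -> int:
    # %disp 0:s16
    return sextract(i, 0, 16)


def imm9(i: int) -> int:
    # %imm9 16:6 10:3 -- the first sub-field ends up most significant
    return extract(i, 16, 6) << 3 | extract(i, 10, 3)


def disp12(i: int) -> int:
    # %disp12 0:s1 1:1 2:10 -- only the leading sub-field carries the sign
    return sextract(i, 0, 1) << 11 | extract(i, 1, 1) << 10 | extract(i, 2, 10)


# Bits 16..21 hold 0b101010 and bits 10..12 hold 0b011.
insn = (0b101010 << 16) | (0b011 << 10)
print(bin(imm9(insn)))
```

Each sub-field is shifted left by the combined width of the sub-fields that
follow it, so the order in the definition fixes the order in the result.
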
Argument Sets
=============

Syntax::

  args_def := '&' identifier ( args_elt )+ ( !extern )?
  args_elt := identifier (':' identifier)?

Each *args_elt* defines an argument within the argument set.
If the form of the *args_elt* contains a colon, the first
identifier is the argument name and the second identifier is
the argument type. If the colon is missing, the argument
type will be ``int``.

Each argument set will be rendered as a C structure ``arg_$name``
with each of the fields being one of the member arguments.

If ``!extern`` is specified, the backing structure is assumed
to have been already declared, typically via a second decoder.

Argument sets are useful when one wants to define helper functions
for the translator functions that can perform operations on a common
set of arguments. This can ensure, for instance, that the ``AND``
pattern and the ``OR`` pattern put their operands into the same named
structure, so that a common ``gen_logic_insn`` may be able to handle
the operations common between the two.

Argument set examples::

  &reg3       ra rb rc
  &loadstore  reg base offset
  &longldst   reg base offset:int64_t

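
The benefit of a shared argument set can be sketched in Python. ``ArgReg3``
stands in for the C structure ``arg_reg3``; the register-file list and the
assumption that ``rc`` receives ``ra op rb`` are hypothetical and serve only
to show two translator functions sharing one helper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ArgReg3:
    """Python stand-in for the C structure arg_reg3 built from &reg3."""
    ra: int
    rb: int
    rc: int


def gen_logic_insn(regs: List[int], a: ArgReg3,
                   op: Callable[[int, int], int]) -> bool:
    """Shared helper: assumed semantics are regs[rc] = regs[ra] op regs[rb]."""
    regs[a.rc] = op(regs[a.ra], regs[a.rb])
    return True


def trans_and(regs: List[int], a: ArgReg3) -> bool:
    return gen_logic_insn(regs, a, lambda x, y: x & y)


def trans_or(regs: List[int], a: ArgReg3) -> bool:
    return gen_logic_insn(regs, a, lambda x, y: x | y)


regs = [0] * 8
regs[1], regs[2] = 0b1100, 0b1010
trans_and(regs, ArgReg3(ra=1, rb=2, rc=3))
trans_or(regs, ArgReg3(ra=1, rb=2, rc=4))
print(bin(regs[3]), bin(regs[4]))
```

Because both patterns fill the same structure, the helper needs no knowledge
of which instruction it is serving.
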
Formats
=======

Syntax::

  fmt_def      := '@' identifier ( fmt_elt )+
  fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref
  fixedbit_elt := [01.-]+
  field_elt    := identifier ':' 's'? number
  field_ref    := '%' identifier | identifier '=' '%' identifier
  args_ref     := '&' identifier

Defining a format is a handy way to avoid replicating groups of fields
across many instruction patterns.

A *fixedbit_elt* describes a contiguous sequence of bits that must
be 1, 0, or don't care. The difference between '.' and '-'
is that '.' means that the bit will be covered with a field or a
final 0 or 1 from the pattern, and '-' means that the bit is really
ignored by the CPU and will not be specified.

A *field_elt* describes a simple field only given a width; the position of
the field is implied by its position with respect to other *fixedbit_elt*
and *field_elt*.

If any *fixedbit_elt* or *field_elt* appear, then all bits must be defined.
Padding with a *fixedbit_elt* of all '.' is an easy way to accomplish that.

A *field_ref* incorporates a field by reference. This is the only way to
add a complex field to a format. A field may be renamed in the process
via assignment to another identifier. This is intended to allow the
same argument set to be used with disjoint named fields.

A single *args_ref* may specify an argument set to use for the format.
The set of fields in the format must be a subset of the arguments in
the argument set. If an argument set is not specified, one will be
inferred from the set of fields.

It is recommended, but not required, that all *field_ref* and *args_ref*
appear at the end of the line, not interleaving with *fixedbit_elt* or
*field_elt*.

Format examples::

  @opr    ...... ra:5 rb:5 ... 0 ....... rc:5
  @opi    ...... ra:5 lit:8    1 ....... rc:5

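
How a *field_elt* gets its position can be sketched with a short Python
routine that walks a format line from the most-significant bit down. This is
an illustrative model, not the real generator: the function name ``layout``
is hypothetical, a 32-bit instruction width is assumed, and *field_ref* and
*args_ref* elements are not handled.

```python
import re


def layout(elements: str, width: int = 32):
    """Assign bit positions left to right, most-significant bit first.

    Returns (fields, fixedmask, fixedbits), where fields maps each name
    to a (lsb position, length, signed) tuple.
    """
    pos = width
    fixedmask = fixedbits = 0
    fields = {}
    for elt in elements.split():
        m = re.fullmatch(r"([A-Za-z_]\w*):(s?)(\d+)", elt)
        if m:  # field_elt: only a width is given, position is implied
            length = int(m.group(3))
            pos -= length
            fields[m.group(1)] = (pos, length, m.group(2) == "s")
        elif re.fullmatch(r"[01.-]+", elt):  # fixedbit_elt
            for ch in elt:
                pos -= 1
                if ch in "01":
                    fixedmask |= 1 << pos
                    fixedbits |= int(ch) << pos
        else:
            raise ValueError(f"unsupported element: {elt}")
    if pos != 0:
        raise ValueError("all bits must be defined")
    return fields, fixedmask, fixedbits


print(layout("...... ra:5 rb:5 ... 0 ....... rc:5"))   # @opr
print(layout("...... ra:5 lit:8    1 ....... rc:5"))   # @opi
```

In this model both example formats fix only bit 12, to 0 for ``@opr`` and to
1 for ``@opi``, while ``ra`` and ``rc`` occupy the same positions in each.
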
Patterns
========

Syntax::

  pat_def      := identifier ( pat_elt )+
  pat_elt      := fixedbit_elt | field_elt | field_ref | args_ref | fmt_ref | const_elt
  fmt_ref      := '@' identifier
  const_elt    := identifier '=' number

The *fixedbit_elt* and *field_elt* specifiers are unchanged from formats.
A pattern that does not specify a named format will have one inferred
from a referenced argument set (if present) and the set of fields.

A *const_elt* allows an argument to be set to a constant value. This may
come in handy when fields overlap between patterns and one has to
include the values in the *fixedbit_elt* instead.

The decoder will call a translator function for each pattern matched.

Pattern examples::

  addl_r   010000 ..... ..... .... 0000000 ..... @opr
  addl_i   010000 ..... ..... .... 0000000 ..... @opi

which will, in part, invoke::

  trans_addl_r(ctx, &arg_opr, insn)

and::

  trans_addl_i(ctx, &arg_opi, insn)

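
The *fixedmask* and *fixedbits* of the two example patterns can be worked
out by hand or with a few lines of Python. The sketch assumes that the fixed
bits written in the pattern are simply combined with those of the referenced
format and that disagreeing bits are an error; the helper names ``fixed``
and ``merge`` are hypothetical, and the format strings below have their
fields replaced by dots.

```python
def fixed(bits: str):
    """Return (fixedmask, fixedbits) for a run of fixedbit_elt tokens."""
    mask = value = 0
    for ch in bits.replace(" ", ""):
        mask = mask << 1 | (ch in "01")   # 1 where the bit is fixed
        value = value << 1 | (ch == "1")  # the required bit value
    return mask, value


def merge(pattern, fmt):
    """Combine pattern and format fixed bits (assumed behaviour)."""
    pmask, pbits = pattern
    fmask, fbits = fmt
    both = pmask & fmask
    if (pbits & both) != (fbits & both):
        raise ValueError("pattern and format disagree on a fixed bit")
    return pmask | fmask, pbits | fbits


PATTERN = fixed("010000 ..... ..... .... 0000000 .....")  # addl_r, addl_i
FMT_OPR = fixed("...... ..... ..... ... 0 ....... .....")
FMT_OPI = fixed("...... ..... ........ 1 ....... .....")

for name, fmt in (("addl_r", FMT_OPR), ("addl_i", FMT_OPI)):
    mask, bits = merge(PATTERN, fmt)
    print(f"{name}: fixedmask={mask:#010x} fixedbits={bits:#010x}")
```

Under these assumptions the two patterns share a mask and differ only in
bit 12, which is the bit that selects the register or immediate form.
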
Pattern Groups
==============

Syntax::

  group            := overlap_group | no_overlap_group
  overlap_group    := '{' ( pat_def | group )+ '}'
  no_overlap_group := '[' ( pat_def | group )+ ']'

A *group* begins with a lone open-brace or open-bracket, with all
subsequent lines indented two spaces, and ending with a lone
close-brace or close-bracket. Groups may be nested, increasing the
required indentation of the lines within the nested group to two
spaces per nesting level.

Patterns within overlap groups are allowed to overlap. Conflicts are
resolved by selecting the patterns in order. If all of the fixedbits
for a pattern match, its translate function will be called. If the
translate function returns false, then subsequent patterns within the
group will be matched.

Patterns within no-overlap groups are not allowed to overlap, just
the same as ungrouped patterns. Thus no-overlap groups are intended
to be nested inside overlap groups.

The following example from PA-RISC shows specialization of the *or*
instruction::

  {
    {
      nop   000010 ----- ----- 0000 001001 0 00000
      copy  000010 00000 r1:5  0000 001001 0 rt:5
    }
    or      000010 rt2:5 r1:5  cf:4 001001 0 rt:5
  }

When the *cf* field is zero, the instruction has no side effects,
and may be specialized. When the *rt* field is zero, the output
is discarded and so the instruction has no effect. When the *rt2*
field is zero, the operation is ``reg[r1] | 0`` and so encodes
the canonical register copy operation.

The output from the generator might look like::

  switch (insn & 0xfc000fe0) {
  case 0x08000240:
    /* 000010.. ........ ....0010 010..... */
    if ((insn & 0x0000f000) == 0x00000000) {
        /* 000010.. ........ 00000010 010..... */
        if ((insn & 0x0000001f) == 0x00000000) {
            /* 000010.. ........ 00000010 01000000 */
            extract_decode_Fmt_0(&u.f_decode0, insn);
            if (trans_nop(ctx, &u.f_decode0)) return true;
        }
        if ((insn & 0x03e00000) == 0x00000000) {
            /* 00001000 000..... 00000010 010..... */
            extract_decode_Fmt_1(&u.f_decode1, insn);
            if (trans_copy(ctx, &u.f_decode1)) return true;
        }
    }
    extract_decode_Fmt_2(&u.f_decode2, insn);
    if (trans_or(ctx, &u.f_decode2)) return true;
    return false;
  }

  198. }