.. raw:: html

  <style type="text/css">
    .none { background-color: #FFCCCC }
    .part { background-color: #FFFF99 }
    .good { background-color: #CCFF99 }
  </style>

.. role:: none
.. role:: part
.. role:: good

.. contents::
   :local:

==================
OpenMP Support
==================


Clang supports the following OpenMP 5.0 features (see also `OpenMP implementation details`_):

* The `reduction`-based clauses in the `task` and `target`-based directives.

* The relational operator `!=` (not-equal) as one of the canonical loop forms
  for random access iterators.

* Mapping of lambdas in target regions.

* Parsing and semantic analysis for the `requires` directive.

* Nested `declare target` directives.

* Implicit mapping of the `this` pointer as `map(this[:1])`.

* The `close` *map-type-modifier*.

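
As an illustration of the `!=` loop form listed above, the following minimal
sketch sums a `std::vector` with an iterator loop whose condition uses `!=`.
The function name `sum_with_not_equal` is hypothetical. The pragma only takes
effect when compiling with `-fopenmp` (older Clang releases may additionally
need `-fopenmp-version=50`); without it the loop simply runs sequentially and
produces the same result.

.. code-block:: c++

  #include <vector>

  // Hypothetical helper: sums the elements with a parallel iterator loop.
  // OpenMP 5.0 accepts `it != end` as the loop test; OpenMP 4.5 required a
  // relational operator such as `<`.
  long sum_with_not_equal(const std::vector<int> &values) {
    long total = 0;
  #pragma omp parallel for reduction(+ : total)
    for (std::vector<int>::const_iterator it = values.begin();
         it != values.end(); ++it) {
      total += *it;
    }
    return total;
  }
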

Clang fully supports OpenMP 4.5. Clang supports offloading to X86_64, AArch64,
PPC64[LE] and has `basic support for Cuda devices`_.

* #pragma omp declare simd: :part:`Partial`. Parsing, semantic analysis, and
  generation of special attributes for the X86 target are supported, but the
  LLVM pass for vectorization is still missing.

In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools
Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and macOS.

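
For reference, a minimal sketch of the `declare simd` directive mentioned above
could look as follows. The names `scale_and_shift` and `transform` are
hypothetical. Given the partial support described above, whether a vector
variant is actually called from the loop depends on the compiler version; the
scalar result is the same either way.

.. code-block:: c++

  #include <cstddef>

  // `declare simd` asks the compiler to also provide vector variants of this
  // function so that it can be called from SIMD loops.
  #pragma omp declare simd
  float scale_and_shift(float x) { return 2.0f * x + 1.0f; }

  // Hypothetical caller: applies the function elementwise in a SIMD loop.
  void transform(const float *in, float *out, std::size_t n) {
  #pragma omp simd
    for (std::size_t i = 0; i < n; ++i)
      out[i] = scale_and_shift(in[i]);
  }
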

General improvements
--------------------

- New collapse clause scheme to avoid expensive remainder operations.
  Loop index variables are computed after collapsing a loop nest via the
  collapse clause by replacing the expensive remainder operation with
  multiplications and additions.

- The default schedules for the `distribute` and `for` constructs in a
  parallel region and in SPMD mode have changed to ensure coalesced
  accesses. For the `distribute` construct, a static schedule is used
  with a chunk size equal to the number of threads per team (the default
  number of threads, or the value specified by the `thread_limit` clause if
  present). For the `for` construct, the schedule is static with a chunk
  size of one.

- Simplified SPMD code generation for `distribute parallel for` when
  the new default schedules are applicable.

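
The idea behind the first item can be illustrated with a minimal sketch,
assuming a two-level loop nest with a known, positive inner trip count. This is
not the code Clang generates; the names `Index2D`, `recover_with_remainder` and
`recover_without_remainder` are hypothetical. Once the quotient is known, the
inner index follows from a multiplication and a subtraction, so no remainder
operation is needed.

.. code-block:: c++

  #include <cstdint>

  // Pair of original loop indices recovered from a collapsed iteration number.
  struct Index2D {
    std::int64_t i;  // outer loop index
    std::int64_t j;  // inner loop index
  };

  // Straightforward recovery: one division and one remainder.
  Index2D recover_with_remainder(std::int64_t iv, std::int64_t inner_trip) {
    return {iv / inner_trip, iv % inner_trip};
  }

  // Remainder-free recovery: reuse the quotient, then multiply and subtract.
  Index2D recover_without_remainder(std::int64_t iv, std::int64_t inner_trip) {
    std::int64_t i = iv / inner_trip;
    std::int64_t j = iv - i * inner_trip;
    return {i, j};
  }

Both functions return the same indices for every non-negative collapsed
iteration number `iv`.
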

.. _basic support for Cuda devices:

Cuda devices support
====================

Directives execution modes
--------------------------

Clang code generation for target regions supports two modes: the SPMD and
non-SPMD modes. Clang chooses one of these two modes automatically based on the
way directives and clauses on those directives are used. The SPMD mode uses a
simplified set of runtime functions, which increases performance at the cost of
not supporting some OpenMP features. The non-SPMD mode is the most generic mode
and supports all currently available OpenMP features. The compiler will always
attempt to use the SPMD mode wherever possible. SPMD mode will not be used if:

- The target region contains an `if()` clause that refers to a `parallel`
  directive.

- The target region contains a `parallel` directive with a `num_threads()`
  clause.

- The target region contains user code (other than OpenMP-specific
  directives) in between the `target` and the `parallel` directives.

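
The conditions above can be illustrated with two hypothetical functions,
`add_spmd_candidate` and `add_generic_candidate`. This is only a sketch: which
mode Clang actually selects is decided by the compiler, and the code runs on
the host (or sequentially without `-fopenmp`) with the same results.

.. code-block:: c++

  // Combined construct with no `if()` or `num_threads()` clause and no user
  // code between `target` and `parallel`: a candidate for SPMD mode.
  void add_spmd_candidate(const float *a, const float *b, float *c, int n) {
  #pragma omp target teams distribute parallel for \
      map(to : a[0:n], b[0:n]) map(from : c[0:n])
    for (int i = 0; i < n; ++i)
      c[i] = a[i] + b[i];
  }

  // User code sits between the `target` and `parallel` directives, which is
  // one of the listed conditions under which SPMD mode is not used.
  void add_generic_candidate(const float *a, const float *b, float *c, int n) {
  #pragma omp target map(to : a[0:n], b[0:n]) map(from : c[0:n])
    {
      const float bias = 1.0f;  // sequential user code in the target region
  #pragma omp parallel for
      for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i] + bias;
    }
  }
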

Data-sharing modes
------------------

Clang supports two data-sharing models for Cuda devices: `Generic` and `Cuda`
modes. The default mode is `Generic`. `Cuda` mode can give additional
performance and can be activated using the `-fopenmp-cuda-mode` flag. In
`Generic` mode all local variables that can be shared in the parallel regions
are stored in the global memory. In `Cuda` mode local variables are not shared
between the threads and it is the user's responsibility to share the required
data between the threads in the parallel regions.

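
A minimal sketch of the kind of variable these modes are concerned with is
shown below. The function names are hypothetical. In `scale_shared_local` the
local variable `scale` is declared in the target region and read by all threads
of the parallel region, so it is a variable that can be shared. The second
function passes each thread its own copy with `firstprivate`; treating this as
a way to avoid relying on shared locals is an assumption that should be
verified for a given program and compiler version.

.. code-block:: c++

  // `scale` is local to the target region and implicitly shared by the
  // threads of the parallel region.
  void scale_shared_local(float *data, int n, float factor) {
  #pragma omp target map(tofrom : data[0:n])
    {
      float scale = factor * 2.0f;  // declared outside the parallel region
  #pragma omp parallel for
      for (int i = 0; i < n; ++i)
        data[i] *= scale;
    }
  }

  // Variant: every thread gets a private copy of `scale`, initialized from
  // the original value, instead of reading a shared local variable.
  void scale_private_copy(float *data, int n, float factor) {
  #pragma omp target map(tofrom : data[0:n])
    {
      float scale = factor * 2.0f;
  #pragma omp parallel for firstprivate(scale)
      for (int i = 0; i < n; ++i)
        data[i] *= scale;
    }
  }
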

Collapsed loop nest counter
---------------------------

When using the collapse clause on a loop nest the default behavior is to
automatically extend the representation of the loop counter to 64 bits for
the cases where the sizes of the collapsed loops are not known at compile
time. To prevent this conservative choice and use at most 32 bits,
compile your program with the `-fopenmp-optimistic-collapse` option.

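
The reason for the conservative 64-bit choice can be seen from a small sketch,
assuming two collapsed loops with non-negative 32-bit trip counts. The helper
names `collapsed_trip_count` and `fits_in_32_bits` are hypothetical and only
illustrate the arithmetic; they are not part of Clang or the OpenMP runtime.

.. code-block:: c++

  #include <cstdint>
  #include <limits>

  // Trip count of the collapsed loop: the product of the individual trip
  // counts, computed in 64 bits so that it cannot overflow.
  std::int64_t collapsed_trip_count(std::int32_t outer, std::int32_t inner) {
    return static_cast<std::int64_t>(outer) * static_cast<std::int64_t>(inner);
  }

  // True if a 32-bit signed counter can represent the collapsed trip count.
  bool fits_in_32_bits(std::int32_t outer, std::int32_t inner) {
    return collapsed_trip_count(outer, inner) <=
           std::numeric_limits<std::int32_t>::max();
  }

Two loops of 100000 iterations each give a collapsed trip count of
10000000000, which does not fit in 32 bits, so the optimistic option is only
safe when the product of the loop sizes is known to stay small.
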

Features not supported or with limited support for Cuda devices
---------------------------------------------------------------

- Cancellation constructs are not supported.

- Doacross loop nest is not supported.

- User-defined reductions are supported only for trivial types.

- Nested parallelism: inner parallel regions are executed sequentially.

- Static linking of libraries containing device code is not supported yet.

- Automatic translation of math functions in target regions to device-specific
  math functions is not implemented yet.

- Debug information for OpenMP target regions is supported, but sometimes it may
  be required to manually specify the address class of the inspected variables.
  In some cases the local variables are actually allocated in the global memory,
  but the debug info may not be aware of it.

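
As an example of a user-defined reduction over a trivial type, the following
sketch accumulates a sum and a sum of squares in one pass. The type `Moments`,
the reduction identifier `add_moments` and the function `accumulate` are
hypothetical, and the sketch has only been reasoned about for host execution;
behavior on a particular Cuda device is not verified here.

.. code-block:: c++

  // Trivial type: a plain aggregate of two integers.
  struct Moments {
    long sum;
    long sum_sq;
  };

  // User-defined reduction that adds two partial results member by member.
  #pragma omp declare reduction(add_moments : Moments :                  \
          omp_out = Moments{omp_out.sum + omp_in.sum,                    \
                            omp_out.sum_sq + omp_in.sum_sq})             \
      initializer(omp_priv = Moments{0, 0})

  // Hypothetical helper: reduces `values[0..n)` into a single `Moments`.
  Moments accumulate(const int *values, int n) {
    Moments m{0, 0};
  #pragma omp target teams distribute parallel for \
      map(to : values[0:n]) map(tofrom : m) reduction(add_moments : m)
    for (int i = 0; i < n; ++i) {
      m.sum += values[i];
      m.sum_sq += static_cast<long>(values[i]) * values[i];
    }
    return m;
  }
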

.. _OpenMP implementation details:

OpenMP 5.0 Implementation Details
---------------------------------

The following table provides a quick overview of various OpenMP 5.0 features
and their implementation status. Please contact *openmp-dev* at
*lists.llvm.org* for more information or if you want to help with the
implementation.


+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| Category                     | Feature                                                      | Status                   | Reviews                                                               |
+==============================+==============================================================+==========================+=======================================================================+
| loop extension               | support != in the canonical loop form                        | :good:`done`             | D54441                                                                |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| loop extension               | #pragma omp loop (directive)                                 | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| loop extension               | collapse imperfectly nested loop                             | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| loop extension               | collapse non-rectangular nested loop                         | :good:`done`             |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| loop extension               | C++ range-based for loop                                     | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| loop extension               | clause: nosimd for SIMD directives                           | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| loop extension               | inclusive scan extension (matching C++17 PSTL)               | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| memory management            | memory allocators                                            | :good:`done`             | r341687,r357929                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| memory management            | allocate directive and allocate clause                       | :good:`done`             | r355614,r335952                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| OMPD                         | OMPD interfaces                                              | :part:`not upstream`     | https://github.com/OpenMPToolsInterface/LLVM-openmp/tree/ompd-tests   |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| OMPT                         | OMPT interfaces                                              | :part:`mostly done`      |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| thread affinity extension    | thread affinity extension                                    | :good:`done`             |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task extension               | taskloop reduction                                           | :good:`done`             |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task extension               | task affinity                                                | :part:`not upstream`     |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task extension               | clause: depend on the taskwait construct                     | :part:`worked on`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task extension               | depend objects and detachable tasks                          | :part:`worked on`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task extension               | mutexinoutset dependence-type for tasks                      | :good:`done`             | D53380,D57576                                                         |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task extension               | combined taskloop constructs                                 | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task extension               | master taskloop                                              | :good:`done`             |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task extension               | parallel master taskloop                                     | :good:`done`             |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task extension               | master taskloop simd                                         | :good:`done`             |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| task extension               | parallel master taskloop simd                                | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| SIMD extension               | atomic and critical constructs inside SIMD code              | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| SIMD extension               | SIMD nontemporal                                             | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | infer target functions from initializers                     | :part:`worked on`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | infer target variables from initializers                     | :part:`worked on`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | OMP_TARGET_OFFLOAD environment variable                      | :good:`done`             | D50522                                                                |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | support full 'defaultmap' functionality                      | :part:`worked on`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | device specific functions                                    | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | clause: device_type                                          | :good:`done`             |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | clause: in_reduction                                         | :none:`unclaimed`        | r308768                                                               |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | omp_get_device_num()                                         | :part:`worked on`        | D54342                                                                |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | structure mapping of references                              | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | nested declare target                                        | :good:`done`             | D51378                                                                |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | implicitly map 'this' (this[:1])                             | :good:`done`             | D55982                                                                |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | allow access to the reference count (omp_target_is_present)  | :part:`worked on`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | requires directive (unified shared memory)                   | :part:`worked on`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | clause: unified_address, unified_shared_memory               | :part:`worked on`        | D52625,D52359                                                         |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | clause: reverse_offload                                      | :none:`unclaimed parts`  | D52780                                                                |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | clause: atomic_default_mem_order                             | :none:`unclaimed parts`  | D53513                                                                |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | clause: dynamic_allocators                                   | :none:`unclaimed parts`  | D53079                                                                |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | user-defined mappers                                         | :part:`worked on`        | D56326,D58638,D58523,D58074,D60972,D59474                             |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | mapping lambda expression                                    | :good:`done`             | D51107                                                                |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | clause: use_device_addr for target data                      | :good:`done`             |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | map(replicate) or map(local) when requires unified_shared_me | :part:`worked on`        | D55719,D55892                                                         |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| device extension             | teams construct on the host device                           | :part:`worked on`        | Clang part is done, r371553.                                          |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| atomic extension             | hints for the atomic construct                               | :part:`worked on`        | D51233                                                                |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| base language                | C11 support                                                  | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| base language                | C++11/14/17 support                                          | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| base language                | lambda support                                               | :good:`done`             |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc extension               | array shaping                                                | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc extension               | library shutdown (omp_pause_resource[_all])                  | :none:`unclaimed parts`  | D55078                                                                |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc extension               | metadirectives                                               | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc extension               | conditional modifier for lastprivate clause                  | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc extension               | user-defined function variants                               | :part:`worked on`        | D67294, D64095                                                        |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc extension               | pointer/reference to pointer based array reductions          | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
| misc extension               | prevent new type definitions in clauses                      | :none:`unclaimed`        |                                                                       |
+------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+

  214. +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
  215. | misc extensions | prevent new type definitions in clauses | :none:`unclaimed` | |
  216. +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+