<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
<html>
<head>
  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
  <title>&lt;atomic&gt; design</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
</head>

<body>
<div id="menu">
  <div>
    <a href="https://llvm.org/">LLVM Home</a>
  </div>

  <div class="submenu">
    <label>libc++ Info</label>
    <a href="/index.html">About</a>
  </div>

  <div class="submenu">
    <label>Quick Links</label>
    <a href="https://lists.llvm.org/mailman/listinfo/cfe-dev">cfe-dev</a>
    <a href="https://lists.llvm.org/mailman/listinfo/cfe-commits">cfe-commits</a>
    <a href="https://bugs.llvm.org/">Bug Reports</a>
    <a href="https://github.com/llvm/llvm-project/tree/master/libcxx/">Browse Sources</a>
  </div>
</div>

<div id="content">
<!--*********************************************************************-->
<h1>&lt;atomic&gt; design</h1>
<!--*********************************************************************-->
<p>
The <tt>&lt;atomic&gt;</tt> header is one of the most closely coupled headers to
the compiler. Ideally, when you invoke any function from
<tt>&lt;atomic&gt;</tt>, it should result in highly optimized assembly being
inserted directly into your application ... assembly that is not otherwise
representable by higher level C or C++ expressions. The design of the libc++
<tt>&lt;atomic&gt;</tt> header started with this goal in mind. A secondary, but
still very important, goal is that the compiler should have to do minimal work
to facilitate the implementation of <tt>&lt;atomic&gt;</tt>. Without this second
goal, practically speaking, the libc++ <tt>&lt;atomic&gt;</tt> header would be
doomed to be a barely supported, second-class citizen on almost every
platform.
</p>
<p>Goals:</p>

<blockquote><ul>
<li>Optimal code generation for atomic operations</li>
<li>Minimal effort for the compiler to achieve goal 1 on any given platform</li>
<li>Conformance to the C++0x draft standard</li>
</ul></blockquote>

<p>
The purpose of this document is to inform compiler writers what they need to do
to enable a high-performance libc++ <tt>&lt;atomic&gt;</tt> with minimal effort.
</p>
<h2>The minimal work that must be done for a conforming <tt>&lt;atomic&gt;</tt></h2>

<p>
The only "atomic" operations that must actually be lock-free in
<tt>&lt;atomic&gt;</tt> are represented by the following compiler intrinsics:
</p>
<blockquote><pre>
__atomic_flag__
__atomic_exchange_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    __atomic_flag__ result = *obj;
    *obj = desr;
    return result;
}

void
__atomic_store_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    *obj = desr;
}
</pre></blockquote>
<p>
Where:
</p>

<blockquote><ul>
<li>
If <tt>__has_feature(__atomic_flag)</tt> evaluates to 1 in the preprocessor then
the compiler must define <tt>__atomic_flag__</tt> (e.g. as a typedef to
<tt>int</tt>).
</li>

<li>
If <tt>__has_feature(__atomic_flag)</tt> evaluates to 0 in the preprocessor then
the library defines <tt>__atomic_flag__</tt> as a typedef to <tt>bool</tt>
(see the sketch after this list).
</li>

<li>
<p>
To communicate that the above intrinsics are available, the compiler must
arrange for <tt>__has_feature</tt> to return 1 when fed the intrinsic name
appended with an '_' and the mangled type name of <tt>__atomic_flag__</tt>.
</p>

<p>
For example if <tt>__atomic_flag__</tt> is <tt>unsigned int</tt>:
</p>

<blockquote><pre>
__has_feature(__atomic_flag) == 1
__has_feature(__atomic_exchange_seq_cst_j) == 1
__has_feature(__atomic_store_seq_cst_j) == 1

typedef unsigned int __atomic_flag__;

unsigned int __atomic_exchange_seq_cst(unsigned int volatile*, unsigned int)
{
    // ...
}

void __atomic_store_seq_cst(unsigned int volatile*, unsigned int)
{
    // ...
}
</pre></blockquote>
</li>
</ul></blockquote>
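<p>
As an illustration (not part of the required interface), the library side of
this handshake can be pictured as a conditional typedef keyed off the feature
test above; the exact spelling inside libc++ may differ:
</p>

<blockquote><pre>
#if __has_feature(__atomic_flag)
// The compiler supplies __atomic_flag__ (e.g. as a typedef to unsigned int)
// together with lock-free __atomic_exchange_seq_cst / __atomic_store_seq_cst.
#else
// No compiler support: fall back to a plain bool guarded by a library mutex.
typedef bool __atomic_flag__;
#endif
</pre></blockquote>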
<p>
That's it! Once a compiler does the above, you've got a fully conforming
(though sub-par-performance) <tt>&lt;atomic&gt;</tt> header!
</p>

<h2>Recommended work for a higher performance <tt>&lt;atomic&gt;</tt></h2>
<p>
It would be good if the above intrinsics worked with all integral types plus
<tt>void*</tt>. Because this may not be possible to do in a lock-free manner for
all integral types on all platforms, a compiler must communicate each type that
an intrinsic works with. For example, if <tt>__atomic_exchange_seq_cst</tt> works
for all types except for <tt>long long</tt> and <tt>unsigned long long</tt>,
then:
</p>
<blockquote><pre>
__has_feature(__atomic_exchange_seq_cst_b) == 1   // bool
__has_feature(__atomic_exchange_seq_cst_c) == 1   // char
__has_feature(__atomic_exchange_seq_cst_a) == 1   // signed char
__has_feature(__atomic_exchange_seq_cst_h) == 1   // unsigned char
__has_feature(__atomic_exchange_seq_cst_Ds) == 1  // char16_t
__has_feature(__atomic_exchange_seq_cst_Di) == 1  // char32_t
__has_feature(__atomic_exchange_seq_cst_w) == 1   // wchar_t
__has_feature(__atomic_exchange_seq_cst_s) == 1   // short
__has_feature(__atomic_exchange_seq_cst_t) == 1   // unsigned short
__has_feature(__atomic_exchange_seq_cst_i) == 1   // int
__has_feature(__atomic_exchange_seq_cst_j) == 1   // unsigned int
__has_feature(__atomic_exchange_seq_cst_l) == 1   // long
__has_feature(__atomic_exchange_seq_cst_m) == 1   // unsigned long
__has_feature(__atomic_exchange_seq_cst_Pv) == 1  // void*
</pre></blockquote>
<p>
Note that only the <tt>__has_feature</tt> flag is decorated with the argument
type. The name of the compiler intrinsic is not decorated, but instead works
like a C++ overloaded function.
</p>
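<p>
For instance, a compiler supporting the exchange intrinsic for both
<tt>int</tt> and <tt>unsigned int</tt> would declare it along these lines (an
illustrative sketch, not a required spelling):
</p>

<blockquote><pre>
// The intrinsic name itself is overloaded; only the feature-test name carries
// the Itanium-mangled type suffix (_i for int, _j for unsigned int, ...).
int          __atomic_exchange_seq_cst(int volatile* obj, int desr);
unsigned int __atomic_exchange_seq_cst(unsigned int volatile* obj, unsigned int desr);
</pre></blockquote>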
<p>
Additionally, there are other intrinsics besides
<tt>__atomic_exchange_seq_cst</tt> and <tt>__atomic_store_seq_cst</tt>. They
are optional, but if the compiler can generate faster code than the library
provides, then clients will benefit from the compiler writer's expertise and
knowledge of the targeted platform.
</p>
<p>
Below is the complete list of <i>sequentially consistent</i> intrinsics, and
their library implementations. Template syntax is used to indicate the desired
overloading for integral and <tt>void*</tt> types. The template does not
represent a requirement that the intrinsic operate on <em>any</em> type!
</p>
<blockquote><pre>
T is one of:  bool, char, signed char, unsigned char, short, unsigned short,
              int, unsigned int, long, unsigned long,
              long long, unsigned long long, char16_t, char32_t, wchar_t, void*

template &lt;class T&gt;
T
__atomic_load_seq_cst(T const volatile* obj)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    return *obj;
}

template &lt;class T&gt;
void
__atomic_store_seq_cst(T volatile* obj, T desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    *obj = desr;
}

template &lt;class T&gt;
T
__atomic_exchange_seq_cst(T volatile* obj, T desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj = desr;
    return r;
}

template &lt;class T&gt;
bool
__atomic_compare_exchange_strong_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    if (std::memcmp(const_cast&lt;T*&gt;(obj), exp, sizeof(T)) == 0)
    {
        std::memcpy(const_cast&lt;T*&gt;(obj), &amp;desr, sizeof(T));
        return true;
    }
    std::memcpy(exp, const_cast&lt;T*&gt;(obj), sizeof(T));
    return false;
}

template &lt;class T&gt;
bool
__atomic_compare_exchange_weak_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    if (std::memcmp(const_cast&lt;T*&gt;(obj), exp, sizeof(T)) == 0)
    {
        std::memcpy(const_cast&lt;T*&gt;(obj), &amp;desr, sizeof(T));
        return true;
    }
    std::memcpy(exp, const_cast&lt;T*&gt;(obj), sizeof(T));
    return false;
}

T is one of:  char, signed char, unsigned char, short, unsigned short,
              int, unsigned int, long, unsigned long,
              long long, unsigned long long, char16_t, char32_t, wchar_t

template &lt;class T&gt;
T
__atomic_fetch_add_seq_cst(T volatile* obj, T operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj += operand;
    return r;
}

template &lt;class T&gt;
T
__atomic_fetch_sub_seq_cst(T volatile* obj, T operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj -= operand;
    return r;
}

template &lt;class T&gt;
T
__atomic_fetch_and_seq_cst(T volatile* obj, T operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj &amp;= operand;
    return r;
}

template &lt;class T&gt;
T
__atomic_fetch_or_seq_cst(T volatile* obj, T operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj |= operand;
    return r;
}

template &lt;class T&gt;
T
__atomic_fetch_xor_seq_cst(T volatile* obj, T operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj ^= operand;
    return r;
}

void*
__atomic_fetch_add_seq_cst(void* volatile* obj, ptrdiff_t operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    void* r = *obj;
    (char*&amp;)(*obj) += operand;
    return r;
}

void*
__atomic_fetch_sub_seq_cst(void* volatile* obj, ptrdiff_t operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    void* r = *obj;
    (char*&amp;)(*obj) -= operand;
    return r;
}

void __atomic_thread_fence_seq_cst()
{
    unique_lock&lt;mutex&gt; _(some_mutex);
}

void __atomic_signal_fence_seq_cst()
{
    unique_lock&lt;mutex&gt; _(some_mutex);
}
</pre></blockquote>
<p>
One should consult the (currently draft)
<a href="https://wg21.link/n3126">C++ standard</a>
for the detailed definitions of these operations. For example,
<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> is allowed to fail
spuriously while <tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> is
not.
</p>
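<p>
The practical consequence of this difference is that the weak flavor is
normally called in a retry loop, where a spurious failure merely costs one more
iteration. A hypothetical caller, using the names from this document, might
look like:
</p>

<blockquote><pre>
// Atomically increment *counter with a compare-exchange retry loop.  A
// spurious failure just causes another trip around the loop, so the
// (potentially cheaper) weak intrinsic is sufficient here.
void increment(int volatile* counter)
{
    int expected = *counter;
    while (!__atomic_compare_exchange_weak_seq_cst_seq_cst(counter,
                                                           &amp;expected,
                                                           expected + 1))
    {
        // on failure, expected has been reloaded with the current value
    }
}
</pre></blockquote>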
<p>
If on your platform the lock-free definition of
<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> would be the same as
<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt>, you may omit the
<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> intrinsic without a
performance cost. The library will prefer your implementation of
<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> over its own
definition for implementing
<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt>. That is, the library
will arrange for <tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> to call
<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> if you supply an
intrinsic for the strong version but not the weak.
</p>
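<p>
A minimal sketch of that library-side arrangement, assuming only the strong
intrinsic is announced via <tt>__has_feature</tt>:
</p>

<blockquote><pre>
template &lt;class T&gt;
bool
__atomic_compare_exchange_weak_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
{
    // No weak intrinsic supplied: the strong flavor meets every requirement
    // the weak flavor has, so simply forward to it.
    return __atomic_compare_exchange_strong_seq_cst_seq_cst(obj, exp, desr);
}
</pre></blockquote>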
<h2>Taking advantage of weaker memory synchronization</h2>

<p>
So far all of the intrinsics presented require a <em>sequentially
consistent</em> memory ordering. That is, no loads or stores can move across
the operation (just as if the library had locked that internal mutex). But
<tt>&lt;atomic&gt;</tt> supports weaker memory ordering operations. In all,
there are six memory orderings (listed here from strongest to weakest):
</p>

<blockquote><pre>
memory_order_seq_cst
memory_order_acq_rel
memory_order_release
memory_order_acquire
memory_order_consume
memory_order_relaxed
</pre></blockquote>

<p>
(See the
<a href="https://wg21.link/n3126">C++ standard</a>
for the detailed definitions of each of these orderings).
</p>
<p>
On some platforms, the compiler vendor can offer some or even all of the above
intrinsics at one or more weaker levels of memory synchronization. This might,
for example, allow the compiler to avoid issuing an <tt>mfence</tt> instruction
on x86.
</p>
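<p>
As a concrete, purely illustrative example of the kind of code generation
involved on x86:
</p>

<blockquote><pre>
// Hypothetical x86 lowering of three intrinsics (illustrative only):
int  __atomic_load_relaxed(int const volatile* obj);   // mov eax, [obj]         -- no fence
void __atomic_store_release(int volatile* obj, int v); // mov [obj], v           -- no fence
void __atomic_store_seq_cst(int volatile* obj, int v); // mov [obj], v ; mfence  -- full fence
</pre></blockquote>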
<p>
If the compiler does not offer any given operation, at any given memory ordering
level, the library will automatically attempt to call the next highest memory
ordering operation. This continues up to <tt>seq_cst</tt>, and if that doesn't
exist, then the library takes over and does the job with a <tt>mutex</tt>. This
is a compile-time search &amp; selection operation. At run time, the
application will only see the few inlined assembly instructions for the selected
intrinsic.
</p>
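<p>
A sketch of that selection for a relaxed load of <tt>unsigned int</tt>
(illustrative only; the real search also considers the intermediate
<tt>consume</tt> and <tt>acquire</tt> levels):
</p>

<blockquote><pre>
unsigned int __load_relaxed(unsigned int const volatile* obj)  // hypothetical library helper
{
#if __has_feature(__atomic_load_relaxed_j)
    return __atomic_load_relaxed(obj);      // weakest sufficient intrinsic
#elif __has_feature(__atomic_load_seq_cst_j)
    return __atomic_load_seq_cst(obj);      // stronger than requested, still correct
#else
    unique_lock&lt;mutex&gt; _(some_mutex);  // no intrinsic at all: mutex fallback
    return *obj;
#endif
}
</pre></blockquote>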
<p>
Each intrinsic is appended with the 7-letter name of the memory ordering it
addresses. For example a <tt>load</tt> with <tt>relaxed</tt> ordering is
defined by:
</p>

<blockquote><pre>
T __atomic_load_relaxed(const volatile T* obj);
</pre></blockquote>
<p>
And announced with:
</p>

<blockquote><pre>
__has_feature(__atomic_load_relaxed_b) == 1  // bool
__has_feature(__atomic_load_relaxed_c) == 1  // char
__has_feature(__atomic_load_relaxed_a) == 1  // signed char
...
</pre></blockquote>
<p>
The <tt>__atomic_compare_exchange_strong(weak)</tt> intrinsics are parameterized
on two memory orderings. The first ordering applies when the operation returns
<tt>true</tt> (the compare succeeded) and the second ordering applies when the
operation returns <tt>false</tt> (the compare failed).
</p>
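<p>
For example, a strong compare-exchange that is <tt>acq_rel</tt> on success and
<tt>acquire</tt> on failure would, following the naming scheme above, be
spelled and announced roughly as follows (illustrative):
</p>

<blockquote><pre>
__has_feature(__atomic_compare_exchange_strong_acq_rel_acquire_i) == 1  // int

template &lt;class T&gt;
bool
__atomic_compare_exchange_strong_acq_rel_acquire(T volatile* obj, T* exp, T desr);
</pre></blockquote>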
<p>
Not every memory ordering is appropriate for every operation. <tt>exchange</tt>
and the <tt>fetch_<i>op</i></tt> operations support all six. But <tt>load</tt>
only supports <tt>relaxed</tt>, <tt>consume</tt>, <tt>acquire</tt> and
<tt>seq_cst</tt>; <tt>store</tt>
only supports <tt>relaxed</tt>, <tt>release</tt>, and <tt>seq_cst</tt>. The
<tt>compare_exchange</tt> operations support the following 16 combinations out
of the possible 36:
</p>
<blockquote><pre>
relaxed_relaxed
consume_relaxed
consume_consume
acquire_relaxed
acquire_consume
acquire_acquire
release_relaxed
release_consume
release_acquire
acq_rel_relaxed
acq_rel_consume
acq_rel_acquire
seq_cst_relaxed
seq_cst_consume
seq_cst_acquire
seq_cst_seq_cst
</pre></blockquote>
<p>
Again, the compiler supplies intrinsics only for the strongest orderings where
it can make a difference. The library takes care of calling the weakest
supplied intrinsic that is as strong or stronger than the customer asked for.
</p>

</div>
</body>
</html>