ProgrammersManual.rst 162 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743744745746747748749750751752753754755756757758759760761762763764765766767768769770771772773774775776777778779780781782783784785786787788789790791792793794795796797798799800801802803804805806807808809810811812813814815816817818819820821822823824825826827828829830831832833834835836837838839840841842843844845846847848849850851852853854855856857858859860861862863864865866867868869870871872873874875876877878879880881882883884885886887888889890891892893894895896897898899900901902903904905906907908909910911912913914915916917918919920921922923924925926927928929930931932933934935936937938939940941942943944945946947948949950951952953954955956957958959960961962963964965966967968969970971972973974975976977978979980981982983984985986987988989990991992993994995996997998999100010011002100310041005100610071008100910101011101210131014101510161017101810191020102110221023102410251026102710281029103010311032103310341035103610371038103910401041104210431044104510461047104810491050105110521053105410551056105710581059106010611062106310641065106610671068106910701071107210731074107510761077107810791080108110821083108410851086108710881089109010911092109310941095109610971098109911001101110211031104110511061107110811091110111111121113111411151116111711181119112011211122112311241125112611271128112911301131113211331134113511361137113811391140114111421143114411451146114711481149115011511152115311541155115611571158115911601161116211631164116511661167116811691170117111721173117411751176117711781179118011811182118311841185118611871188118911901191119211931194119511961197119811991200120112021203120412051206120712081209121012111212121312141215121612171218121912201221122212231224122512261227122812291230123112321233123412351236123712381239124012411242124312441245124612471248124912501251125212531254125512561257125812591260126112621263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422142314241425142614271428142914301431143214331434143514361437143814391440144114421443144414451446144714481449145014511452145314541455145614571458145914601461146214631464146514661467146814691470147114721473147414751476147714781479148014811482148314841485148614871488148914901491149214931494149514961497149814991500150115021503150415051506150715081509151015111512151315141515151615171518151915201521152215231524152515261527152815291530153115321533153415351536153715381539154015411542154315441545154615471548154915501551155215531554155515561557155815591560156115621563156415651566156715681569157015711572157315741575157615771578157915801581158215831584158515861587158815891590159115921593159415951596159715981599160016011602160316041605160616071608160916101611161216131614161516161617161816191620162116221623162416251626162716281629163016311632163316341635163616371638163916401641164216431644164516461647164816491650165116521653165416551656165716581659166016611662166316641665166616671668166916701671167216731674167516761677167816791680168116821683168416851686168716881689169016911692169316941695169616971698169917001701170217031704170517061707170817091710171117121713171417151716171717181719172017211722172317241725172617271728172917301731173217331734173517361737173817391740174117421743174417451746174717481749175017511752175317541755175617571758175917601761176217631764176517661767176817691770177117721773177417751776177717781779178017811782178317841785178617871788178917901791179217931794179517961797179817991800180118021803180418051806180718081809181018111812181318141815181618171818181918201821182218231824182518261827182818291830183118321833183418351836183718381839184018411842184318441845184618471848184918501851185218531854185518561857185818591860186118621863186418651866186718681869187018711872187318741875187618771878187918801881188218831884188518861887188818891890189118921893189418951896189718981899190019011902190319041905190619071908190919101911191219131914191519161917191819191920192119221923192419251926192719281929193019311932193319341935193619371938193919401941194219431944194519461947194819491950195119521953195419551956195719581959196019611962196319641965196619671968196919701971197219731974197519761977197819791980198119821983198419851986198719881989199019911992199319941995199619971998199920002001200220032004200520062007200820092010201120122013201420152016201720182019202020212022202320242025202620272028202920302031203220332034203520362037203820392040204120422043204420452046204720482049205020512052205320542055205620572058205920602061206220632064206520662067206820692070207120722073207420752076207720782079208020812082208320842085208620872088208920902091209220932094209520962097209820992100210121022103210421052106210721082109211021112112211321142115211621172118211921202121212221232124212521262127212821292130213121322133213421352136213721382139214021412142214321442145214621472148214921502151215221532154215521562157215821592160216121622163216421652166216721682169217021712172217321742175217621772178217921802181218221832184218521862187218821892190219121922193219421952196219721982199220022012202220322042205220622072208220922102211221222132214221522162217221822192220222122222223222422252226222722282229223022312232223322342235223622372238223922402241224222432244224522462247224822492250225122522253225422552256225722582259226022612262226322642265226622672268226922702271227222732274227522762277227822792280228122822283228422852286228722882289229022912292229322942295229622972298229923002301230223032304230523062307230823092310231123122313231423152316231723182319232023212322232323242325232623272328232923302331233223332334233523362337233823392340234123422343234423452346234723482349235023512352235323542355235623572358235923602361236223632364236523662367236823692370237123722373237423752376237723782379238023812382238323842385238623872388238923902391239223932394239523962397239823992400240124022403240424052406240724082409241024112412241324142415241624172418241924202421242224232424242524262427242824292430243124322433243424352436243724382439244024412442244324442445244624472448244924502451245224532454245524562457245824592460246124622463246424652466246724682469247024712472247324742475247624772478247924802481248224832484248524862487248824892490249124922493249424952496249724982499250025012502250325042505250625072508250925102511251225132514251525162517251825192520252125222523252425252526252725282529253025312532253325342535253625372538253925402541254225432544254525462547254825492550255125522553255425552556255725582559256025612562256325642565256625672568256925702571257225732574257525762577257825792580258125822583258425852586258725882589259025912592259325942595259625972598259926002601260226032604260526062607260826092610261126122613261426152616261726182619262026212622262326242625262626272628262926302631263226332634263526362637263826392640264126422643264426452646264726482649265026512652265326542655265626572658265926602661266226632664266526662667266826692670267126722673267426752676267726782679268026812682268326842685268626872688268926902691269226932694269526962697269826992700270127022703270427052706270727082709271027112712271327142715271627172718271927202721272227232724272527262727272827292730273127322733273427352736273727382739274027412742274327442745274627472748274927502751275227532754275527562757275827592760276127622763276427652766276727682769277027712772277327742775277627772778277927802781278227832784278527862787278827892790279127922793279427952796279727982799280028012802280328042805280628072808280928102811281228132814281528162817281828192820282128222823282428252826282728282829283028312832283328342835283628372838283928402841284228432844284528462847284828492850285128522853285428552856285728582859286028612862286328642865286628672868286928702871287228732874287528762877287828792880288128822883288428852886288728882889289028912892289328942895289628972898289929002901290229032904290529062907290829092910291129122913291429152916291729182919292029212922292329242925292629272928292929302931293229332934293529362937293829392940294129422943294429452946294729482949295029512952295329542955295629572958295929602961296229632964296529662967296829692970297129722973297429752976297729782979298029812982298329842985298629872988298929902991299229932994299529962997299829993000300130023003300430053006300730083009301030113012301330143015301630173018301930203021302230233024302530263027302830293030303130323033303430353036303730383039304030413042304330443045304630473048304930503051305230533054305530563057305830593060306130623063306430653066306730683069307030713072307330743075307630773078307930803081308230833084308530863087308830893090309130923093309430953096309730983099310031013102310331043105310631073108310931103111311231133114311531163117311831193120312131223123312431253126312731283129313031313132313331343135313631373138313931403141314231433144314531463147314831493150315131523153315431553156315731583159316031613162316331643165316631673168316931703171317231733174317531763177317831793180318131823183318431853186318731883189319031913192319331943195319631973198319932003201320232033204320532063207320832093210321132123213321432153216321732183219322032213222322332243225322632273228322932303231323232333234323532363237323832393240324132423243324432453246324732483249325032513252325332543255325632573258325932603261326232633264326532663267326832693270327132723273327432753276327732783279328032813282328332843285328632873288328932903291329232933294329532963297329832993300330133023303330433053306330733083309331033113312331333143315331633173318331933203321332233233324332533263327332833293330333133323333333433353336333733383339334033413342334333443345334633473348334933503351335233533354335533563357335833593360336133623363336433653366336733683369337033713372337333743375337633773378337933803381338233833384338533863387338833893390339133923393339433953396339733983399340034013402340334043405340634073408340934103411341234133414341534163417341834193420342134223423342434253426342734283429343034313432343334343435343634373438343934403441344234433444344534463447344834493450345134523453345434553456345734583459346034613462346334643465346634673468346934703471347234733474347534763477347834793480348134823483348434853486348734883489349034913492349334943495349634973498349935003501350235033504350535063507350835093510351135123513351435153516351735183519352035213522352335243525352635273528352935303531353235333534353535363537353835393540354135423543354435453546354735483549355035513552355335543555355635573558355935603561356235633564356535663567356835693570357135723573357435753576357735783579358035813582358335843585358635873588358935903591359235933594359535963597359835993600360136023603360436053606360736083609361036113612361336143615361636173618361936203621362236233624362536263627362836293630363136323633363436353636363736383639364036413642364336443645364636473648364936503651365236533654365536563657365836593660366136623663366436653666366736683669367036713672367336743675367636773678367936803681368236833684368536863687368836893690369136923693369436953696369736983699370037013702370337043705370637073708370937103711371237133714371537163717371837193720372137223723372437253726372737283729373037313732373337343735373637373738373937403741374237433744374537463747374837493750375137523753375437553756375737583759376037613762376337643765376637673768376937703771377237733774377537763777377837793780378137823783378437853786378737883789379037913792379337943795379637973798379938003801380238033804380538063807380838093810381138123813381438153816381738183819382038213822382338243825382638273828382938303831383238333834383538363837383838393840384138423843384438453846384738483849385038513852385338543855385638573858385938603861386238633864386538663867386838693870387138723873387438753876387738783879388038813882388338843885388638873888388938903891389238933894389538963897389838993900390139023903390439053906390739083909391039113912391339143915391639173918391939203921392239233924392539263927392839293930393139323933393439353936393739383939394039413942394339443945394639473948394939503951395239533954395539563957395839593960396139623963396439653966396739683969397039713972397339743975397639773978397939803981398239833984398539863987398839893990399139923993399439953996399739983999400040014002400340044005400640074008400940104011401240134014401540164017401840194020402140224023402440254026402740284029403040314032403340344035403640374038403940404041404240434044404540464047404840494050405140524053405440554056405740584059406040614062406340644065406640674068406940704071407240734074407540764077407840794080408140824083408440854086408740884089409040914092409340944095409640974098409941004101410241034104410541064107410841094110411141124113411441154116411741184119412041214122412341244125412641274128412941304131413241334134413541364137413841394140414141424143414441454146414741484149415041514152415341544155415641574158415941604161416241634164416541664167416841694170417141724173417441754176417741784179
  1. ========================
  2. LLVM Programmer's Manual
  3. ========================
  4. .. contents::
  5. :local:
  6. .. warning::
  7. This is always a work in progress.
  8. .. _introduction:
  9. Introduction
  10. ============
  11. This document is meant to highlight some of the important classes and interfaces
  12. available in the LLVM source-base. This manual is not intended to explain what
  13. LLVM is, how it works, and what LLVM code looks like. It assumes that you know
  14. the basics of LLVM and are interested in writing transformations or otherwise
  15. analyzing or manipulating the code.
  16. This document should get you oriented so that you can find your way in the
  17. continuously growing source code that makes up the LLVM infrastructure. Note
  18. that this manual is not intended to serve as a replacement for reading the
  19. source code, so if you think there should be a method in one of these classes to
  20. do something, but it's not listed, check the source. Links to the `doxygen
  21. <http://llvm.org/doxygen/>`__ sources are provided to make this as easy as
  22. possible.
  23. The first section of this document describes general information that is useful
  24. to know when working in the LLVM infrastructure, and the second describes the
  25. Core LLVM classes. In the future this manual will be extended with information
  26. describing how to use extension libraries, such as dominator information, CFG
  27. traversal routines, and useful utilities like the ``InstVisitor`` (`doxygen
  28. <http://llvm.org/doxygen/InstVisitor_8h_source.html>`__) template.
  29. .. _general:
  30. General Information
  31. ===================
  32. This section contains general information that is useful if you are working in
  33. the LLVM source-base, but that isn't specific to any particular API.
  34. .. _stl:
  35. The C++ Standard Template Library
  36. ---------------------------------
  37. LLVM makes heavy use of the C++ Standard Template Library (STL), perhaps much
  38. more than you are used to, or have seen before. Because of this, you might want
  39. to do a little background reading in the techniques used and capabilities of the
  40. library. There are many good pages that discuss the STL, and several books on
  41. the subject that you can get, so it will not be discussed in this document.
  42. Here are some useful links:
  43. #. `cppreference.com
  44. <http://en.cppreference.com/w/>`_ - an excellent
  45. reference for the STL and other parts of the standard C++ library.
  46. #. `C++ In a Nutshell <http://www.tempest-sw.com/cpp/>`_ - This is an O'Reilly
  47. book in the making. It has a decent Standard Library Reference that rivals
  48. Dinkumware's, and is unfortunately no longer free since the book has been
  49. published.
  50. #. `C++ Frequently Asked Questions <http://www.parashift.com/c++-faq-lite/>`_.
  51. #. `SGI's STL Programmer's Guide <http://www.sgi.com/tech/stl/>`_ - Contains a
  52. useful `Introduction to the STL
  53. <http://www.sgi.com/tech/stl/stl_introduction.html>`_.
  54. #. `Bjarne Stroustrup's C++ Page
  55. <http://www.research.att.com/%7Ebs/C++.html>`_.
  56. #. `Bruce Eckel's Thinking in C++, 2nd ed. Volume 2 Revision 4.0
  57. (even better, get the book)
  58. <http://www.mindview.net/Books/TICPP/ThinkingInCPP2e.html>`_.
  59. You are also encouraged to take a look at the :doc:`LLVM Coding Standards
  60. <CodingStandards>` guide which focuses on how to write maintainable code more
  61. than where to put your curly braces.
  62. .. _resources:
  63. Other useful references
  64. -----------------------
  65. #. `Using static and shared libraries across platforms
  66. <http://www.fortran-2000.com/ArnaudRecipes/sharedlib.html>`_
  67. .. _apis:
  68. Important and useful LLVM APIs
  69. ==============================
  70. Here we highlight some LLVM APIs that are generally useful and good to know
  71. about when writing transformations.
  72. .. _isa:
  73. The ``isa<>``, ``cast<>`` and ``dyn_cast<>`` templates
  74. ------------------------------------------------------
  75. The LLVM source-base makes extensive use of a custom form of RTTI. These
  76. templates have many similarities to the C++ ``dynamic_cast<>`` operator, but
  77. they don't have some drawbacks (primarily stemming from the fact that
  78. ``dynamic_cast<>`` only works on classes that have a v-table). Because they are
  79. used so often, you must know what they do and how they work. All of these
  80. templates are defined in the ``llvm/Support/Casting.h`` (`doxygen
  81. <http://llvm.org/doxygen/Casting_8h_source.html>`__) file (note that you very
  82. rarely have to include this file directly).
  83. ``isa<>``:
  84. The ``isa<>`` operator works exactly like the Java "``instanceof``" operator.
  85. It returns true or false depending on whether a reference or pointer points to
  86. an instance of the specified class. This can be very useful for constraint
  87. checking of various sorts (example below).
  88. ``cast<>``:
  89. The ``cast<>`` operator is a "checked cast" operation. It converts a pointer
  90. or reference from a base class to a derived class, causing an assertion
  91. failure if it is not really an instance of the right type. This should be
  92. used in cases where you have some information that makes you believe that
  93. something is of the right type. An example of the ``isa<>`` and ``cast<>``
  94. template is:
  95. .. code-block:: c++
  96. static bool isLoopInvariant(const Value *V, const Loop *L) {
  97. if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V))
  98. return true;
  99. // Otherwise, it must be an instruction...
  100. return !L->contains(cast<Instruction>(V)->getParent());
  101. }
  102. Note that you should **not** use an ``isa<>`` test followed by a ``cast<>``,
  103. for that use the ``dyn_cast<>`` operator.
  104. ``dyn_cast<>``:
  105. The ``dyn_cast<>`` operator is a "checking cast" operation. It checks to see
  106. if the operand is of the specified type, and if so, returns a pointer to it
  107. (this operator does not work with references). If the operand is not of the
  108. correct type, a null pointer is returned. Thus, this works very much like
  109. the ``dynamic_cast<>`` operator in C++, and should be used in the same
  110. circumstances. Typically, the ``dyn_cast<>`` operator is used in an ``if``
  111. statement or some other flow control statement like this:
  112. .. code-block:: c++
  113. if (auto *AI = dyn_cast<AllocationInst>(Val)) {
  114. // ...
  115. }
  116. This form of the ``if`` statement effectively combines together a call to
  117. ``isa<>`` and a call to ``cast<>`` into one statement, which is very
  118. convenient.
  119. Note that the ``dyn_cast<>`` operator, like C++'s ``dynamic_cast<>`` or Java's
  120. ``instanceof`` operator, can be abused. In particular, you should not use big
  121. chained ``if/then/else`` blocks to check for lots of different variants of
  122. classes. If you find yourself wanting to do this, it is much cleaner and more
  123. efficient to use the ``InstVisitor`` class to dispatch over the instruction
  124. type directly.
  125. ``isa_and_nonnull<>``:
  126. The ``isa_and_nonnull<>`` operator works just like the ``isa<>`` operator,
  127. except that it allows for a null pointer as an argument (which it then
  128. returns false). This can sometimes be useful, allowing you to combine several
  129. null checks into one.
  130. ``cast_or_null<>``:
  131. The ``cast_or_null<>`` operator works just like the ``cast<>`` operator,
  132. except that it allows for a null pointer as an argument (which it then
  133. propagates). This can sometimes be useful, allowing you to combine several
  134. null checks into one.
  135. ``dyn_cast_or_null<>``:
  136. The ``dyn_cast_or_null<>`` operator works just like the ``dyn_cast<>``
  137. operator, except that it allows for a null pointer as an argument (which it
  138. then propagates). This can sometimes be useful, allowing you to combine
  139. several null checks into one.
  140. These five templates can be used with any classes, whether they have a v-table
  141. or not. If you want to add support for these templates, see the document
  142. :doc:`How to set up LLVM-style RTTI for your class hierarchy
  143. <HowToSetUpLLVMStyleRTTI>`
  144. .. _string_apis:
  145. Passing strings (the ``StringRef`` and ``Twine`` classes)
  146. ---------------------------------------------------------
  147. Although LLVM generally does not do much string manipulation, we do have several
  148. important APIs which take strings. Two important examples are the Value class
  149. -- which has names for instructions, functions, etc. -- and the ``StringMap``
  150. class which is used extensively in LLVM and Clang.
  151. These are generic classes, and they need to be able to accept strings which may
  152. have embedded null characters. Therefore, they cannot simply take a ``const
  153. char *``, and taking a ``const std::string&`` requires clients to perform a heap
  154. allocation which is usually unnecessary. Instead, many LLVM APIs use a
  155. ``StringRef`` or a ``const Twine&`` for passing strings efficiently.
  156. .. _StringRef:
  157. The ``StringRef`` class
  158. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  159. The ``StringRef`` data type represents a reference to a constant string (a
  160. character array and a length) and supports the common operations available on
  161. ``std::string``, but does not require heap allocation.
  162. It can be implicitly constructed using a C style null-terminated string, an
  163. ``std::string``, or explicitly with a character pointer and length. For
  164. example, the ``StringRef`` find function is declared as:
  165. .. code-block:: c++
  166. iterator find(StringRef Key);
  167. and clients can call it using any one of:
  168. .. code-block:: c++
  169. Map.find("foo"); // Lookup "foo"
  170. Map.find(std::string("bar")); // Lookup "bar"
  171. Map.find(StringRef("\0baz", 4)); // Lookup "\0baz"
  172. Similarly, APIs which need to return a string may return a ``StringRef``
  173. instance, which can be used directly or converted to an ``std::string`` using
  174. the ``str`` member function. See ``llvm/ADT/StringRef.h`` (`doxygen
  175. <http://llvm.org/doxygen/StringRef_8h_source.html>`__) for more
  176. information.
  177. You should rarely use the ``StringRef`` class directly, because it contains
  178. pointers to external memory it is not generally safe to store an instance of the
  179. class (unless you know that the external storage will not be freed).
  180. ``StringRef`` is small and pervasive enough in LLVM that it should always be
  181. passed by value.
  182. The ``Twine`` class
  183. ^^^^^^^^^^^^^^^^^^^
  184. The ``Twine`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1Twine.html>`__)
  185. class is an efficient way for APIs to accept concatenated strings. For example,
  186. a common LLVM paradigm is to name one instruction based on the name of another
  187. instruction with a suffix, for example:
  188. .. code-block:: c++
  189. New = CmpInst::Create(..., SO->getName() + ".cmp");
  190. The ``Twine`` class is effectively a lightweight `rope
  191. <http://en.wikipedia.org/wiki/Rope_(computer_science)>`_ which points to
  192. temporary (stack allocated) objects. Twines can be implicitly constructed as
  193. the result of the plus operator applied to strings (i.e., a C strings, an
  194. ``std::string``, or a ``StringRef``). The twine delays the actual concatenation
  195. of strings until it is actually required, at which point it can be efficiently
  196. rendered directly into a character array. This avoids unnecessary heap
  197. allocation involved in constructing the temporary results of string
  198. concatenation. See ``llvm/ADT/Twine.h`` (`doxygen
  199. <http://llvm.org/doxygen/Twine_8h_source.html>`__) and :ref:`here <dss_twine>`
  200. for more information.
  201. As with a ``StringRef``, ``Twine`` objects point to external memory and should
  202. almost never be stored or mentioned directly. They are intended solely for use
  203. when defining a function which should be able to efficiently accept concatenated
  204. strings.
  205. .. _formatting_strings:
  206. Formatting strings (the ``formatv`` function)
  207. ---------------------------------------------
  208. While LLVM doesn't necessarily do a lot of string manipulation and parsing, it
  209. does do a lot of string formatting. From diagnostic messages, to llvm tool
  210. outputs such as ``llvm-readobj`` to printing verbose disassembly listings and
  211. LLDB runtime logging, the need for string formatting is pervasive.
  212. The ``formatv`` is similar in spirit to ``printf``, but uses a different syntax
  213. which borrows heavily from Python and C#. Unlike ``printf`` it deduces the type
  214. to be formatted at compile time, so it does not need a format specifier such as
  215. ``%d``. This reduces the mental overhead of trying to construct portable format
  216. strings, especially for platform-specific types like ``size_t`` or pointer types.
  217. Unlike both ``printf`` and Python, it additionally fails to compile if LLVM does
  218. not know how to format the type. These two properties ensure that the function
  219. is both safer and simpler to use than traditional formatting methods such as
  220. the ``printf`` family of functions.
  221. Simple formatting
  222. ^^^^^^^^^^^^^^^^^
  223. A call to ``formatv`` involves a single **format string** consisting of 0 or more
  224. **replacement sequences**, followed by a variable length list of **replacement values**.
  225. A replacement sequence is a string of the form ``{N[[,align]:style]}``.
  226. ``N`` refers to the 0-based index of the argument from the list of replacement
  227. values. Note that this means it is possible to reference the same parameter
  228. multiple times, possibly with different style and/or alignment options, in any order.
  229. ``align`` is an optional string specifying the width of the field to format
  230. the value into, and the alignment of the value within the field. It is specified as
  231. an optional **alignment style** followed by a positive integral **field width**. The
  232. alignment style can be one of the characters ``-`` (left align), ``=`` (center align),
  233. or ``+`` (right align). The default is right aligned.
  234. ``style`` is an optional string consisting of a type specific that controls the
  235. formatting of the value. For example, to format a floating point value as a percentage,
  236. you can use the style option ``P``.
  237. Custom formatting
  238. ^^^^^^^^^^^^^^^^^
  239. There are two ways to customize the formatting behavior for a type.
  240. 1. Provide a template specialization of ``llvm::format_provider<T>`` for your
  241. type ``T`` with the appropriate static format method.
  242. .. code-block:: c++
  243. namespace llvm {
  244. template<>
  245. struct format_provider<MyFooBar> {
  246. static void format(const MyFooBar &V, raw_ostream &Stream, StringRef Style) {
  247. // Do whatever is necessary to format `V` into `Stream`
  248. }
  249. };
  250. void foo() {
  251. MyFooBar X;
  252. std::string S = formatv("{0}", X);
  253. }
  254. }
  255. This is a useful extensibility mechanism for adding support for formatting your own
  256. custom types with your own custom Style options. But it does not help when you want
  257. to extend the mechanism for formatting a type that the library already knows how to
  258. format. For that, we need something else.
  259. 2. Provide a **format adapter** inheriting from ``llvm::FormatAdapter<T>``.
  260. .. code-block:: c++
  261. namespace anything {
  262. struct format_int_custom : public llvm::FormatAdapter<int> {
  263. explicit format_int_custom(int N) : llvm::FormatAdapter<int>(N) {}
  264. void format(llvm::raw_ostream &Stream, StringRef Style) override {
  265. // Do whatever is necessary to format ``this->Item`` into ``Stream``
  266. }
  267. };
  268. }
  269. namespace llvm {
  270. void foo() {
  271. std::string S = formatv("{0}", anything::format_int_custom(42));
  272. }
  273. }
  274. If the type is detected to be derived from ``FormatAdapter<T>``, ``formatv``
  275. will call the
  276. ``format`` method on the argument passing in the specified style. This allows
  277. one to provide custom formatting of any type, including one which already has
  278. a builtin format provider.
  279. ``formatv`` Examples
  280. ^^^^^^^^^^^^^^^^^^^^
  281. Below is intended to provide an incomplete set of examples demonstrating
  282. the usage of ``formatv``. More information can be found by reading the
  283. doxygen documentation or by looking at the unit test suite.
  284. .. code-block:: c++
  285. std::string S;
  286. // Simple formatting of basic types and implicit string conversion.
  287. S = formatv("{0} ({1:P})", 7, 0.35); // S == "7 (35.00%)"
  288. // Out-of-order referencing and multi-referencing
  289. outs() << formatv("{0} {2} {1} {0}", 1, "test", 3); // prints "1 3 test 1"
  290. // Left, right, and center alignment
  291. S = formatv("{0,7}", 'a'); // S == " a";
  292. S = formatv("{0,-7}", 'a'); // S == "a ";
  293. S = formatv("{0,=7}", 'a'); // S == " a ";
  294. S = formatv("{0,+7}", 'a'); // S == " a";
  295. // Custom styles
  296. S = formatv("{0:N} - {0:x} - {1:E}", 12345, 123908342); // S == "12,345 - 0x3039 - 1.24E8"
  297. // Adapters
  298. S = formatv("{0}", fmt_align(42, AlignStyle::Center, 7)); // S == " 42 "
  299. S = formatv("{0}", fmt_repeat("hi", 3)); // S == "hihihi"
  300. S = formatv("{0}", fmt_pad("hi", 2, 6)); // S == " hi "
  301. // Ranges
  302. std::vector<int> V = {8, 9, 10};
  303. S = formatv("{0}", make_range(V.begin(), V.end())); // S == "8, 9, 10"
  304. S = formatv("{0:$[+]}", make_range(V.begin(), V.end())); // S == "8+9+10"
  305. S = formatv("{0:$[ + ]@[x]}", make_range(V.begin(), V.end())); // S == "0x8 + 0x9 + 0xA"
  306. .. _error_apis:
  307. Error handling
  308. --------------
  309. Proper error handling helps us identify bugs in our code, and helps end-users
  310. understand errors in their tool usage. Errors fall into two broad categories:
  311. *programmatic* and *recoverable*, with different strategies for handling and
  312. reporting.
  313. Programmatic Errors
  314. ^^^^^^^^^^^^^^^^^^^
  315. Programmatic errors are violations of program invariants or API contracts, and
  316. represent bugs within the program itself. Our aim is to document invariants, and
  317. to abort quickly at the point of failure (providing some basic diagnostic) when
  318. invariants are broken at runtime.
  319. The fundamental tools for handling programmatic errors are assertions and the
  320. llvm_unreachable function. Assertions are used to express invariant conditions,
  321. and should include a message describing the invariant:
  322. .. code-block:: c++
  323. assert(isPhysReg(R) && "All virt regs should have been allocated already.");
  324. The llvm_unreachable function can be used to document areas of control flow
  325. that should never be entered if the program invariants hold:
  326. .. code-block:: c++
  327. enum { Foo, Bar, Baz } X = foo();
  328. switch (X) {
  329. case Foo: /* Handle Foo */; break;
  330. case Bar: /* Handle Bar */; break;
  331. default:
  332. llvm_unreachable("X should be Foo or Bar here");
  333. }
  334. Recoverable Errors
  335. ^^^^^^^^^^^^^^^^^^
  336. Recoverable errors represent an error in the program's environment, for example
  337. a resource failure (a missing file, a dropped network connection, etc.), or
  338. malformed input. These errors should be detected and communicated to a level of
  339. the program where they can be handled appropriately. Handling the error may be
  340. as simple as reporting the issue to the user, or it may involve attempts at
  341. recovery.
  342. .. note::
  343. While it would be ideal to use this error handling scheme throughout
  344. LLVM, there are places where this hasn't been practical to apply. In
  345. situations where you absolutely must emit a non-programmatic error and
  346. the ``Error`` model isn't workable you can call ``report_fatal_error``,
  347. which will call installed error handlers, print a message, and exit the
  348. program.
  349. Recoverable errors are modeled using LLVM's ``Error`` scheme. This scheme
  350. represents errors using function return values, similar to classic C integer
  351. error codes, or C++'s ``std::error_code``. However, the ``Error`` class is
  352. actually a lightweight wrapper for user-defined error types, allowing arbitrary
  353. information to be attached to describe the error. This is similar to the way C++
  354. exceptions allow throwing of user-defined types.
  355. Success values are created by calling ``Error::success()``, E.g.:
  356. .. code-block:: c++
  357. Error foo() {
  358. // Do something.
  359. // Return success.
  360. return Error::success();
  361. }
  362. Success values are very cheap to construct and return - they have minimal
  363. impact on program performance.
  364. Failure values are constructed using ``make_error<T>``, where ``T`` is any class
  365. that inherits from the ErrorInfo utility, E.g.:
  366. .. code-block:: c++
  367. class BadFileFormat : public ErrorInfo<BadFileFormat> {
  368. public:
  369. static char ID;
  370. std::string Path;
  371. BadFileFormat(StringRef Path) : Path(Path.str()) {}
  372. void log(raw_ostream &OS) const override {
  373. OS << Path << " is malformed";
  374. }
  375. std::error_code convertToErrorCode() const override {
  376. return make_error_code(object_error::parse_failed);
  377. }
  378. };
  379. char BadFileFormat::ID; // This should be declared in the C++ file.
  380. Error printFormattedFile(StringRef Path) {
  381. if (<check for valid format>)
  382. return make_error<BadFileFormat>(Path);
  383. // print file contents.
  384. return Error::success();
  385. }
  386. Error values can be implicitly converted to bool: true for error, false for
  387. success, enabling the following idiom:
  388. .. code-block:: c++
  389. Error mayFail();
  390. Error foo() {
  391. if (auto Err = mayFail())
  392. return Err;
  393. // Success! We can proceed.
  394. ...
  395. For functions that can fail but need to return a value the ``Expected<T>``
  396. utility can be used. Values of this type can be constructed with either a
  397. ``T``, or an ``Error``. Expected<T> values are also implicitly convertible to
  398. boolean, but with the opposite convention to ``Error``: true for success, false
  399. for error. If success, the ``T`` value can be accessed via the dereference
  400. operator. If failure, the ``Error`` value can be extracted using the
  401. ``takeError()`` method. Idiomatic usage looks like:
  402. .. code-block:: c++
  403. Expected<FormattedFile> openFormattedFile(StringRef Path) {
  404. // If badly formatted, return an error.
  405. if (auto Err = checkFormat(Path))
  406. return std::move(Err);
  407. // Otherwise return a FormattedFile instance.
  408. return FormattedFile(Path);
  409. }
  410. Error processFormattedFile(StringRef Path) {
  411. // Try to open a formatted file
  412. if (auto FileOrErr = openFormattedFile(Path)) {
  413. // On success, grab a reference to the file and continue.
  414. auto &File = *FileOrErr;
  415. ...
  416. } else
  417. // On error, extract the Error value and return it.
  418. return FileOrErr.takeError();
  419. }
  420. If an ``Expected<T>`` value is in success mode then the ``takeError()`` method
  421. will return a success value. Using this fact, the above function can be
  422. rewritten as:
  423. .. code-block:: c++
  424. Error processFormattedFile(StringRef Path) {
  425. // Try to open a formatted file
  426. auto FileOrErr = openFormattedFile(Path);
  427. if (auto Err = FileOrErr.takeError())
  428. // On error, extract the Error value and return it.
  429. return Err;
  430. // On success, grab a reference to the file and continue.
  431. auto &File = *FileOrErr;
  432. ...
  433. }
  434. This second form is often more readable for functions that involve multiple
  435. ``Expected<T>`` values as it limits the indentation required.
  436. All ``Error`` instances, whether success or failure, must be either checked or
  437. moved from (via ``std::move`` or a return) before they are destructed.
  438. Accidentally discarding an unchecked error will cause a program abort at the
  439. point where the unchecked value's destructor is run, making it easy to identify
  440. and fix violations of this rule.
  441. Success values are considered checked once they have been tested (by invoking
  442. the boolean conversion operator):
  443. .. code-block:: c++
  444. if (auto Err = mayFail(...))
  445. return Err; // Failure value - move error to caller.
  446. // Safe to continue: Err was checked.
  447. In contrast, the following code will always cause an abort, even if ``mayFail``
  448. returns a success value:
  449. .. code-block:: c++
  450. mayFail();
  451. // Program will always abort here, even if mayFail() returns Success, since
  452. // the value is not checked.
  453. Failure values are considered checked once a handler for the error type has
  454. been activated:
  455. .. code-block:: c++
  456. handleErrors(
  457. processFormattedFile(...),
  458. [](const BadFileFormat &BFF) {
  459. report("Unable to process " + BFF.Path + ": bad format");
  460. },
  461. [](const FileNotFound &FNF) {
  462. report("File not found " + FNF.Path);
  463. });
  464. The ``handleErrors`` function takes an error as its first argument, followed by
  465. a variadic list of "handlers", each of which must be a callable type (a
  466. function, lambda, or class with a call operator) with one argument. The
  467. ``handleErrors`` function will visit each handler in the sequence and check its
  468. argument type against the dynamic type of the error, running the first handler
  469. that matches. This is the same decision process that is used decide which catch
  470. clause to run for a C++ exception.
  471. Since the list of handlers passed to ``handleErrors`` may not cover every error
  472. type that can occur, the ``handleErrors`` function also returns an Error value
  473. that must be checked or propagated. If the error value that is passed to
  474. ``handleErrors`` does not match any of the handlers it will be returned from
  475. handleErrors. Idiomatic use of ``handleErrors`` thus looks like:
  476. .. code-block:: c++
  477. if (auto Err =
  478. handleErrors(
  479. processFormattedFile(...),
  480. [](const BadFileFormat &BFF) {
  481. report("Unable to process " + BFF.Path + ": bad format");
  482. },
  483. [](const FileNotFound &FNF) {
  484. report("File not found " + FNF.Path);
  485. }))
  486. return Err;
  487. In cases where you truly know that the handler list is exhaustive the
  488. ``handleAllErrors`` function can be used instead. This is identical to
  489. ``handleErrors`` except that it will terminate the program if an unhandled
  490. error is passed in, and can therefore return void. The ``handleAllErrors``
  491. function should generally be avoided: the introduction of a new error type
  492. elsewhere in the program can easily turn a formerly exhaustive list of errors
  493. into a non-exhaustive list, risking unexpected program termination. Where
  494. possible, use handleErrors and propagate unknown errors up the stack instead.
  495. For tool code, where errors can be handled by printing an error message then
  496. exiting with an error code, the :ref:`ExitOnError <err_exitonerr>` utility
  497. may be a better choice than handleErrors, as it simplifies control flow when
  498. calling fallible functions.
  499. In situations where it is known that a particular call to a fallible function
  500. will always succeed (for example, a call to a function that can only fail on a
  501. subset of inputs with an input that is known to be safe) the
  502. :ref:`cantFail <err_cantfail>` functions can be used to remove the error type,
  503. simplifying control flow.
  504. StringError
  505. """""""""""
  506. Many kinds of errors have no recovery strategy, the only action that can be
  507. taken is to report them to the user so that the user can attempt to fix the
  508. environment. In this case representing the error as a string makes perfect
  509. sense. LLVM provides the ``StringError`` class for this purpose. It takes two
  510. arguments: A string error message, and an equivalent ``std::error_code`` for
  511. interoperability. It also provides a ``createStringError`` function to simplify
  512. common usage of this class:
  513. .. code-block:: c++
  514. // These two lines of code are equivalent:
  515. make_error<StringError>("Bad executable", errc::executable_format_error);
  516. createStringError(errc::executable_format_error, "Bad executable");
  517. If you're certain that the error you're building will never need to be converted
  518. to a ``std::error_code`` you can use the ``inconvertibleErrorCode()`` function:
  519. .. code-block:: c++
  520. createStringError(inconvertibleErrorCode(), "Bad executable");
  521. This should be done only after careful consideration. If any attempt is made to
  522. convert this error to a ``std::error_code`` it will trigger immediate program
  523. termination. Unless you are certain that your errors will not need
  524. interoperability you should look for an existing ``std::error_code`` that you
  525. can convert to, and even (as painful as it is) consider introducing a new one as
  526. a stopgap measure.
  527. ``createStringError`` can take ``printf`` style format specifiers to provide a
  528. formatted message:
  529. .. code-block:: c++
  530. createStringError(errc::executable_format_error,
  531. "Bad executable: %s", FileName);
  532. Interoperability with std::error_code and ErrorOr
  533. """""""""""""""""""""""""""""""""""""""""""""""""
  534. Many existing LLVM APIs use ``std::error_code`` and its partner ``ErrorOr<T>``
  535. (which plays the same role as ``Expected<T>``, but wraps a ``std::error_code``
  536. rather than an ``Error``). The infectious nature of error types means that an
  537. attempt to change one of these functions to return ``Error`` or ``Expected<T>``
  538. instead often results in an avalanche of changes to callers, callers of callers,
  539. and so on. (The first such attempt, returning an ``Error`` from
  540. MachOObjectFile's constructor, was abandoned after the diff reached 3000 lines,
  541. impacted half a dozen libraries, and was still growing).
  542. To solve this problem, the ``Error``/``std::error_code`` interoperability requirement was
  543. introduced. Two pairs of functions allow any ``Error`` value to be converted to a
  544. ``std::error_code``, any ``Expected<T>`` to be converted to an ``ErrorOr<T>``, and vice
  545. versa:
  546. .. code-block:: c++
  547. std::error_code errorToErrorCode(Error Err);
  548. Error errorCodeToError(std::error_code EC);
  549. template <typename T> ErrorOr<T> expectedToErrorOr(Expected<T> TOrErr);
  550. template <typename T> Expected<T> errorOrToExpected(ErrorOr<T> TOrEC);
  551. Using these APIs it is easy to make surgical patches that update individual
  552. functions from ``std::error_code`` to ``Error``, and from ``ErrorOr<T>`` to
  553. ``Expected<T>``.
  554. Returning Errors from error handlers
  555. """"""""""""""""""""""""""""""""""""
  556. Error recovery attempts may themselves fail. For that reason, ``handleErrors``
  557. actually recognises three different forms of handler signature:
  558. .. code-block:: c++
  559. // Error must be handled, no new errors produced:
  560. void(UserDefinedError &E);
  561. // Error must be handled, new errors can be produced:
  562. Error(UserDefinedError &E);
  563. // Original error can be inspected, then re-wrapped and returned (or a new
  564. // error can be produced):
  565. Error(std::unique_ptr<UserDefinedError> E);
  566. Any error returned from a handler will be returned from the ``handleErrors``
  567. function so that it can be handled itself, or propagated up the stack.
  568. .. _err_exitonerr:
  569. Using ExitOnError to simplify tool code
  570. """""""""""""""""""""""""""""""""""""""
  571. Library code should never call ``exit`` for a recoverable error, however in tool
  572. code (especially command line tools) this can be a reasonable approach. Calling
  573. ``exit`` upon encountering an error dramatically simplifies control flow as the
  574. error no longer needs to be propagated up the stack. This allows code to be
  575. written in straight-line style, as long as each fallible call is wrapped in a
  576. check and call to exit. The ``ExitOnError`` class supports this pattern by
  577. providing call operators that inspect ``Error`` values, stripping the error away
  578. in the success case and logging to ``stderr`` then exiting in the failure case.
  579. To use this class, declare a global ``ExitOnError`` variable in your program:
  580. .. code-block:: c++
  581. ExitOnError ExitOnErr;
  582. Calls to fallible functions can then be wrapped with a call to ``ExitOnErr``,
  583. turning them into non-failing calls:
  584. .. code-block:: c++
  585. Error mayFail();
  586. Expected<int> mayFail2();
  587. void foo() {
  588. ExitOnErr(mayFail());
  589. int X = ExitOnErr(mayFail2());
  590. }
  591. On failure, the error's log message will be written to ``stderr``, optionally
  592. preceded by a string "banner" that can be set by calling the setBanner method. A
  593. mapping can also be supplied from ``Error`` values to exit codes using the
  594. ``setExitCodeMapper`` method:
  595. .. code-block:: c++
  596. int main(int argc, char *argv[]) {
  597. ExitOnErr.setBanner(std::string(argv[0]) + " error:");
  598. ExitOnErr.setExitCodeMapper(
  599. [](const Error &Err) {
  600. if (Err.isA<BadFileFormat>())
  601. return 2;
  602. return 1;
  603. });
  604. Use ``ExitOnError`` in your tool code where possible as it can greatly improve
  605. readability.
  606. .. _err_cantfail:
  607. Using cantFail to simplify safe callsites
  608. """""""""""""""""""""""""""""""""""""""""
  609. Some functions may only fail for a subset of their inputs, so calls using known
  610. safe inputs can be assumed to succeed.
  611. The cantFail functions encapsulate this by wrapping an assertion that their
  612. argument is a success value and, in the case of Expected<T>, unwrapping the
  613. T value:
  614. .. code-block:: c++
  615. Error onlyFailsForSomeXValues(int X);
  616. Expected<int> onlyFailsForSomeXValues2(int X);
  617. void foo() {
  618. cantFail(onlyFailsForSomeXValues(KnownSafeValue));
  619. int Y = cantFail(onlyFailsForSomeXValues2(KnownSafeValue));
  620. ...
  621. }
  622. Like the ExitOnError utility, cantFail simplifies control flow. Their treatment
  623. of error cases is very different however: Where ExitOnError is guaranteed to
  624. terminate the program on an error input, cantFail simply asserts that the result
  625. is success. In debug builds this will result in an assertion failure if an error
  626. is encountered. In release builds the behavior of cantFail for failure values is
  627. undefined. As such, care must be taken in the use of cantFail: clients must be
  628. certain that a cantFail wrapped call really can not fail with the given
  629. arguments.
  630. Use of the cantFail functions should be rare in library code, but they are
  631. likely to be of more use in tool and unit-test code where inputs and/or
  632. mocked-up classes or functions may be known to be safe.
  633. Fallible constructors
  634. """""""""""""""""""""
  635. Some classes require resource acquisition or other complex initialization that
  636. can fail during construction. Unfortunately constructors can't return errors,
  637. and having clients test objects after they're constructed to ensure that they're
  638. valid is error prone as it's all too easy to forget the test. To work around
  639. this, use the named constructor idiom and return an ``Expected<T>``:
  640. .. code-block:: c++
  641. class Foo {
  642. public:
  643. static Expected<Foo> Create(Resource R1, Resource R2) {
  644. Error Err;
  645. Foo F(R1, R2, Err);
  646. if (Err)
  647. return std::move(Err);
  648. return std::move(F);
  649. }
  650. private:
  651. Foo(Resource R1, Resource R2, Error &Err) {
  652. ErrorAsOutParameter EAO(&Err);
  653. if (auto Err2 = R1.acquire()) {
  654. Err = std::move(Err2);
  655. return;
  656. }
  657. Err = R2.acquire();
  658. }
  659. };
  660. Here, the named constructor passes an ``Error`` by reference into the actual
  661. constructor, which the constructor can then use to return errors. The
  662. ``ErrorAsOutParameter`` utility sets the ``Error`` value's checked flag on entry
  663. to the constructor so that the error can be assigned to, then resets it on exit
  664. to force the client (the named constructor) to check the error.
  665. By using this idiom, clients attempting to construct a Foo receive either a
  666. well-formed Foo or an Error, never an object in an invalid state.
  667. Propagating and consuming errors based on types
  668. """""""""""""""""""""""""""""""""""""""""""""""
  669. In some contexts, certain types of error are known to be benign. For example,
  670. when walking an archive, some clients may be happy to skip over badly formatted
  671. object files rather than terminating the walk immediately. Skipping badly
  672. formatted objects could be achieved using an elaborate handler method, but the
  673. Error.h header provides two utilities that make this idiom much cleaner: the
  674. type inspection method, ``isA``, and the ``consumeError`` function:
  675. .. code-block:: c++
  676. Error walkArchive(Archive A) {
  677. for (unsigned I = 0; I != A.numMembers(); ++I) {
  678. auto ChildOrErr = A.getMember(I);
  679. if (auto Err = ChildOrErr.takeError()) {
  680. if (Err.isA<BadFileFormat>())
  681. consumeError(std::move(Err))
  682. else
  683. return Err;
  684. }
  685. auto &Child = *ChildOrErr;
  686. // Use Child
  687. ...
  688. }
  689. return Error::success();
  690. }
  691. Concatenating Errors with joinErrors
  692. """"""""""""""""""""""""""""""""""""
  693. In the archive walking example above ``BadFileFormat`` errors are simply
  694. consumed and ignored. If the client had wanted report these errors after
  695. completing the walk over the archive they could use the ``joinErrors`` utility:
  696. .. code-block:: c++
  697. Error walkArchive(Archive A) {
  698. Error DeferredErrs = Error::success();
  699. for (unsigned I = 0; I != A.numMembers(); ++I) {
  700. auto ChildOrErr = A.getMember(I);
  701. if (auto Err = ChildOrErr.takeError())
  702. if (Err.isA<BadFileFormat>())
  703. DeferredErrs = joinErrors(std::move(DeferredErrs), std::move(Err));
  704. else
  705. return Err;
  706. auto &Child = *ChildOrErr;
  707. // Use Child
  708. ...
  709. }
  710. return DeferredErrs;
  711. }
  712. The ``joinErrors`` routine builds a special error type called ``ErrorList``,
  713. which holds a list of user defined errors. The ``handleErrors`` routine
  714. recognizes this type and will attempt to handle each of the contained errors in
  715. order. If all contained errors can be handled, ``handleErrors`` will return
  716. ``Error::success()``, otherwise ``handleErrors`` will concatenate the remaining
  717. errors and return the resulting ``ErrorList``.
  718. Building fallible iterators and iterator ranges
  719. """""""""""""""""""""""""""""""""""""""""""""""
  720. The archive walking examples above retrieve archive members by index, however
  721. this requires considerable boiler-plate for iteration and error checking. We can
  722. clean this up by using the "fallible iterator" pattern, which supports the
  723. following natural iteration idiom for fallible containers like Archive:
  724. .. code-block:: c++
  725. Error Err;
  726. for (auto &Child : Ar->children(Err)) {
  727. // Use Child - only enter the loop when it's valid
  728. // Allow early exit from the loop body, since we know that Err is success
  729. // when we're inside the loop.
  730. if (BailOutOn(Child))
  731. return;
  732. ...
  733. }
  734. // Check Err after the loop to ensure it didn't break due to an error.
  735. if (Err)
  736. return Err;
  737. To enable this idiom, iterators over fallible containers are written in a
  738. natural style, with their ``++`` and ``--`` operators replaced with fallible
  739. ``Error inc()`` and ``Error dec()`` functions. E.g.:
  740. .. code-block:: c++
  741. class FallibleChildIterator {
  742. public:
  743. FallibleChildIterator(Archive &A, unsigned ChildIdx);
  744. Archive::Child &operator*();
  745. friend bool operator==(const ArchiveIterator &LHS,
  746. const ArchiveIterator &RHS);
  747. // operator++/operator-- replaced with fallible increment / decrement:
  748. Error inc() {
  749. if (!A.childValid(ChildIdx + 1))
  750. return make_error<BadArchiveMember>(...);
  751. ++ChildIdx;
  752. return Error::success();
  753. }
  754. Error dec() { ... }
  755. };
  756. Instances of this kind of fallible iterator interface are then wrapped with the
  757. fallible_iterator utility which provides ``operator++`` and ``operator--``,
  758. returning any errors via a reference passed in to the wrapper at construction
  759. time. The fallible_iterator wrapper takes care of (a) jumping to the end of the
  760. range on error, and (b) marking the error as checked whenever an iterator is
  761. compared to ``end`` and found to be inequal (in particular: this marks the
  762. error as checked throughout the body of a range-based for loop), enabling early
  763. exit from the loop without redundant error checking.
  764. Instances of the fallible iterator interface (e.g. FallibleChildIterator above)
  765. are wrapped using the ``make_fallible_itr`` and ``make_fallible_end``
  766. functions. E.g.:
  767. .. code-block:: c++
  768. class Archive {
  769. public:
  770. using child_iterator = fallible_iterator<FallibleChildIterator>;
  771. child_iterator child_begin(Error &Err) {
  772. return make_fallible_itr(FallibleChildIterator(*this, 0), Err);
  773. }
  774. child_iterator child_end() {
  775. return make_fallible_end(FallibleChildIterator(*this, size()));
  776. }
  777. iterator_range<child_iterator> children(Error &Err) {
  778. return make_range(child_begin(Err), child_end());
  779. }
  780. };
  781. Using the fallible_iterator utility allows for both natural construction of
  782. fallible iterators (using failing ``inc`` and ``dec`` operations) and
  783. relatively natural use of c++ iterator/loop idioms.
  784. .. _function_apis:
  785. More information on Error and its related utilities can be found in the
  786. Error.h header file.
  787. Passing functions and other callable objects
  788. --------------------------------------------
  789. Sometimes you may want a function to be passed a callback object. In order to
  790. support lambda expressions and other function objects, you should not use the
  791. traditional C approach of taking a function pointer and an opaque cookie:
  792. .. code-block:: c++
  793. void takeCallback(bool (*Callback)(Function *, void *), void *Cookie);
  794. Instead, use one of the following approaches:
  795. Function template
  796. ^^^^^^^^^^^^^^^^^
  797. If you don't mind putting the definition of your function into a header file,
  798. make it a function template that is templated on the callable type.
  799. .. code-block:: c++
  800. template<typename Callable>
  801. void takeCallback(Callable Callback) {
  802. Callback(1, 2, 3);
  803. }
  804. The ``function_ref`` class template
  805. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  806. The ``function_ref``
  807. (`doxygen <http://llvm.org/doxygen/classllvm_1_1function__ref_3_01Ret_07Params_8_8_8_08_4.html>`__) class
  808. template represents a reference to a callable object, templated over the type
  809. of the callable. This is a good choice for passing a callback to a function,
  810. if you don't need to hold onto the callback after the function returns. In this
  811. way, ``function_ref`` is to ``std::function`` as ``StringRef`` is to
  812. ``std::string``.
  813. ``function_ref<Ret(Param1, Param2, ...)>`` can be implicitly constructed from
  814. any callable object that can be called with arguments of type ``Param1``,
  815. ``Param2``, ..., and returns a value that can be converted to type ``Ret``.
  816. For example:
  817. .. code-block:: c++
  818. void visitBasicBlocks(Function *F, function_ref<bool (BasicBlock*)> Callback) {
  819. for (BasicBlock &BB : *F)
  820. if (Callback(&BB))
  821. return;
  822. }
  823. can be called using:
  824. .. code-block:: c++
  825. visitBasicBlocks(F, [&](BasicBlock *BB) {
  826. if (process(BB))
  827. return isEmpty(BB);
  828. return false;
  829. });
  830. Note that a ``function_ref`` object contains pointers to external memory, so it
  831. is not generally safe to store an instance of the class (unless you know that
  832. the external storage will not be freed). If you need this ability, consider
  833. using ``std::function``. ``function_ref`` is small enough that it should always
  834. be passed by value.
  835. .. _DEBUG:
  836. The ``LLVM_DEBUG()`` macro and ``-debug`` option
  837. ------------------------------------------------
  838. Often when working on your pass you will put a bunch of debugging printouts and
  839. other code into your pass. After you get it working, you want to remove it, but
  840. you may need it again in the future (to work out new bugs that you run across).
  841. Naturally, because of this, you don't want to delete the debug printouts, but
  842. you don't want them to always be noisy. A standard compromise is to comment
  843. them out, allowing you to enable them if you need them in the future.
  844. The ``llvm/Support/Debug.h`` (`doxygen
  845. <http://llvm.org/doxygen/Debug_8h_source.html>`__) file provides a macro named
  846. ``LLVM_DEBUG()`` that is a much nicer solution to this problem. Basically, you can
  847. put arbitrary code into the argument of the ``LLVM_DEBUG`` macro, and it is only
  848. executed if '``opt``' (or any other tool) is run with the '``-debug``' command
  849. line argument:
  850. .. code-block:: c++
  851. LLVM_DEBUG(dbgs() << "I am here!\n");
  852. Then you can run your pass like this:
  853. .. code-block:: none
  854. $ opt < a.bc > /dev/null -mypass
  855. <no output>
  856. $ opt < a.bc > /dev/null -mypass -debug
  857. I am here!
  858. Using the ``LLVM_DEBUG()`` macro instead of a home-brewed solution allows you to not
  859. have to create "yet another" command line option for the debug output for your
  860. pass. Note that ``LLVM_DEBUG()`` macros are disabled for non-asserts builds, so they
  861. do not cause a performance impact at all (for the same reason, they should also
  862. not contain side-effects!).
  863. One additional nice thing about the ``LLVM_DEBUG()`` macro is that you can enable or
  864. disable it directly in gdb. Just use "``set DebugFlag=0``" or "``set
  865. DebugFlag=1``" from the gdb if the program is running. If the program hasn't
  866. been started yet, you can always just run it with ``-debug``.
  867. .. _DEBUG_TYPE:
  868. Fine grained debug info with ``DEBUG_TYPE`` and the ``-debug-only`` option
  869. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  870. Sometimes you may find yourself in a situation where enabling ``-debug`` just
  871. turns on **too much** information (such as when working on the code generator).
  872. If you want to enable debug information with more fine-grained control, you
  873. should define the ``DEBUG_TYPE`` macro and use the ``-debug-only`` option as
  874. follows:
  875. .. code-block:: c++
  876. #define DEBUG_TYPE "foo"
  877. LLVM_DEBUG(dbgs() << "'foo' debug type\n");
  878. #undef DEBUG_TYPE
  879. #define DEBUG_TYPE "bar"
  880. LLVM_DEBUG(dbgs() << "'bar' debug type\n");
  881. #undef DEBUG_TYPE
  882. Then you can run your pass like this:
  883. .. code-block:: none
  884. $ opt < a.bc > /dev/null -mypass
  885. <no output>
  886. $ opt < a.bc > /dev/null -mypass -debug
  887. 'foo' debug type
  888. 'bar' debug type
  889. $ opt < a.bc > /dev/null -mypass -debug-only=foo
  890. 'foo' debug type
  891. $ opt < a.bc > /dev/null -mypass -debug-only=bar
  892. 'bar' debug type
  893. $ opt < a.bc > /dev/null -mypass -debug-only=foo,bar
  894. 'foo' debug type
  895. 'bar' debug type
  896. Of course, in practice, you should only set ``DEBUG_TYPE`` at the top of a file,
  897. to specify the debug type for the entire module. Be careful that you only do
  898. this after including Debug.h and not around any #include of headers. Also, you
  899. should use names more meaningful than "foo" and "bar", because there is no
  900. system in place to ensure that names do not conflict. If two different modules
  901. use the same string, they will all be turned on when the name is specified.
  902. This allows, for example, all debug information for instruction scheduling to be
  903. enabled with ``-debug-only=InstrSched``, even if the source lives in multiple
  904. files. The name must not include a comma (,) as that is used to separate the
  905. arguments of the ``-debug-only`` option.
  906. For performance reasons, -debug-only is not available in optimized build
  907. (``--enable-optimized``) of LLVM.
  908. The ``DEBUG_WITH_TYPE`` macro is also available for situations where you would
  909. like to set ``DEBUG_TYPE``, but only for one specific ``DEBUG`` statement. It
  910. takes an additional first parameter, which is the type to use. For example, the
  911. preceding example could be written as:
  912. .. code-block:: c++
  913. DEBUG_WITH_TYPE("foo", dbgs() << "'foo' debug type\n");
  914. DEBUG_WITH_TYPE("bar", dbgs() << "'bar' debug type\n");
  915. .. _Statistic:
  916. The ``Statistic`` class & ``-stats`` option
  917. -------------------------------------------
  918. The ``llvm/ADT/Statistic.h`` (`doxygen
  919. <http://llvm.org/doxygen/Statistic_8h_source.html>`__) file provides a class
  920. named ``Statistic`` that is used as a unified way to keep track of what the LLVM
  921. compiler is doing and how effective various optimizations are. It is useful to
  922. see what optimizations are contributing to making a particular program run
  923. faster.
  924. Often you may run your pass on some big program, and you're interested to see
  925. how many times it makes a certain transformation. Although you can do this with
  926. hand inspection, or some ad-hoc method, this is a real pain and not very useful
  927. for big programs. Using the ``Statistic`` class makes it very easy to keep
  928. track of this information, and the calculated information is presented in a
  929. uniform manner with the rest of the passes being executed.
  930. There are many examples of ``Statistic`` uses, but the basics of using it are as
  931. follows:
  932. Define your statistic like this:
  933. .. code-block:: c++
  934. #define DEBUG_TYPE "mypassname" // This goes before any #includes.
  935. STATISTIC(NumXForms, "The # of times I did stuff");
  936. The ``STATISTIC`` macro defines a static variable, whose name is specified by
  937. the first argument. The pass name is taken from the ``DEBUG_TYPE`` macro, and
  938. the description is taken from the second argument. The variable defined
  939. ("NumXForms" in this case) acts like an unsigned integer.
  940. Whenever you make a transformation, bump the counter:
  941. .. code-block:: c++
  942. ++NumXForms; // I did stuff!
  943. That's all you have to do. To get '``opt``' to print out the statistics
  944. gathered, use the '``-stats``' option:
  945. .. code-block:: none
  946. $ opt -stats -mypassname < program.bc > /dev/null
  947. ... statistics output ...
  948. Note that in order to use the '``-stats``' option, LLVM must be
  949. compiled with assertions enabled.
  950. When running ``opt`` on a C file from the SPEC benchmark suite, it gives a
  951. report that looks like this:
  952. .. code-block:: none
  953. 7646 bitcodewriter - Number of normal instructions
  954. 725 bitcodewriter - Number of oversized instructions
  955. 129996 bitcodewriter - Number of bitcode bytes written
  956. 2817 raise - Number of insts DCEd or constprop'd
  957. 3213 raise - Number of cast-of-self removed
  958. 5046 raise - Number of expression trees converted
  959. 75 raise - Number of other getelementptr's formed
  960. 138 raise - Number of load/store peepholes
  961. 42 deadtypeelim - Number of unused typenames removed from symtab
  962. 392 funcresolve - Number of varargs functions resolved
  963. 27 globaldce - Number of global variables removed
  964. 2 adce - Number of basic blocks removed
  965. 134 cee - Number of branches revectored
  966. 49 cee - Number of setcc instruction eliminated
  967. 532 gcse - Number of loads removed
  968. 2919 gcse - Number of instructions removed
  969. 86 indvars - Number of canonical indvars added
  970. 87 indvars - Number of aux indvars removed
  971. 25 instcombine - Number of dead inst eliminate
  972. 434 instcombine - Number of insts combined
  973. 248 licm - Number of load insts hoisted
  974. 1298 licm - Number of insts hoisted to a loop pre-header
  975. 3 licm - Number of insts hoisted to multiple loop preds (bad, no loop pre-header)
  976. 75 mem2reg - Number of alloca's promoted
  977. 1444 cfgsimplify - Number of blocks simplified
  978. Obviously, with so many optimizations, having a unified framework for this stuff
  979. is very nice. Making your pass fit well into the framework makes it more
  980. maintainable and useful.
  981. .. _DebugCounters:
  982. Adding debug counters to aid in debugging your code
  983. ---------------------------------------------------
  984. Sometimes, when writing new passes, or trying to track down bugs, it
  985. is useful to be able to control whether certain things in your pass
  986. happen or not. For example, there are times the minimization tooling
  987. can only easily give you large testcases. You would like to narrow
  988. your bug down to a specific transformation happening or not happening,
  989. automatically, using bisection. This is where debug counters help.
  990. They provide a framework for making parts of your code only execute a
  991. certain number of times.
  992. The ``llvm/Support/DebugCounter.h`` (`doxygen
  993. <http://llvm.org/doxygen/DebugCounter_8h_source.html>`__) file
  994. provides a class named ``DebugCounter`` that can be used to create
  995. command line counter options that control execution of parts of your code.
  996. Define your DebugCounter like this:
  997. .. code-block:: c++
  998. DEBUG_COUNTER(DeleteAnInstruction, "passname-delete-instruction",
  999. "Controls which instructions get delete");
  1000. The ``DEBUG_COUNTER`` macro defines a static variable, whose name
  1001. is specified by the first argument. The name of the counter
  1002. (which is used on the command line) is specified by the second
  1003. argument, and the description used in the help is specified by the
  1004. third argument.
  1005. Whatever code you want that control, use ``DebugCounter::shouldExecute`` to control it.
  1006. .. code-block:: c++
  1007. if (DebugCounter::shouldExecute(DeleteAnInstruction))
  1008. I->eraseFromParent();
  1009. That's all you have to do. Now, using opt, you can control when this code triggers using
  1010. the '``--debug-counter``' option. There are two counters provided, ``skip`` and ``count``.
  1011. ``skip`` is the number of times to skip execution of the codepath. ``count`` is the number
  1012. of times, once we are done skipping, to execute the codepath.
  1013. .. code-block:: none
  1014. $ opt --debug-counter=passname-delete-instruction-skip=1,passname-delete-instruction-count=2 -passname
  1015. This will skip the above code the first time we hit it, then execute it twice, then skip the rest of the executions.
  1016. So if executed on the following code:
  1017. .. code-block:: llvm
  1018. %1 = add i32 %a, %b
  1019. %2 = add i32 %a, %b
  1020. %3 = add i32 %a, %b
  1021. %4 = add i32 %a, %b
  1022. It would delete number ``%2`` and ``%3``.
  1023. A utility is provided in `utils/bisect-skip-count` to binary search
  1024. skip and count arguments. It can be used to automatically minimize the
  1025. skip and count for a debug-counter variable.
  1026. .. _ViewGraph:
  1027. Viewing graphs while debugging code
  1028. -----------------------------------
  1029. Several of the important data structures in LLVM are graphs: for example CFGs
  1030. made out of LLVM :ref:`BasicBlocks <BasicBlock>`, CFGs made out of LLVM
  1031. :ref:`MachineBasicBlocks <MachineBasicBlock>`, and :ref:`Instruction Selection
  1032. DAGs <SelectionDAG>`. In many cases, while debugging various parts of the
  1033. compiler, it is nice to instantly visualize these graphs.
  1034. LLVM provides several callbacks that are available in a debug build to do
  1035. exactly that. If you call the ``Function::viewCFG()`` method, for example, the
  1036. current LLVM tool will pop up a window containing the CFG for the function where
  1037. each basic block is a node in the graph, and each node contains the instructions
  1038. in the block. Similarly, there also exists ``Function::viewCFGOnly()`` (does
  1039. not include the instructions), the ``MachineFunction::viewCFG()`` and
  1040. ``MachineFunction::viewCFGOnly()``, and the ``SelectionDAG::viewGraph()``
  1041. methods. Within GDB, for example, you can usually use something like ``call
  1042. DAG.viewGraph()`` to pop up a window. Alternatively, you can sprinkle calls to
  1043. these functions in your code in places you want to debug.
  1044. Getting this to work requires a small amount of setup. On Unix systems
  1045. with X11, install the `graphviz <http://www.graphviz.org>`_ toolkit, and make
  1046. sure 'dot' and 'gv' are in your path. If you are running on macOS, download
  1047. and install the macOS `Graphviz program
  1048. <http://www.pixelglow.com/graphviz/>`_ and add
  1049. ``/Applications/Graphviz.app/Contents/MacOS/`` (or wherever you install it) to
  1050. your path. The programs need not be present when configuring, building or
  1051. running LLVM and can simply be installed when needed during an active debug
  1052. session.
  1053. ``SelectionDAG`` has been extended to make it easier to locate *interesting*
  1054. nodes in large complex graphs. From gdb, if you ``call DAG.setGraphColor(node,
  1055. "color")``, then the next ``call DAG.viewGraph()`` would highlight the node in
  1056. the specified color (choices of colors can be found at `colors
  1057. <http://www.graphviz.org/doc/info/colors.html>`_.) More complex node attributes
  1058. can be provided with ``call DAG.setGraphAttrs(node, "attributes")`` (choices can
  1059. be found at `Graph attributes <http://www.graphviz.org/doc/info/attrs.html>`_.)
  1060. If you want to restart and clear all the current graph attributes, then you can
  1061. ``call DAG.clearGraphAttrs()``.
  1062. Note that graph visualization features are compiled out of Release builds to
  1063. reduce file size. This means that you need a Debug+Asserts or Release+Asserts
  1064. build to use these features.
  1065. .. _datastructure:
  1066. Picking the Right Data Structure for a Task
  1067. ===========================================
  1068. LLVM has a plethora of data structures in the ``llvm/ADT/`` directory, and we
  1069. commonly use STL data structures. This section describes the trade-offs you
  1070. should consider when you pick one.
  1071. The first step is a choose your own adventure: do you want a sequential
  1072. container, a set-like container, or a map-like container? The most important
  1073. thing when choosing a container is the algorithmic properties of how you plan to
  1074. access the container. Based on that, you should use:
  1075. * a :ref:`map-like <ds_map>` container if you need efficient look-up of a
  1076. value based on another value. Map-like containers also support efficient
  1077. queries for containment (whether a key is in the map). Map-like containers
  1078. generally do not support efficient reverse mapping (values to keys). If you
  1079. need that, use two maps. Some map-like containers also support efficient
  1080. iteration through the keys in sorted order. Map-like containers are the most
  1081. expensive sort, only use them if you need one of these capabilities.
  1082. * a :ref:`set-like <ds_set>` container if you need to put a bunch of stuff into
  1083. a container that automatically eliminates duplicates. Some set-like
  1084. containers support efficient iteration through the elements in sorted order.
  1085. Set-like containers are more expensive than sequential containers.
  1086. * a :ref:`sequential <ds_sequential>` container provides the most efficient way
  1087. to add elements and keeps track of the order they are added to the collection.
  1088. They permit duplicates and support efficient iteration, but do not support
  1089. efficient look-up based on a key.
  1090. * a :ref:`string <ds_string>` container is a specialized sequential container or
  1091. reference structure that is used for character or byte arrays.
  1092. * a :ref:`bit <ds_bit>` container provides an efficient way to store and
  1093. perform set operations on sets of numeric id's, while automatically
  1094. eliminating duplicates. Bit containers require a maximum of 1 bit for each
  1095. identifier you want to store.
  1096. Once the proper category of container is determined, you can fine tune the
  1097. memory use, constant factors, and cache behaviors of access by intelligently
  1098. picking a member of the category. Note that constant factors and cache behavior
  1099. can be a big deal. If you have a vector that usually only contains a few
  1100. elements (but could contain many), for example, it's much better to use
  1101. :ref:`SmallVector <dss_smallvector>` than :ref:`vector <dss_vector>`. Doing so
  1102. avoids (relatively) expensive malloc/free calls, which dwarf the cost of adding
  1103. the elements to the container.
  1104. .. _ds_sequential:
  1105. Sequential Containers (std::vector, std::list, etc)
  1106. ---------------------------------------------------
  1107. There are a variety of sequential containers available for you, based on your
  1108. needs. Pick the first in this section that will do what you want.
  1109. .. _dss_arrayref:
  1110. llvm/ADT/ArrayRef.h
  1111. ^^^^^^^^^^^^^^^^^^^
  1112. The ``llvm::ArrayRef`` class is the preferred class to use in an interface that
  1113. accepts a sequential list of elements in memory and just reads from them. By
  1114. taking an ``ArrayRef``, the API can be passed a fixed size array, an
  1115. ``std::vector``, an ``llvm::SmallVector`` and anything else that is contiguous
  1116. in memory.
  1117. .. _dss_fixedarrays:
  1118. Fixed Size Arrays
  1119. ^^^^^^^^^^^^^^^^^
  1120. Fixed size arrays are very simple and very fast. They are good if you know
  1121. exactly how many elements you have, or you have a (low) upper bound on how many
  1122. you have.
  1123. .. _dss_heaparrays:
  1124. Heap Allocated Arrays
  1125. ^^^^^^^^^^^^^^^^^^^^^
  1126. Heap allocated arrays (``new[]`` + ``delete[]``) are also simple. They are good
  1127. if the number of elements is variable, if you know how many elements you will
  1128. need before the array is allocated, and if the array is usually large (if not,
  1129. consider a :ref:`SmallVector <dss_smallvector>`). The cost of a heap allocated
  1130. array is the cost of the new/delete (aka malloc/free). Also note that if you
  1131. are allocating an array of a type with a constructor, the constructor and
  1132. destructors will be run for every element in the array (re-sizable vectors only
  1133. construct those elements actually used).
  1134. .. _dss_tinyptrvector:
  1135. llvm/ADT/TinyPtrVector.h
  1136. ^^^^^^^^^^^^^^^^^^^^^^^^
  1137. ``TinyPtrVector<Type>`` is a highly specialized collection class that is
  1138. optimized to avoid allocation in the case when a vector has zero or one
  1139. elements. It has two major restrictions: 1) it can only hold values of pointer
  1140. type, and 2) it cannot hold a null pointer.
  1141. Since this container is highly specialized, it is rarely used.
  1142. .. _dss_smallvector:
  1143. llvm/ADT/SmallVector.h
  1144. ^^^^^^^^^^^^^^^^^^^^^^
  1145. ``SmallVector<Type, N>`` is a simple class that looks and smells just like
  1146. ``vector<Type>``: it supports efficient iteration, lays out elements in memory
  1147. order (so you can do pointer arithmetic between elements), supports efficient
  1148. push_back/pop_back operations, supports efficient random access to its elements,
  1149. etc.
  1150. The main advantage of SmallVector is that it allocates space for some number of
  1151. elements (N) **in the object itself**. Because of this, if the SmallVector is
  1152. dynamically smaller than N, no malloc is performed. This can be a big win in
  1153. cases where the malloc/free call is far more expensive than the code that
  1154. fiddles around with the elements.
  1155. This is good for vectors that are "usually small" (e.g. the number of
  1156. predecessors/successors of a block is usually less than 8). On the other hand,
  1157. this makes the size of the SmallVector itself large, so you don't want to
  1158. allocate lots of them (doing so will waste a lot of space). As such,
  1159. SmallVectors are most useful when on the stack.
  1160. SmallVector also provides a nice portable and efficient replacement for
  1161. ``alloca``.
  1162. SmallVector has grown a few other minor advantages over std::vector, causing
  1163. ``SmallVector<Type, 0>`` to be preferred over ``std::vector<Type>``.
  1164. #. std::vector is exception-safe, and some implementations have pessimizations
  1165. that copy elements when SmallVector would move them.
  1166. #. SmallVector understands ``llvm::is_trivially_copyable<Type>`` and uses realloc aggressively.
  1167. #. Many LLVM APIs take a SmallVectorImpl as an out parameter (see the note
  1168. below).
  1169. #. SmallVector with N equal to 0 is smaller than std::vector on 64-bit
  1170. platforms, since it uses ``unsigned`` (instead of ``void*``) for its size
  1171. and capacity.
  1172. .. note::
  1173. Prefer to use ``SmallVectorImpl<T>`` as a parameter type.
  1174. In APIs that don't care about the "small size" (most?), prefer to use
  1175. the ``SmallVectorImpl<T>`` class, which is basically just the "vector
  1176. header" (and methods) without the elements allocated after it. Note that
  1177. ``SmallVector<T, N>`` inherits from ``SmallVectorImpl<T>`` so the
  1178. conversion is implicit and costs nothing. E.g.
  1179. .. code-block:: c++
  1180. // BAD: Clients cannot pass e.g. SmallVector<Foo, 4>.
  1181. hardcodedSmallSize(SmallVector<Foo, 2> &Out);
  1182. // GOOD: Clients can pass any SmallVector<Foo, N>.
  1183. allowsAnySmallSize(SmallVectorImpl<Foo> &Out);
  1184. void someFunc() {
  1185. SmallVector<Foo, 8> Vec;
  1186. hardcodedSmallSize(Vec); // Error.
  1187. allowsAnySmallSize(Vec); // Works.
  1188. }
  1189. Even though it has "``Impl``" in the name, this is so widely used that
  1190. it really isn't "private to the implementation" anymore. A name like
  1191. ``SmallVectorHeader`` would be more appropriate.
  1192. .. _dss_vector:
  1193. <vector>
  1194. ^^^^^^^^
  1195. ``std::vector<T>`` is well loved and respected. However, ``SmallVector<T, 0>``
  1196. is often a better option due to the advantages listed above. std::vector is
  1197. still useful when you need to store more than ``UINT32_MAX`` elements or when
  1198. interfacing with code that expects vectors :).
  1199. One worthwhile note about std::vector: avoid code like this:
  1200. .. code-block:: c++
  1201. for ( ... ) {
  1202. std::vector<foo> V;
  1203. // make use of V.
  1204. }
  1205. Instead, write this as:
  1206. .. code-block:: c++
  1207. std::vector<foo> V;
  1208. for ( ... ) {
  1209. // make use of V.
  1210. V.clear();
  1211. }
  1212. Doing so will save (at least) one heap allocation and free per iteration of the
  1213. loop.
  1214. .. _dss_deque:
  1215. <deque>
  1216. ^^^^^^^
  1217. ``std::deque`` is, in some senses, a generalized version of ``std::vector``.
  1218. Like ``std::vector``, it provides constant time random access and other similar
  1219. properties, but it also provides efficient access to the front of the list. It
  1220. does not guarantee continuity of elements within memory.
  1221. In exchange for this extra flexibility, ``std::deque`` has significantly higher
  1222. constant factor costs than ``std::vector``. If possible, use ``std::vector`` or
  1223. something cheaper.
  1224. .. _dss_list:
  1225. <list>
  1226. ^^^^^^
  1227. ``std::list`` is an extremely inefficient class that is rarely useful. It
  1228. performs a heap allocation for every element inserted into it, thus having an
  1229. extremely high constant factor, particularly for small data types.
  1230. ``std::list`` also only supports bidirectional iteration, not random access
  1231. iteration.
  1232. In exchange for this high cost, std::list supports efficient access to both ends
  1233. of the list (like ``std::deque``, but unlike ``std::vector`` or
  1234. ``SmallVector``). In addition, the iterator invalidation characteristics of
  1235. std::list are stronger than that of a vector class: inserting or removing an
  1236. element into the list does not invalidate iterator or pointers to other elements
  1237. in the list.
  1238. .. _dss_ilist:
  1239. llvm/ADT/ilist.h
  1240. ^^^^^^^^^^^^^^^^
  1241. ``ilist<T>`` implements an 'intrusive' doubly-linked list. It is intrusive,
  1242. because it requires the element to store and provide access to the prev/next
  1243. pointers for the list.
  1244. ``ilist`` has the same drawbacks as ``std::list``, and additionally requires an
  1245. ``ilist_traits`` implementation for the element type, but it provides some novel
  1246. characteristics. In particular, it can efficiently store polymorphic objects,
  1247. the traits class is informed when an element is inserted or removed from the
  1248. list, and ``ilist``\ s are guaranteed to support a constant-time splice
  1249. operation.
  1250. These properties are exactly what we want for things like ``Instruction``\ s and
  1251. basic blocks, which is why these are implemented with ``ilist``\ s.
  1252. Related classes of interest are explained in the following subsections:
  1253. * :ref:`ilist_traits <dss_ilist_traits>`
  1254. * :ref:`iplist <dss_iplist>`
  1255. * :ref:`llvm/ADT/ilist_node.h <dss_ilist_node>`
  1256. * :ref:`Sentinels <dss_ilist_sentinel>`
  1257. .. _dss_packedvector:
  1258. llvm/ADT/PackedVector.h
  1259. ^^^^^^^^^^^^^^^^^^^^^^^
  1260. Useful for storing a vector of values using only a few number of bits for each
  1261. value. Apart from the standard operations of a vector-like container, it can
  1262. also perform an 'or' set operation.
  1263. For example:
  1264. .. code-block:: c++
  1265. enum State {
  1266. None = 0x0,
  1267. FirstCondition = 0x1,
  1268. SecondCondition = 0x2,
  1269. Both = 0x3
  1270. };
  1271. State get() {
  1272. PackedVector<State, 2> Vec1;
  1273. Vec1.push_back(FirstCondition);
  1274. PackedVector<State, 2> Vec2;
  1275. Vec2.push_back(SecondCondition);
  1276. Vec1 |= Vec2;
  1277. return Vec1[0]; // returns 'Both'.
  1278. }
  1279. .. _dss_ilist_traits:
  1280. ilist_traits
  1281. ^^^^^^^^^^^^
  1282. ``ilist_traits<T>`` is ``ilist<T>``'s customization mechanism. ``iplist<T>``
  1283. (and consequently ``ilist<T>``) publicly derive from this traits class.
  1284. .. _dss_iplist:
  1285. iplist
  1286. ^^^^^^
  1287. ``iplist<T>`` is ``ilist<T>``'s base and as such supports a slightly narrower
  1288. interface. Notably, inserters from ``T&`` are absent.
  1289. ``ilist_traits<T>`` is a public base of this class and can be used for a wide
  1290. variety of customizations.
  1291. .. _dss_ilist_node:
  1292. llvm/ADT/ilist_node.h
  1293. ^^^^^^^^^^^^^^^^^^^^^
  1294. ``ilist_node<T>`` implements the forward and backward links that are expected
  1295. by the ``ilist<T>`` (and analogous containers) in the default manner.
  1296. ``ilist_node<T>``\ s are meant to be embedded in the node type ``T``, usually
  1297. ``T`` publicly derives from ``ilist_node<T>``.
  1298. .. _dss_ilist_sentinel:
  1299. Sentinels
  1300. ^^^^^^^^^
  1301. ``ilist``\ s have another specialty that must be considered. To be a good
  1302. citizen in the C++ ecosystem, it needs to support the standard container
  1303. operations, such as ``begin`` and ``end`` iterators, etc. Also, the
  1304. ``operator--`` must work correctly on the ``end`` iterator in the case of
  1305. non-empty ``ilist``\ s.
  1306. The only sensible solution to this problem is to allocate a so-called *sentinel*
  1307. along with the intrusive list, which serves as the ``end`` iterator, providing
  1308. the back-link to the last element. However conforming to the C++ convention it
  1309. is illegal to ``operator++`` beyond the sentinel and it also must not be
  1310. dereferenced.
  1311. These constraints allow for some implementation freedom to the ``ilist`` how to
  1312. allocate and store the sentinel. The corresponding policy is dictated by
  1313. ``ilist_traits<T>``. By default a ``T`` gets heap-allocated whenever the need
  1314. for a sentinel arises.
  1315. While the default policy is sufficient in most cases, it may break down when
  1316. ``T`` does not provide a default constructor. Also, in the case of many
  1317. instances of ``ilist``\ s, the memory overhead of the associated sentinels is
  1318. wasted. To alleviate the situation with numerous and voluminous
  1319. ``T``-sentinels, sometimes a trick is employed, leading to *ghostly sentinels*.
  1320. Ghostly sentinels are obtained by specially-crafted ``ilist_traits<T>`` which
  1321. superpose the sentinel with the ``ilist`` instance in memory. Pointer
  1322. arithmetic is used to obtain the sentinel, which is relative to the ``ilist``'s
  1323. ``this`` pointer. The ``ilist`` is augmented by an extra pointer, which serves
  1324. as the back-link of the sentinel. This is the only field in the ghostly
  1325. sentinel which can be legally accessed.
  1326. .. _dss_other:
  1327. Other Sequential Container options
  1328. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  1329. Other STL containers are available, such as ``std::string``.
  1330. There are also various STL adapter classes such as ``std::queue``,
  1331. ``std::priority_queue``, ``std::stack``, etc. These provide simplified access
  1332. to an underlying container but don't affect the cost of the container itself.
  1333. .. _ds_string:
  1334. String-like containers
  1335. ----------------------
  1336. There are a variety of ways to pass around and use strings in C and C++, and
  1337. LLVM adds a few new options to choose from. Pick the first option on this list
  1338. that will do what you need, they are ordered according to their relative cost.
  1339. Note that it is generally preferred to *not* pass strings around as ``const
  1340. char*``'s. These have a number of problems, including the fact that they
  1341. cannot represent embedded nul ("\0") characters, and do not have a length
  1342. available efficiently. The general replacement for '``const char*``' is
  1343. StringRef.
  1344. For more information on choosing string containers for APIs, please see
  1345. :ref:`Passing Strings <string_apis>`.
  1346. .. _dss_stringref:
  1347. llvm/ADT/StringRef.h
  1348. ^^^^^^^^^^^^^^^^^^^^
  1349. The StringRef class is a simple value class that contains a pointer to a
  1350. character and a length, and is quite related to the :ref:`ArrayRef
  1351. <dss_arrayref>` class (but specialized for arrays of characters). Because
  1352. StringRef carries a length with it, it safely handles strings with embedded nul
  1353. characters in it, getting the length does not require a strlen call, and it even
  1354. has very convenient APIs for slicing and dicing the character range that it
  1355. represents.
  1356. StringRef is ideal for passing simple strings around that are known to be live,
  1357. either because they are C string literals, std::string, a C array, or a
  1358. SmallVector. Each of these cases has an efficient implicit conversion to
  1359. StringRef, which doesn't result in a dynamic strlen being executed.
  1360. StringRef has a few major limitations which make more powerful string containers
  1361. useful:
  1362. #. You cannot directly convert a StringRef to a 'const char*' because there is
  1363. no way to add a trailing nul (unlike the .c_str() method on various stronger
  1364. classes).
  1365. #. StringRef doesn't own or keep alive the underlying string bytes.
  1366. As such it can easily lead to dangling pointers, and is not suitable for
  1367. embedding in datastructures in most cases (instead, use an std::string or
  1368. something like that).
  1369. #. For the same reason, StringRef cannot be used as the return value of a
  1370. method if the method "computes" the result string. Instead, use std::string.
  1371. #. StringRef's do not allow you to mutate the pointed-to string bytes and it
  1372. doesn't allow you to insert or remove bytes from the range. For editing
  1373. operations like this, it interoperates with the :ref:`Twine <dss_twine>`
  1374. class.
  1375. Because of its strengths and limitations, it is very common for a function to
  1376. take a StringRef and for a method on an object to return a StringRef that points
  1377. into some string that it owns.
  1378. .. _dss_twine:
  1379. llvm/ADT/Twine.h
  1380. ^^^^^^^^^^^^^^^^
  1381. The Twine class is used as an intermediary datatype for APIs that want to take a
  1382. string that can be constructed inline with a series of concatenations. Twine
  1383. works by forming recursive instances of the Twine datatype (a simple value
  1384. object) on the stack as temporary objects, linking them together into a tree
  1385. which is then linearized when the Twine is consumed. Twine is only safe to use
  1386. as the argument to a function, and should always be a const reference, e.g.:
  1387. .. code-block:: c++
  1388. void foo(const Twine &T);
  1389. ...
  1390. StringRef X = ...
  1391. unsigned i = ...
  1392. foo(X + "." + Twine(i));
  1393. This example forms a string like "blarg.42" by concatenating the values
  1394. together, and does not form intermediate strings containing "blarg" or "blarg.".
  1395. Because Twine is constructed with temporary objects on the stack, and because
  1396. these instances are destroyed at the end of the current statement, it is an
  1397. inherently dangerous API. For example, this simple variant contains undefined
  1398. behavior and will probably crash:
  1399. .. code-block:: c++
  1400. void foo(const Twine &T);
  1401. ...
  1402. StringRef X = ...
  1403. unsigned i = ...
  1404. const Twine &Tmp = X + "." + Twine(i);
  1405. foo(Tmp);
  1406. ... because the temporaries are destroyed before the call. That said, Twine's
  1407. are much more efficient than intermediate std::string temporaries, and they work
  1408. really well with StringRef. Just be aware of their limitations.
  1409. .. _dss_smallstring:
  1410. llvm/ADT/SmallString.h
  1411. ^^^^^^^^^^^^^^^^^^^^^^
  1412. SmallString is a subclass of :ref:`SmallVector <dss_smallvector>` that adds some
  1413. convenience APIs like += that takes StringRef's. SmallString avoids allocating
  1414. memory in the case when the preallocated space is enough to hold its data, and
  1415. it calls back to general heap allocation when required. Since it owns its data,
  1416. it is very safe to use and supports full mutation of the string.
  1417. Like SmallVector's, the big downside to SmallString is their sizeof. While they
  1418. are optimized for small strings, they themselves are not particularly small.
  1419. This means that they work great for temporary scratch buffers on the stack, but
  1420. should not generally be put into the heap: it is very rare to see a SmallString
  1421. as the member of a frequently-allocated heap data structure or returned
  1422. by-value.
  1423. .. _dss_stdstring:
  1424. std::string
  1425. ^^^^^^^^^^^
  1426. The standard C++ std::string class is a very general class that (like
  1427. SmallString) owns its underlying data. sizeof(std::string) is very reasonable
  1428. so it can be embedded into heap data structures and returned by-value. On the
  1429. other hand, std::string is highly inefficient for inline editing (e.g.
  1430. concatenating a bunch of stuff together) and because it is provided by the
  1431. standard library, its performance characteristics depend a lot of the host
  1432. standard library (e.g. libc++ and MSVC provide a highly optimized string class,
  1433. GCC contains a really slow implementation).
  1434. The major disadvantage of std::string is that almost every operation that makes
  1435. them larger can allocate memory, which is slow. As such, it is better to use
  1436. SmallVector or Twine as a scratch buffer, but then use std::string to persist
  1437. the result.
  1438. .. _ds_set:
  1439. Set-Like Containers (std::set, SmallSet, SetVector, etc)
  1440. --------------------------------------------------------
  1441. Set-like containers are useful when you need to canonicalize multiple values
  1442. into a single representation. There are several different choices for how to do
  1443. this, providing various trade-offs.
  1444. .. _dss_sortedvectorset:
  1445. A sorted 'vector'
  1446. ^^^^^^^^^^^^^^^^^
  1447. If you intend to insert a lot of elements, then do a lot of queries, a great
  1448. approach is to use an std::vector (or other sequential container) with
  1449. std::sort+std::unique to remove duplicates. This approach works really well if
  1450. your usage pattern has these two distinct phases (insert then query), and can be
  1451. coupled with a good choice of :ref:`sequential container <ds_sequential>`.
  1452. This combination provides the several nice properties: the result data is
  1453. contiguous in memory (good for cache locality), has few allocations, is easy to
  1454. address (iterators in the final vector are just indices or pointers), and can be
  1455. efficiently queried with a standard binary search (e.g.
  1456. ``std::lower_bound``; if you want the whole range of elements comparing
  1457. equal, use ``std::equal_range``).
  1458. .. _dss_smallset:
  1459. llvm/ADT/SmallSet.h
  1460. ^^^^^^^^^^^^^^^^^^^
  1461. If you have a set-like data structure that is usually small and whose elements
  1462. are reasonably small, a ``SmallSet<Type, N>`` is a good choice. This set has
  1463. space for N elements in place (thus, if the set is dynamically smaller than N,
  1464. no malloc traffic is required) and accesses them with a simple linear search.
  1465. When the set grows beyond N elements, it allocates a more expensive
  1466. representation that guarantees efficient access (for most types, it falls back
  1467. to :ref:`std::set <dss_set>`, but for pointers it uses something far better,
  1468. :ref:`SmallPtrSet <dss_smallptrset>`.
  1469. The magic of this class is that it handles small sets extremely efficiently, but
  1470. gracefully handles extremely large sets without loss of efficiency.
  1471. .. _dss_smallptrset:
  1472. llvm/ADT/SmallPtrSet.h
  1473. ^^^^^^^^^^^^^^^^^^^^^^
  1474. ``SmallPtrSet`` has all the advantages of ``SmallSet`` (and a ``SmallSet`` of
  1475. pointers is transparently implemented with a ``SmallPtrSet``). If more than N
  1476. insertions are performed, a single quadratically probed hash table is allocated
  1477. and grows as needed, providing extremely efficient access (constant time
  1478. insertion/deleting/queries with low constant factors) and is very stingy with
  1479. malloc traffic.
  1480. Note that, unlike :ref:`std::set <dss_set>`, the iterators of ``SmallPtrSet``
  1481. are invalidated whenever an insertion occurs. Also, the values visited by the
  1482. iterators are not visited in sorted order.
  1483. .. _dss_stringset:
  1484. llvm/ADT/StringSet.h
  1485. ^^^^^^^^^^^^^^^^^^^^
  1486. ``StringSet`` is a thin wrapper around :ref:`StringMap\<char\> <dss_stringmap>`,
  1487. and it allows efficient storage and retrieval of unique strings.
  1488. Functionally analogous to ``SmallSet<StringRef>``, ``StringSet`` also supports
  1489. iteration. (The iterator dereferences to a ``StringMapEntry<char>``, so you
  1490. need to call ``i->getKey()`` to access the item of the StringSet.) On the
  1491. other hand, ``StringSet`` doesn't support range-insertion and
  1492. copy-construction, which :ref:`SmallSet <dss_smallset>` and :ref:`SmallPtrSet
  1493. <dss_smallptrset>` do support.
  1494. .. _dss_denseset:
  1495. llvm/ADT/DenseSet.h
  1496. ^^^^^^^^^^^^^^^^^^^
  1497. DenseSet is a simple quadratically probed hash table. It excels at supporting
  1498. small values: it uses a single allocation to hold all of the pairs that are
  1499. currently inserted in the set. DenseSet is a great way to unique small values
  1500. that are not simple pointers (use :ref:`SmallPtrSet <dss_smallptrset>` for
  1501. pointers). Note that DenseSet has the same requirements for the value type that
  1502. :ref:`DenseMap <dss_densemap>` has.
  1503. .. _dss_sparseset:
  1504. llvm/ADT/SparseSet.h
  1505. ^^^^^^^^^^^^^^^^^^^^
  1506. SparseSet holds a small number of objects identified by unsigned keys of
  1507. moderate size. It uses a lot of memory, but provides operations that are almost
  1508. as fast as a vector. Typical keys are physical registers, virtual registers, or
  1509. numbered basic blocks.
  1510. SparseSet is useful for algorithms that need very fast clear/find/insert/erase
  1511. and fast iteration over small sets. It is not intended for building composite
  1512. data structures.
  1513. .. _dss_sparsemultiset:
  1514. llvm/ADT/SparseMultiSet.h
  1515. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  1516. SparseMultiSet adds multiset behavior to SparseSet, while retaining SparseSet's
  1517. desirable attributes. Like SparseSet, it typically uses a lot of memory, but
  1518. provides operations that are almost as fast as a vector. Typical keys are
  1519. physical registers, virtual registers, or numbered basic blocks.
  1520. SparseMultiSet is useful for algorithms that need very fast
  1521. clear/find/insert/erase of the entire collection, and iteration over sets of
  1522. elements sharing a key. It is often a more efficient choice than using composite
  1523. data structures (e.g. vector-of-vectors, map-of-vectors). It is not intended for
  1524. building composite data structures.
  1525. .. _dss_FoldingSet:
  1526. llvm/ADT/FoldingSet.h
  1527. ^^^^^^^^^^^^^^^^^^^^^
  1528. FoldingSet is an aggregate class that is really good at uniquing
  1529. expensive-to-create or polymorphic objects. It is a combination of a chained
  1530. hash table with intrusive links (uniqued objects are required to inherit from
  1531. FoldingSetNode) that uses :ref:`SmallVector <dss_smallvector>` as part of its ID
  1532. process.
  1533. Consider a case where you want to implement a "getOrCreateFoo" method for a
  1534. complex object (for example, a node in the code generator). The client has a
  1535. description of **what** it wants to generate (it knows the opcode and all the
  1536. operands), but we don't want to 'new' a node, then try inserting it into a set
  1537. only to find out it already exists, at which point we would have to delete it
  1538. and return the node that already exists.
  1539. To support this style of client, FoldingSet perform a query with a
  1540. FoldingSetNodeID (which wraps SmallVector) that can be used to describe the
  1541. element that we want to query for. The query either returns the element
  1542. matching the ID or it returns an opaque ID that indicates where insertion should
  1543. take place. Construction of the ID usually does not require heap traffic.
  1544. Because FoldingSet uses intrusive links, it can support polymorphic objects in
  1545. the set (for example, you can have SDNode instances mixed with LoadSDNodes).
  1546. Because the elements are individually allocated, pointers to the elements are
  1547. stable: inserting or removing elements does not invalidate any pointers to other
  1548. elements.
  1549. .. _dss_set:
  1550. <set>
  1551. ^^^^^
  1552. ``std::set`` is a reasonable all-around set class, which is decent at many
  1553. things but great at nothing. std::set allocates memory for each element
  1554. inserted (thus it is very malloc intensive) and typically stores three pointers
  1555. per element in the set (thus adding a large amount of per-element space
  1556. overhead). It offers guaranteed log(n) performance, which is not particularly
  1557. fast from a complexity standpoint (particularly if the elements of the set are
  1558. expensive to compare, like strings), and has extremely high constant factors for
  1559. lookup, insertion and removal.
  1560. The advantages of std::set are that its iterators are stable (deleting or
  1561. inserting an element from the set does not affect iterators or pointers to other
  1562. elements) and that iteration over the set is guaranteed to be in sorted order.
  1563. If the elements in the set are large, then the relative overhead of the pointers
  1564. and malloc traffic is not a big deal, but if the elements of the set are small,
  1565. std::set is almost never a good choice.
  1566. .. _dss_setvector:
  1567. llvm/ADT/SetVector.h
  1568. ^^^^^^^^^^^^^^^^^^^^
  1569. LLVM's ``SetVector<Type>`` is an adapter class that combines your choice of a
  1570. set-like container along with a :ref:`Sequential Container <ds_sequential>` The
  1571. important property that this provides is efficient insertion with uniquing
  1572. (duplicate elements are ignored) with iteration support. It implements this by
  1573. inserting elements into both a set-like container and the sequential container,
  1574. using the set-like container for uniquing and the sequential container for
  1575. iteration.
  1576. The difference between SetVector and other sets is that the order of iteration
  1577. is guaranteed to match the order of insertion into the SetVector. This property
  1578. is really important for things like sets of pointers. Because pointer values
  1579. are non-deterministic (e.g. vary across runs of the program on different
  1580. machines), iterating over the pointers in the set will not be in a well-defined
  1581. order.
  1582. The drawback of SetVector is that it requires twice as much space as a normal
  1583. set and has the sum of constant factors from the set-like container and the
  1584. sequential container that it uses. Use it **only** if you need to iterate over
  1585. the elements in a deterministic order. SetVector is also expensive to delete
  1586. elements out of (linear time), unless you use its "pop_back" method, which is
  1587. faster.
  1588. ``SetVector`` is an adapter class that defaults to using ``std::vector`` and a
  1589. size 16 ``SmallSet`` for the underlying containers, so it is quite expensive.
  1590. However, ``"llvm/ADT/SetVector.h"`` also provides a ``SmallSetVector`` class,
  1591. which defaults to using a ``SmallVector`` and ``SmallSet`` of a specified size.
  1592. If you use this, and if your sets are dynamically smaller than ``N``, you will
  1593. save a lot of heap traffic.
  1594. .. _dss_uniquevector:
  1595. llvm/ADT/UniqueVector.h
  1596. ^^^^^^^^^^^^^^^^^^^^^^^
  1597. UniqueVector is similar to :ref:`SetVector <dss_setvector>` but it retains a
  1598. unique ID for each element inserted into the set. It internally contains a map
  1599. and a vector, and it assigns a unique ID for each value inserted into the set.
  1600. UniqueVector is very expensive: its cost is the sum of the cost of maintaining
  1601. both the map and vector, it has high complexity, high constant factors, and
  1602. produces a lot of malloc traffic. It should be avoided.
  1603. .. _dss_immutableset:
  1604. llvm/ADT/ImmutableSet.h
  1605. ^^^^^^^^^^^^^^^^^^^^^^^
  1606. ImmutableSet is an immutable (functional) set implementation based on an AVL
  1607. tree. Adding or removing elements is done through a Factory object and results
  1608. in the creation of a new ImmutableSet object. If an ImmutableSet already exists
  1609. with the given contents, then the existing one is returned; equality is compared
  1610. with a FoldingSetNodeID. The time and space complexity of add or remove
  1611. operations is logarithmic in the size of the original set.
  1612. There is no method for returning an element of the set, you can only check for
  1613. membership.
  1614. .. _dss_otherset:
  1615. Other Set-Like Container Options
  1616. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  1617. The STL provides several other options, such as std::multiset and the various
  1618. "hash_set" like containers (whether from C++ TR1 or from the SGI library). We
  1619. never use hash_set and unordered_set because they are generally very expensive
  1620. (each insertion requires a malloc) and very non-portable.
  1621. std::multiset is useful if you're not interested in elimination of duplicates,
  1622. but has all the drawbacks of :ref:`std::set <dss_set>`. A sorted vector
  1623. (where you don't delete duplicate entries) or some other approach is almost
  1624. always better.
  1625. .. _ds_map:
  1626. Map-Like Containers (std::map, DenseMap, etc)
  1627. ---------------------------------------------
  1628. Map-like containers are useful when you want to associate data to a key. As
  1629. usual, there are a lot of different ways to do this. :)
  1630. .. _dss_sortedvectormap:
  1631. A sorted 'vector'
  1632. ^^^^^^^^^^^^^^^^^
  1633. If your usage pattern follows a strict insert-then-query approach, you can
  1634. trivially use the same approach as :ref:`sorted vectors for set-like containers
  1635. <dss_sortedvectorset>`. The only difference is that your query function (which
  1636. uses std::lower_bound to get efficient log(n) lookup) should only compare the
  1637. key, not both the key and value. This yields the same advantages as sorted
  1638. vectors for sets.
  1639. .. _dss_stringmap:
  1640. llvm/ADT/StringMap.h
  1641. ^^^^^^^^^^^^^^^^^^^^
  1642. Strings are commonly used as keys in maps, and they are difficult to support
  1643. efficiently: they are variable length, inefficient to hash and compare when
  1644. long, expensive to copy, etc. StringMap is a specialized container designed to
  1645. cope with these issues. It supports mapping an arbitrary range of bytes to an
  1646. arbitrary other object.
  1647. The StringMap implementation uses a quadratically-probed hash table, where the
  1648. buckets store a pointer to the heap allocated entries (and some other stuff).
  1649. The entries in the map must be heap allocated because the strings are variable
  1650. length. The string data (key) and the element object (value) are stored in the
  1651. same allocation with the string data immediately after the element object.
  1652. This container guarantees the "``(char*)(&Value+1)``" points to the key string
  1653. for a value.
  1654. The StringMap is very fast for several reasons: quadratic probing is very cache
  1655. efficient for lookups, the hash value of strings in buckets is not recomputed
  1656. when looking up an element, StringMap rarely has to touch the memory for
  1657. unrelated objects when looking up a value (even when hash collisions happen),
  1658. hash table growth does not recompute the hash values for strings already in the
  1659. table, and each pair in the map is store in a single allocation (the string data
  1660. is stored in the same allocation as the Value of a pair).
  1661. StringMap also provides query methods that take byte ranges, so it only ever
  1662. copies a string if a value is inserted into the table.
  1663. StringMap iteration order, however, is not guaranteed to be deterministic, so
  1664. any uses which require that should instead use a std::map.
  1665. .. _dss_indexmap:
  1666. llvm/ADT/IndexedMap.h
  1667. ^^^^^^^^^^^^^^^^^^^^^
  1668. IndexedMap is a specialized container for mapping small dense integers (or
  1669. values that can be mapped to small dense integers) to some other type. It is
  1670. internally implemented as a vector with a mapping function that maps the keys
  1671. to the dense integer range.
  1672. This is useful for cases like virtual registers in the LLVM code generator: they
  1673. have a dense mapping that is offset by a compile-time constant (the first
  1674. virtual register ID).
  1675. .. _dss_densemap:
  1676. llvm/ADT/DenseMap.h
  1677. ^^^^^^^^^^^^^^^^^^^
  1678. DenseMap is a simple quadratically probed hash table. It excels at supporting
  1679. small keys and values: it uses a single allocation to hold all of the pairs
  1680. that are currently inserted in the map. DenseMap is a great way to map
  1681. pointers to pointers, or map other small types to each other.
  1682. There are several aspects of DenseMap that you should be aware of, however.
  1683. The iterators in a DenseMap are invalidated whenever an insertion occurs,
  1684. unlike map. Also, because DenseMap allocates space for a large number of
  1685. key/value pairs (it starts with 64 by default), it will waste a lot of space if
  1686. your keys or values are large. Finally, you must implement a partial
  1687. specialization of DenseMapInfo for the key that you want, if it isn't already
  1688. supported. This is required to tell DenseMap about two special marker values
  1689. (which can never be inserted into the map) that it needs internally.
  1690. DenseMap's find_as() method supports lookup operations using an alternate key
  1691. type. This is useful in cases where the normal key type is expensive to
  1692. construct, but cheap to compare against. The DenseMapInfo is responsible for
  1693. defining the appropriate comparison and hashing methods for each alternate key
  1694. type used.
  1695. .. _dss_valuemap:
  1696. llvm/IR/ValueMap.h
  1697. ^^^^^^^^^^^^^^^^^^^
  1698. ValueMap is a wrapper around a :ref:`DenseMap <dss_densemap>` mapping
  1699. ``Value*``\ s (or subclasses) to another type. When a Value is deleted or
  1700. RAUW'ed, ValueMap will update itself so the new version of the key is mapped to
  1701. the same value, just as if the key were a WeakVH. You can configure exactly how
  1702. this happens, and what else happens on these two events, by passing a ``Config``
  1703. parameter to the ValueMap template.
  1704. .. _dss_intervalmap:
  1705. llvm/ADT/IntervalMap.h
  1706. ^^^^^^^^^^^^^^^^^^^^^^
  1707. IntervalMap is a compact map for small keys and values. It maps key intervals
  1708. instead of single keys, and it will automatically coalesce adjacent intervals.
  1709. When the map only contains a few intervals, they are stored in the map object
  1710. itself to avoid allocations.
  1711. The IntervalMap iterators are quite big, so they should not be passed around as
  1712. STL iterators. The heavyweight iterators allow a smaller data structure.
  1713. .. _dss_map:
  1714. <map>
  1715. ^^^^^
  1716. std::map has similar characteristics to :ref:`std::set <dss_set>`: it uses a
  1717. single allocation per pair inserted into the map, it offers log(n) lookup with
  1718. an extremely large constant factor, imposes a space penalty of 3 pointers per
  1719. pair in the map, etc.
  1720. std::map is most useful when your keys or values are very large, if you need to
  1721. iterate over the collection in sorted order, or if you need stable iterators
  1722. into the map (i.e. they don't get invalidated if an insertion or deletion of
  1723. another element takes place).
  1724. .. _dss_mapvector:
  1725. llvm/ADT/MapVector.h
  1726. ^^^^^^^^^^^^^^^^^^^^
  1727. ``MapVector<KeyT,ValueT>`` provides a subset of the DenseMap interface. The
  1728. main difference is that the iteration order is guaranteed to be the insertion
  1729. order, making it an easy (but somewhat expensive) solution for non-deterministic
  1730. iteration over maps of pointers.
  1731. It is implemented by mapping from key to an index in a vector of key,value
  1732. pairs. This provides fast lookup and iteration, but has two main drawbacks:
  1733. the key is stored twice and removing elements takes linear time. If it is
  1734. necessary to remove elements, it's best to remove them in bulk using
  1735. ``remove_if()``.
  1736. .. _dss_inteqclasses:
  1737. llvm/ADT/IntEqClasses.h
  1738. ^^^^^^^^^^^^^^^^^^^^^^^
  1739. IntEqClasses provides a compact representation of equivalence classes of small
  1740. integers. Initially, each integer in the range 0..n-1 has its own equivalence
  1741. class. Classes can be joined by passing two class representatives to the
  1742. join(a, b) method. Two integers are in the same class when findLeader() returns
  1743. the same representative.
  1744. Once all equivalence classes are formed, the map can be compressed so each
  1745. integer 0..n-1 maps to an equivalence class number in the range 0..m-1, where m
  1746. is the total number of equivalence classes. The map must be uncompressed before
  1747. it can be edited again.
  1748. .. _dss_immutablemap:
  1749. llvm/ADT/ImmutableMap.h
  1750. ^^^^^^^^^^^^^^^^^^^^^^^
  1751. ImmutableMap is an immutable (functional) map implementation based on an AVL
  1752. tree. Adding or removing elements is done through a Factory object and results
  1753. in the creation of a new ImmutableMap object. If an ImmutableMap already exists
  1754. with the given key set, then the existing one is returned; equality is compared
  1755. with a FoldingSetNodeID. The time and space complexity of add or remove
  1756. operations is logarithmic in the size of the original map.
  1757. .. _dss_othermap:
  1758. Other Map-Like Container Options
  1759. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  1760. The STL provides several other options, such as std::multimap and the various
  1761. "hash_map" like containers (whether from C++ TR1 or from the SGI library). We
  1762. never use hash_set and unordered_set because they are generally very expensive
  1763. (each insertion requires a malloc) and very non-portable.
  1764. std::multimap is useful if you want to map a key to multiple values, but has all
  1765. the drawbacks of std::map. A sorted vector or some other approach is almost
  1766. always better.
  1767. .. _ds_bit:
  1768. Bit storage containers (BitVector, SparseBitVector)
  1769. ---------------------------------------------------
  1770. Unlike the other containers, there are only two bit storage containers, and
  1771. choosing when to use each is relatively straightforward.
  1772. One additional option is ``std::vector<bool>``: we discourage its use for two
  1773. reasons 1) the implementation in many common compilers (e.g. commonly
  1774. available versions of GCC) is extremely inefficient and 2) the C++ standards
  1775. committee is likely to deprecate this container and/or change it significantly
  1776. somehow. In any case, please don't use it.
  1777. .. _dss_bitvector:
  1778. BitVector
  1779. ^^^^^^^^^
  1780. The BitVector container provides a dynamic size set of bits for manipulation.
  1781. It supports individual bit setting/testing, as well as set operations. The set
  1782. operations take time O(size of bitvector), but operations are performed one word
  1783. at a time, instead of one bit at a time. This makes the BitVector very fast for
  1784. set operations compared to other containers. Use the BitVector when you expect
  1785. the number of set bits to be high (i.e. a dense set).
  1786. .. _dss_smallbitvector:
  1787. SmallBitVector
  1788. ^^^^^^^^^^^^^^
  1789. The SmallBitVector container provides the same interface as BitVector, but it is
  1790. optimized for the case where only a small number of bits, less than 25 or so,
  1791. are needed. It also transparently supports larger bit counts, but slightly less
  1792. efficiently than a plain BitVector, so SmallBitVector should only be used when
  1793. larger counts are rare.
  1794. At this time, SmallBitVector does not support set operations (and, or, xor), and
  1795. its operator[] does not provide an assignable lvalue.
  1796. .. _dss_sparsebitvector:
  1797. SparseBitVector
  1798. ^^^^^^^^^^^^^^^
  1799. The SparseBitVector container is much like BitVector, with one major difference:
  1800. Only the bits that are set, are stored. This makes the SparseBitVector much
  1801. more space efficient than BitVector when the set is sparse, as well as making
  1802. set operations O(number of set bits) instead of O(size of universe). The
  1803. downside to the SparseBitVector is that setting and testing of random bits is
  1804. O(N), and on large SparseBitVectors, this can be slower than BitVector. In our
  1805. implementation, setting or testing bits in sorted order (either forwards or
  1806. reverse) is O(1) worst case. Testing and setting bits within 128 bits (depends
  1807. on size) of the current bit is also O(1). As a general statement,
  1808. testing/setting bits in a SparseBitVector is O(distance away from last set bit).
  1809. .. _debugging:
  1810. Debugging
  1811. =========
  1812. A handful of `GDB pretty printers
  1813. <https://sourceware.org/gdb/onlinedocs/gdb/Pretty-Printing.html>`__ are
  1814. provided for some of the core LLVM libraries. To use them, execute the
  1815. following (or add it to your ``~/.gdbinit``)::
  1816. source /path/to/llvm/src/utils/gdb-scripts/prettyprinters.py
  1817. It also might be handy to enable the `print pretty
  1818. <http://ftp.gnu.org/old-gnu/Manuals/gdb/html_node/gdb_57.html>`__ option to
  1819. avoid data structures being printed as a big block of text.
  1820. .. _common:
  1821. Helpful Hints for Common Operations
  1822. ===================================
  1823. This section describes how to perform some very simple transformations of LLVM
  1824. code. This is meant to give examples of common idioms used, showing the
  1825. practical side of LLVM transformations.
  1826. Because this is a "how-to" section, you should also read about the main classes
  1827. that you will be working with. The :ref:`Core LLVM Class Hierarchy Reference
  1828. <coreclasses>` contains details and descriptions of the main classes that you
  1829. should know about.
  1830. .. _inspection:
  1831. Basic Inspection and Traversal Routines
  1832. ---------------------------------------
  1833. The LLVM compiler infrastructure have many different data structures that may be
  1834. traversed. Following the example of the C++ standard template library, the
  1835. techniques used to traverse these various data structures are all basically the
  1836. same. For a enumerable sequence of values, the ``XXXbegin()`` function (or
  1837. method) returns an iterator to the start of the sequence, the ``XXXend()``
  1838. function returns an iterator pointing to one past the last valid element of the
  1839. sequence, and there is some ``XXXiterator`` data type that is common between the
  1840. two operations.
  1841. Because the pattern for iteration is common across many different aspects of the
  1842. program representation, the standard template library algorithms may be used on
  1843. them, and it is easier to remember how to iterate. First we show a few common
  1844. examples of the data structures that need to be traversed. Other data
  1845. structures are traversed in very similar ways.
  1846. .. _iterate_function:
  1847. Iterating over the ``BasicBlock`` in a ``Function``
  1848. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  1849. It's quite common to have a ``Function`` instance that you'd like to transform
  1850. in some way; in particular, you'd like to manipulate its ``BasicBlock``\ s. To
  1851. facilitate this, you'll need to iterate over all of the ``BasicBlock``\ s that
  1852. constitute the ``Function``. The following is an example that prints the name
  1853. of a ``BasicBlock`` and the number of ``Instruction``\ s it contains:
  1854. .. code-block:: c++
  1855. Function &Func = ...
  1856. for (BasicBlock &BB : Func)
  1857. // Print out the name of the basic block if it has one, and then the
  1858. // number of instructions that it contains
  1859. errs() << "Basic block (name=" << BB.getName() << ") has "
  1860. << BB.size() << " instructions.\n";
  1861. .. _iterate_basicblock:
  1862. Iterating over the ``Instruction`` in a ``BasicBlock``
  1863. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  1864. Just like when dealing with ``BasicBlock``\ s in ``Function``\ s, it's easy to
  1865. iterate over the individual instructions that make up ``BasicBlock``\ s. Here's
  1866. a code snippet that prints out each instruction in a ``BasicBlock``:
  1867. .. code-block:: c++
  1868. BasicBlock& BB = ...
  1869. for (Instruction &I : BB)
  1870. // The next statement works since operator<<(ostream&,...)
  1871. // is overloaded for Instruction&
  1872. errs() << I << "\n";
  1873. However, this isn't really the best way to print out the contents of a
  1874. ``BasicBlock``! Since the ostream operators are overloaded for virtually
  1875. anything you'll care about, you could have just invoked the print routine on the
  1876. basic block itself: ``errs() << BB << "\n";``.
  1877. .. _iterate_insiter:
  1878. Iterating over the ``Instruction`` in a ``Function``
  1879. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  1880. If you're finding that you commonly iterate over a ``Function``'s
  1881. ``BasicBlock``\ s and then that ``BasicBlock``'s ``Instruction``\ s,
  1882. ``InstIterator`` should be used instead. You'll need to include
  1883. ``llvm/IR/InstIterator.h`` (`doxygen
  1884. <http://llvm.org/doxygen/InstIterator_8h.html>`__) and then instantiate
  1885. ``InstIterator``\ s explicitly in your code. Here's a small example that shows
  1886. how to dump all instructions in a function to the standard error stream:
  1887. .. code-block:: c++
  1888. #include "llvm/IR/InstIterator.h"
  1889. // F is a pointer to a Function instance
  1890. for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
  1891. errs() << *I << "\n";
  1892. Easy, isn't it? You can also use ``InstIterator``\ s to fill a work list with
  1893. its initial contents. For example, if you wanted to initialize a work list to
  1894. contain all instructions in a ``Function`` F, all you would need to do is
  1895. something like:
  1896. .. code-block:: c++
  1897. std::set<Instruction*> worklist;
  1898. // or better yet, SmallPtrSet<Instruction*, 64> worklist;
  1899. for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
  1900. worklist.insert(&*I);
  1901. The STL set ``worklist`` would now contain all instructions in the ``Function``
  1902. pointed to by F.
  1903. .. _iterate_convert:
  1904. Turning an iterator into a class pointer (and vice-versa)
  1905. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  1906. Sometimes, it'll be useful to grab a reference (or pointer) to a class instance
  1907. when all you've got at hand is an iterator. Well, extracting a reference or a
  1908. pointer from an iterator is very straight-forward. Assuming that ``i`` is a
  1909. ``BasicBlock::iterator`` and ``j`` is a ``BasicBlock::const_iterator``:
  1910. .. code-block:: c++
  1911. Instruction& inst = *i; // Grab reference to instruction reference
  1912. Instruction* pinst = &*i; // Grab pointer to instruction reference
  1913. const Instruction& inst = *j;
  1914. However, the iterators you'll be working with in the LLVM framework are special:
  1915. they will automatically convert to a ptr-to-instance type whenever they need to.
  1916. Instead of dereferencing the iterator and then taking the address of the result,
  1917. you can simply assign the iterator to the proper pointer type and you get the
  1918. dereference and address-of operation as a result of the assignment (behind the
  1919. scenes, this is a result of overloading casting mechanisms). Thus the second
  1920. line of the last example,
  1921. .. code-block:: c++
  1922. Instruction *pinst = &*i;
  1923. is semantically equivalent to
  1924. .. code-block:: c++
  1925. Instruction *pinst = i;
  1926. It's also possible to turn a class pointer into the corresponding iterator, and
  1927. this is a constant time operation (very efficient). The following code snippet
  1928. illustrates use of the conversion constructors provided by LLVM iterators. By
  1929. using these, you can explicitly grab the iterator of something without actually
  1930. obtaining it via iteration over some structure:
  1931. .. code-block:: c++
  1932. void printNextInstruction(Instruction* inst) {
  1933. BasicBlock::iterator it(inst);
  1934. ++it; // After this line, it refers to the instruction after *inst
  1935. if (it != inst->getParent()->end()) errs() << *it << "\n";
  1936. }
  1937. Unfortunately, these implicit conversions come at a cost; they prevent these
  1938. iterators from conforming to standard iterator conventions, and thus from being
  1939. usable with standard algorithms and containers. For example, they prevent the
  1940. following code, where ``B`` is a ``BasicBlock``, from compiling:
  1941. .. code-block:: c++
  1942. llvm::SmallVector<llvm::Instruction *, 16>(B->begin(), B->end());
  1943. Because of this, these implicit conversions may be removed some day, and
  1944. ``operator*`` changed to return a pointer instead of a reference.
  1945. .. _iterate_complex:
  1946. Finding call sites: a slightly more complex example
  1947. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  1948. Say that you're writing a FunctionPass and would like to count all the locations
  1949. in the entire module (that is, across every ``Function``) where a certain
  1950. function (i.e., some ``Function *``) is already in scope. As you'll learn
  1951. later, you may want to use an ``InstVisitor`` to accomplish this in a much more
  1952. straight-forward manner, but this example will allow us to explore how you'd do
  1953. it if you didn't have ``InstVisitor`` around. In pseudo-code, this is what we
  1954. want to do:
  1955. .. code-block:: none
  1956. initialize callCounter to zero
  1957. for each Function f in the Module
  1958. for each BasicBlock b in f
  1959. for each Instruction i in b
  1960. if (i is a CallInst and calls the given function)
  1961. increment callCounter
  1962. And the actual code is (remember, because we're writing a ``FunctionPass``, our
  1963. ``FunctionPass``-derived class simply has to override the ``runOnFunction``
  1964. method):
  1965. .. code-block:: c++
  1966. Function* targetFunc = ...;
  1967. class OurFunctionPass : public FunctionPass {
  1968. public:
  1969. OurFunctionPass(): callCounter(0) { }
  1970. virtual runOnFunction(Function& F) {
  1971. for (BasicBlock &B : F) {
  1972. for (Instruction &I: B) {
  1973. if (auto *CallInst = dyn_cast<CallInst>(&I)) {
  1974. // We know we've encountered a call instruction, so we
  1975. // need to determine if it's a call to the
  1976. // function pointed to by m_func or not.
  1977. if (CallInst->getCalledFunction() == targetFunc)
  1978. ++callCounter;
  1979. }
  1980. }
  1981. }
  1982. }
  1983. private:
  1984. unsigned callCounter;
  1985. };
  1986. .. _calls_and_invokes:
  1987. Treating calls and invokes the same way
  1988. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  1989. You may have noticed that the previous example was a bit oversimplified in that
  1990. it did not deal with call sites generated by 'invoke' instructions. In this,
  1991. and in other situations, you may find that you want to treat ``CallInst``\ s and
  1992. ``InvokeInst``\ s the same way, even though their most-specific common base
  1993. class is ``Instruction``, which includes lots of less closely-related things.
  1994. For these cases, LLVM provides a handy wrapper class called ``CallSite``
  1995. (`doxygen <http://llvm.org/doxygen/classllvm_1_1CallSite.html>`__) It is
  1996. essentially a wrapper around an ``Instruction`` pointer, with some methods that
  1997. provide functionality common to ``CallInst``\ s and ``InvokeInst``\ s.
  1998. This class has "value semantics": it should be passed by value, not by reference
  1999. and it should not be dynamically allocated or deallocated using ``operator new``
  2000. or ``operator delete``. It is efficiently copyable, assignable and
  2001. constructable, with costs equivalents to that of a bare pointer. If you look at
  2002. its definition, it has only a single pointer member.
  2003. .. _iterate_chains:
  2004. Iterating over def-use & use-def chains
  2005. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  2006. Frequently, we might have an instance of the ``Value`` class (`doxygen
  2007. <http://llvm.org/doxygen/classllvm_1_1Value.html>`__) and we want to determine
  2008. which ``User``\ s use the ``Value``. The list of all ``User``\ s of a particular
  2009. ``Value`` is called a *def-use* chain. For example, let's say we have a
  2010. ``Function*`` named ``F`` to a particular function ``foo``. Finding all of the
  2011. instructions that *use* ``foo`` is as simple as iterating over the *def-use*
  2012. chain of ``F``:
  2013. .. code-block:: c++
  2014. Function *F = ...;
  2015. for (User *U : F->users()) {
  2016. if (Instruction *Inst = dyn_cast<Instruction>(U)) {
  2017. errs() << "F is used in instruction:\n";
  2018. errs() << *Inst << "\n";
  2019. }
  2020. Alternatively, it's common to have an instance of the ``User`` Class (`doxygen
  2021. <http://llvm.org/doxygen/classllvm_1_1User.html>`__) and need to know what
  2022. ``Value``\ s are used by it. The list of all ``Value``\ s used by a ``User`` is
  2023. known as a *use-def* chain. Instances of class ``Instruction`` are common
  2024. ``User`` s, so we might want to iterate over all of the values that a particular
  2025. instruction uses (that is, the operands of the particular ``Instruction``):
  2026. .. code-block:: c++
  2027. Instruction *pi = ...;
  2028. for (Use &U : pi->operands()) {
  2029. Value *v = U.get();
  2030. // ...
  2031. }
  2032. Declaring objects as ``const`` is an important tool of enforcing mutation free
  2033. algorithms (such as analyses, etc.). For this purpose above iterators come in
  2034. constant flavors as ``Value::const_use_iterator`` and
  2035. ``Value::const_op_iterator``. They automatically arise when calling
  2036. ``use/op_begin()`` on ``const Value*``\ s or ``const User*``\ s respectively.
  2037. Upon dereferencing, they return ``const Use*``\ s. Otherwise the above patterns
  2038. remain unchanged.
  2039. .. _iterate_preds:
  2040. Iterating over predecessors & successors of blocks
  2041. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  2042. Iterating over the predecessors and successors of a block is quite easy with the
  2043. routines defined in ``"llvm/IR/CFG.h"``. Just use code like this to
  2044. iterate over all predecessors of BB:
  2045. .. code-block:: c++
  2046. #include "llvm/IR/CFG.h"
  2047. BasicBlock *BB = ...;
  2048. for (BasicBlock *Pred : predecessors(BB)) {
  2049. // ...
  2050. }
  2051. Similarly, to iterate over successors use ``successors``.
  2052. .. _simplechanges:
  2053. Making simple changes
  2054. ---------------------
  2055. There are some primitive transformation operations present in the LLVM
  2056. infrastructure that are worth knowing about. When performing transformations,
  2057. it's fairly common to manipulate the contents of basic blocks. This section
  2058. describes some of the common methods for doing so and gives example code.
  2059. .. _schanges_creating:
  2060. Creating and inserting new ``Instruction``\ s
  2061. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  2062. *Instantiating Instructions*
  2063. Creation of ``Instruction``\ s is straight-forward: simply call the constructor
  2064. for the kind of instruction to instantiate and provide the necessary parameters.
  2065. For example, an ``AllocaInst`` only *requires* a (const-ptr-to) ``Type``. Thus:
  2066. .. code-block:: c++
  2067. auto *ai = new AllocaInst(Type::Int32Ty);
  2068. will create an ``AllocaInst`` instance that represents the allocation of one
  2069. integer in the current stack frame, at run time. Each ``Instruction`` subclass
  2070. is likely to have varying default parameters which change the semantics of the
  2071. instruction, so refer to the `doxygen documentation for the subclass of
  2072. Instruction <http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_ that
  2073. you're interested in instantiating.
  2074. *Naming values*
  2075. It is very useful to name the values of instructions when you're able to, as
  2076. this facilitates the debugging of your transformations. If you end up looking
  2077. at generated LLVM machine code, you definitely want to have logical names
  2078. associated with the results of instructions! By supplying a value for the
  2079. ``Name`` (default) parameter of the ``Instruction`` constructor, you associate a
  2080. logical name with the result of the instruction's execution at run time. For
  2081. example, say that I'm writing a transformation that dynamically allocates space
  2082. for an integer on the stack, and that integer is going to be used as some kind
  2083. of index by some other code. To accomplish this, I place an ``AllocaInst`` at
  2084. the first point in the first ``BasicBlock`` of some ``Function``, and I'm
  2085. intending to use it within the same ``Function``. I might do:
  2086. .. code-block:: c++
  2087. auto *pa = new AllocaInst(Type::Int32Ty, 0, "indexLoc");
  2088. where ``indexLoc`` is now the logical name of the instruction's execution value,
  2089. which is a pointer to an integer on the run time stack.
  2090. *Inserting instructions*
  2091. There are essentially three ways to insert an ``Instruction`` into an existing
  2092. sequence of instructions that form a ``BasicBlock``:
  2093. * Insertion into an explicit instruction list
  2094. Given a ``BasicBlock* pb``, an ``Instruction* pi`` within that ``BasicBlock``,
  2095. and a newly-created instruction we wish to insert before ``*pi``, we do the
  2096. following:
  2097. .. code-block:: c++
  2098. BasicBlock *pb = ...;
  2099. Instruction *pi = ...;
  2100. auto *newInst = new Instruction(...);
  2101. pb->getInstList().insert(pi, newInst); // Inserts newInst before pi in pb
  2102. Appending to the end of a ``BasicBlock`` is so common that the ``Instruction``
  2103. class and ``Instruction``-derived classes provide constructors which take a
  2104. pointer to a ``BasicBlock`` to be appended to. For example code that looked
  2105. like:
  2106. .. code-block:: c++
  2107. BasicBlock *pb = ...;
  2108. auto *newInst = new Instruction(...);
  2109. pb->getInstList().push_back(newInst); // Appends newInst to pb
  2110. becomes:
  2111. .. code-block:: c++
  2112. BasicBlock *pb = ...;
  2113. auto *newInst = new Instruction(..., pb);
  2114. which is much cleaner, especially if you are creating long instruction
  2115. streams.
  2116. * Insertion into an implicit instruction list
  2117. ``Instruction`` instances that are already in ``BasicBlock``\ s are implicitly
  2118. associated with an existing instruction list: the instruction list of the
  2119. enclosing basic block. Thus, we could have accomplished the same thing as the
  2120. above code without being given a ``BasicBlock`` by doing:
  2121. .. code-block:: c++
  2122. Instruction *pi = ...;
  2123. auto *newInst = new Instruction(...);
  2124. pi->getParent()->getInstList().insert(pi, newInst);
  2125. In fact, this sequence of steps occurs so frequently that the ``Instruction``
  2126. class and ``Instruction``-derived classes provide constructors which take (as
  2127. a default parameter) a pointer to an ``Instruction`` which the newly-created
  2128. ``Instruction`` should precede. That is, ``Instruction`` constructors are
  2129. capable of inserting the newly-created instance into the ``BasicBlock`` of a
  2130. provided instruction, immediately before that instruction. Using an
  2131. ``Instruction`` constructor with a ``insertBefore`` (default) parameter, the
  2132. above code becomes:
  2133. .. code-block:: c++
  2134. Instruction* pi = ...;
  2135. auto *newInst = new Instruction(..., pi);
  2136. which is much cleaner, especially if you're creating a lot of instructions and
  2137. adding them to ``BasicBlock``\ s.
  2138. * Insertion using an instance of ``IRBuilder``
  2139. Inserting several ``Instruction``\ s can be quite laborious using the previous
  2140. methods. The ``IRBuilder`` is a convenience class that can be used to add
  2141. several instructions to the end of a ``BasicBlock`` or before a particular
  2142. ``Instruction``. It also supports constant folding and renaming named
  2143. registers (see ``IRBuilder``'s template arguments).
  2144. The example below demonstrates a very simple use of the ``IRBuilder`` where
  2145. three instructions are inserted before the instruction ``pi``. The first two
  2146. instructions are Call instructions and third instruction multiplies the return
  2147. value of the two calls.
  2148. .. code-block:: c++
  2149. Instruction *pi = ...;
  2150. IRBuilder<> Builder(pi);
  2151. CallInst* callOne = Builder.CreateCall(...);
  2152. CallInst* callTwo = Builder.CreateCall(...);
  2153. Value* result = Builder.CreateMul(callOne, callTwo);
  2154. The example below is similar to the above example except that the created
  2155. ``IRBuilder`` inserts instructions at the end of the ``BasicBlock`` ``pb``.
  2156. .. code-block:: c++
  2157. BasicBlock *pb = ...;
  2158. IRBuilder<> Builder(pb);
  2159. CallInst* callOne = Builder.CreateCall(...);
  2160. CallInst* callTwo = Builder.CreateCall(...);
  2161. Value* result = Builder.CreateMul(callOne, callTwo);
  2162. See :doc:`tutorial/LangImpl03` for a practical use of the ``IRBuilder``.
  2163. .. _schanges_deleting:
  2164. Deleting Instructions
  2165. ^^^^^^^^^^^^^^^^^^^^^
  2166. Deleting an instruction from an existing sequence of instructions that form a
  2167. BasicBlock_ is very straight-forward: just call the instruction's
  2168. ``eraseFromParent()`` method. For example:
  2169. .. code-block:: c++
  2170. Instruction *I = .. ;
  2171. I->eraseFromParent();
  2172. This unlinks the instruction from its containing basic block and deletes it. If
  2173. you'd just like to unlink the instruction from its containing basic block but
  2174. not delete it, you can use the ``removeFromParent()`` method.
  2175. .. _schanges_replacing:
  2176. Replacing an Instruction with another Value
  2177. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  2178. Replacing individual instructions
  2179. """""""""""""""""""""""""""""""""
  2180. Including "`llvm/Transforms/Utils/BasicBlockUtils.h
  2181. <http://llvm.org/doxygen/BasicBlockUtils_8h_source.html>`_" permits use of two
  2182. very useful replace functions: ``ReplaceInstWithValue`` and
  2183. ``ReplaceInstWithInst``.
  2184. .. _schanges_deleting_sub:
  2185. Deleting Instructions
  2186. """""""""""""""""""""
  2187. * ``ReplaceInstWithValue``
  2188. This function replaces all uses of a given instruction with a value, and then
  2189. removes the original instruction. The following example illustrates the
  2190. replacement of the result of a particular ``AllocaInst`` that allocates memory
  2191. for a single integer with a null pointer to an integer.
  2192. .. code-block:: c++
  2193. AllocaInst* instToReplace = ...;
  2194. BasicBlock::iterator ii(instToReplace);
  2195. ReplaceInstWithValue(instToReplace->getParent()->getInstList(), ii,
  2196. Constant::getNullValue(PointerType::getUnqual(Type::Int32Ty)));
  2197. * ``ReplaceInstWithInst``
  2198. This function replaces a particular instruction with another instruction,
  2199. inserting the new instruction into the basic block at the location where the
  2200. old instruction was, and replacing any uses of the old instruction with the
  2201. new instruction. The following example illustrates the replacement of one
  2202. ``AllocaInst`` with another.
  2203. .. code-block:: c++
  2204. AllocaInst* instToReplace = ...;
  2205. BasicBlock::iterator ii(instToReplace);
  2206. ReplaceInstWithInst(instToReplace->getParent()->getInstList(), ii,
  2207. new AllocaInst(Type::Int32Ty, 0, "ptrToReplacedInt"));
  2208. Replacing multiple uses of Users and Values
  2209. """""""""""""""""""""""""""""""""""""""""""
  2210. You can use ``Value::replaceAllUsesWith`` and ``User::replaceUsesOfWith`` to
  2211. change more than one use at a time. See the doxygen documentation for the
  2212. `Value Class <http://llvm.org/doxygen/classllvm_1_1Value.html>`_ and `User Class
  2213. <http://llvm.org/doxygen/classllvm_1_1User.html>`_, respectively, for more
  2214. information.
  2215. .. _schanges_deletingGV:
  2216. Deleting GlobalVariables
  2217. ^^^^^^^^^^^^^^^^^^^^^^^^
  2218. Deleting a global variable from a module is just as easy as deleting an
  2219. Instruction. First, you must have a pointer to the global variable that you
  2220. wish to delete. You use this pointer to erase it from its parent, the module.
  2221. For example:
  2222. .. code-block:: c++
  2223. GlobalVariable *GV = .. ;
  2224. GV->eraseFromParent();
  2225. .. _threading:
  2226. Threads and LLVM
  2227. ================
  2228. This section describes the interaction of the LLVM APIs with multithreading,
  2229. both on the part of client applications, and in the JIT, in the hosted
  2230. application.
  2231. Note that LLVM's support for multithreading is still relatively young. Up
  2232. through version 2.5, the execution of threaded hosted applications was
  2233. supported, but not threaded client access to the APIs. While this use case is
  2234. now supported, clients *must* adhere to the guidelines specified below to ensure
  2235. proper operation in multithreaded mode.
  2236. Note that, on Unix-like platforms, LLVM requires the presence of GCC's atomic
  2237. intrinsics in order to support threaded operation. If you need a
  2238. multhreading-capable LLVM on a platform without a suitably modern system
  2239. compiler, consider compiling LLVM and LLVM-GCC in single-threaded mode, and
  2240. using the resultant compiler to build a copy of LLVM with multithreading
  2241. support.
  2242. .. _shutdown:
  2243. Ending Execution with ``llvm_shutdown()``
  2244. -----------------------------------------
  2245. When you are done using the LLVM APIs, you should call ``llvm_shutdown()`` to
  2246. deallocate memory used for internal structures.
  2247. .. _managedstatic:
  2248. Lazy Initialization with ``ManagedStatic``
  2249. ------------------------------------------
  2250. ``ManagedStatic`` is a utility class in LLVM used to implement static
  2251. initialization of static resources, such as the global type tables. In a
  2252. single-threaded environment, it implements a simple lazy initialization scheme.
  2253. When LLVM is compiled with support for multi-threading, however, it uses
  2254. double-checked locking to implement thread-safe lazy initialization.
  2255. .. _llvmcontext:
  2256. Achieving Isolation with ``LLVMContext``
  2257. ----------------------------------------
  2258. ``LLVMContext`` is an opaque class in the LLVM API which clients can use to
  2259. operate multiple, isolated instances of LLVM concurrently within the same
  2260. address space. For instance, in a hypothetical compile-server, the compilation
  2261. of an individual translation unit is conceptually independent from all the
  2262. others, and it would be desirable to be able to compile incoming translation
  2263. units concurrently on independent server threads. Fortunately, ``LLVMContext``
  2264. exists to enable just this kind of scenario!
  2265. Conceptually, ``LLVMContext`` provides isolation. Every LLVM entity
  2266. (``Module``\ s, ``Value``\ s, ``Type``\ s, ``Constant``\ s, etc.) in LLVM's
  2267. in-memory IR belongs to an ``LLVMContext``. Entities in different contexts
  2268. *cannot* interact with each other: ``Module``\ s in different contexts cannot be
  2269. linked together, ``Function``\ s cannot be added to ``Module``\ s in different
  2270. contexts, etc. What this means is that is safe to compile on multiple
  2271. threads simultaneously, as long as no two threads operate on entities within the
  2272. same context.
  2273. In practice, very few places in the API require the explicit specification of a
  2274. ``LLVMContext``, other than the ``Type`` creation/lookup APIs. Because every
  2275. ``Type`` carries a reference to its owning context, most other entities can
  2276. determine what context they belong to by looking at their own ``Type``. If you
  2277. are adding new entities to LLVM IR, please try to maintain this interface
  2278. design.
  2279. .. _jitthreading:
  2280. Threads and the JIT
  2281. -------------------
  2282. LLVM's "eager" JIT compiler is safe to use in threaded programs. Multiple
  2283. threads can call ``ExecutionEngine::getPointerToFunction()`` or
  2284. ``ExecutionEngine::runFunction()`` concurrently, and multiple threads can run
  2285. code output by the JIT concurrently. The user must still ensure that only one
  2286. thread accesses IR in a given ``LLVMContext`` while another thread might be
  2287. modifying it. One way to do that is to always hold the JIT lock while accessing
  2288. IR outside the JIT (the JIT *modifies* the IR by adding ``CallbackVH``\ s).
  2289. Another way is to only call ``getPointerToFunction()`` from the
  2290. ``LLVMContext``'s thread.
  2291. When the JIT is configured to compile lazily (using
  2292. ``ExecutionEngine::DisableLazyCompilation(false)``), there is currently a `race
  2293. condition <https://bugs.llvm.org/show_bug.cgi?id=5184>`_ in updating call sites
  2294. after a function is lazily-jitted. It's still possible to use the lazy JIT in a
  2295. threaded program if you ensure that only one thread at a time can call any
  2296. particular lazy stub and that the JIT lock guards any IR access, but we suggest
  2297. using only the eager JIT in threaded programs.
  2298. .. _advanced:
  2299. Advanced Topics
  2300. ===============
  2301. This section describes some of the advanced or obscure API's that most clients
  2302. do not need to be aware of. These API's tend manage the inner workings of the
  2303. LLVM system, and only need to be accessed in unusual circumstances.
  2304. .. _SymbolTable:
  2305. The ``ValueSymbolTable`` class
  2306. ------------------------------
  2307. The ``ValueSymbolTable`` (`doxygen
  2308. <http://llvm.org/doxygen/classllvm_1_1ValueSymbolTable.html>`__) class provides
  2309. a symbol table that the :ref:`Function <c_Function>` and Module_ classes use for
  2310. naming value definitions. The symbol table can provide a name for any Value_.
  2311. Note that the ``SymbolTable`` class should not be directly accessed by most
  2312. clients. It should only be used when iteration over the symbol table names
  2313. themselves are required, which is very special purpose. Note that not all LLVM
  2314. Value_\ s have names, and those without names (i.e. they have an empty name) do
  2315. not exist in the symbol table.
  2316. Symbol tables support iteration over the values in the symbol table with
  2317. ``begin/end/iterator`` and supports querying to see if a specific name is in the
  2318. symbol table (with ``lookup``). The ``ValueSymbolTable`` class exposes no
  2319. public mutator methods, instead, simply call ``setName`` on a value, which will
  2320. autoinsert it into the appropriate symbol table.
  2321. .. _UserLayout:
  2322. The ``User`` and owned ``Use`` classes' memory layout
  2323. -----------------------------------------------------
  2324. The ``User`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1User.html>`__)
  2325. class provides a basis for expressing the ownership of ``User`` towards other
  2326. `Value instance <http://llvm.org/doxygen/classllvm_1_1Value.html>`_\ s. The
  2327. ``Use`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1Use.html>`__) helper
  2328. class is employed to do the bookkeeping and to facilitate *O(1)* addition and
  2329. removal.
  2330. .. _Use2User:
  2331. Interaction and relationship between ``User`` and ``Use`` objects
  2332. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  2333. A subclass of ``User`` can choose between incorporating its ``Use`` objects or
  2334. refer to them out-of-line by means of a pointer. A mixed variant (some ``Use``
  2335. s inline others hung off) is impractical and breaks the invariant that the
  2336. ``Use`` objects belonging to the same ``User`` form a contiguous array.
  2337. We have 2 different layouts in the ``User`` (sub)classes:
  2338. * Layout a)
  2339. The ``Use`` object(s) are inside (resp. at fixed offset) of the ``User``
  2340. object and there are a fixed number of them.
  2341. * Layout b)
  2342. The ``Use`` object(s) are referenced by a pointer to an array from the
  2343. ``User`` object and there may be a variable number of them.
  2344. As of v2.4 each layout still possesses a direct pointer to the start of the
  2345. array of ``Use``\ s. Though not mandatory for layout a), we stick to this
  2346. redundancy for the sake of simplicity. The ``User`` object also stores the
  2347. number of ``Use`` objects it has. (Theoretically this information can also be
  2348. calculated given the scheme presented below.)
  2349. Special forms of allocation operators (``operator new``) enforce the following
  2350. memory layouts:
  2351. * Layout a) is modelled by prepending the ``User`` object by the ``Use[]``
  2352. array.
  2353. .. code-block:: none
  2354. ...---.---.---.---.-------...
  2355. | P | P | P | P | User
  2356. '''---'---'---'---'-------'''
  2357. * Layout b) is modelled by pointing at the ``Use[]`` array.
  2358. .. code-block:: none
  2359. .-------...
  2360. | User
  2361. '-------'''
  2362. |
  2363. v
  2364. .---.---.---.---...
  2365. | P | P | P | P |
  2366. '---'---'---'---'''
  2367. *(In the above figures* '``P``' *stands for the* ``Use**`` *that is stored in
  2368. each* ``Use`` *object in the member* ``Use::Prev`` *)*
  2369. .. _Waymarking:
  2370. The waymarking algorithm
  2371. ^^^^^^^^^^^^^^^^^^^^^^^^
  2372. Since the ``Use`` objects are deprived of the direct (back)pointer to their
  2373. ``User`` objects, there must be a fast and exact method to recover it. This is
  2374. accomplished by the following scheme:
  2375. A bit-encoding in the 2 LSBits (least significant bits) of the ``Use::Prev``
  2376. allows to find the start of the ``User`` object:
  2377. * ``00`` --- binary digit 0
  2378. * ``01`` --- binary digit 1
  2379. * ``10`` --- stop and calculate (``s``)
  2380. * ``11`` --- full stop (``S``)
  2381. Given a ``Use*``, all we have to do is to walk till we get a stop and we either
  2382. have a ``User`` immediately behind or we have to walk to the next stop picking
  2383. up digits and calculating the offset:
  2384. .. code-block:: none
  2385. .---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.----------------
  2386. | 1 | s | 1 | 0 | 1 | 0 | s | 1 | 1 | 0 | s | 1 | 1 | s | 1 | S | User (or User*)
  2387. '---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'----------------
  2388. |+15 |+10 |+6 |+3 |+1
  2389. | | | | | __>
  2390. | | | | __________>
  2391. | | | ______________________>
  2392. | | ______________________________________>
  2393. | __________________________________________________________>
  2394. Only the significant number of bits need to be stored between the stops, so that
  2395. the *worst case is 20 memory accesses* when there are 1000 ``Use`` objects
  2396. associated with a ``User``.
  2397. .. _ReferenceImpl:
  2398. Reference implementation
  2399. ^^^^^^^^^^^^^^^^^^^^^^^^
  2400. The following literate Haskell fragment demonstrates the concept:
  2401. .. code-block:: haskell
  2402. > import Test.QuickCheck
  2403. >
  2404. > digits :: Int -> [Char] -> [Char]
  2405. > digits 0 acc = '0' : acc
  2406. > digits 1 acc = '1' : acc
  2407. > digits n acc = digits (n `div` 2) $ digits (n `mod` 2) acc
  2408. >
  2409. > dist :: Int -> [Char] -> [Char]
  2410. > dist 0 [] = ['S']
  2411. > dist 0 acc = acc
  2412. > dist 1 acc = let r = dist 0 acc in 's' : digits (length r) r
  2413. > dist n acc = dist (n - 1) $ dist 1 acc
  2414. >
  2415. > takeLast n ss = reverse $ take n $ reverse ss
  2416. >
  2417. > test = takeLast 40 $ dist 20 []
  2418. >
  2419. Printing <test> gives: ``"1s100000s11010s10100s1111s1010s110s11s1S"``
  2420. The reverse algorithm computes the length of the string just by examining a
  2421. certain prefix:
  2422. .. code-block:: haskell
  2423. > pref :: [Char] -> Int
  2424. > pref "S" = 1
  2425. > pref ('s':'1':rest) = decode 2 1 rest
  2426. > pref (_:rest) = 1 + pref rest
  2427. >
  2428. > decode walk acc ('0':rest) = decode (walk + 1) (acc * 2) rest
  2429. > decode walk acc ('1':rest) = decode (walk + 1) (acc * 2 + 1) rest
  2430. > decode walk acc _ = walk + acc
  2431. >
  2432. Now, as expected, printing <pref test> gives ``40``.
  2433. We can *quickCheck* this with following property:
  2434. .. code-block:: haskell
  2435. > testcase = dist 2000 []
  2436. > testcaseLength = length testcase
  2437. >
  2438. > identityProp n = n > 0 && n <= testcaseLength ==> length arr == pref arr
  2439. > where arr = takeLast n testcase
  2440. >
  2441. As expected <quickCheck identityProp> gives:
  2442. ::
  2443. *Main> quickCheck identityProp
  2444. OK, passed 100 tests.
  2445. Let's be a bit more exhaustive:
  2446. .. code-block:: haskell
  2447. >
  2448. > deepCheck p = check (defaultConfig { configMaxTest = 500 }) p
  2449. >
  2450. And here is the result of <deepCheck identityProp>:
  2451. ::
  2452. *Main> deepCheck identityProp
  2453. OK, passed 500 tests.
  2454. .. _Tagging:
  2455. Tagging considerations
  2456. ^^^^^^^^^^^^^^^^^^^^^^
  2457. To maintain the invariant that the 2 LSBits of each ``Use**`` in ``Use`` never
  2458. change after being set up, setters of ``Use::Prev`` must re-tag the new
  2459. ``Use**`` on every modification. Accordingly getters must strip the tag bits.
  2460. For layout b) instead of the ``User`` we find a pointer (``User*`` with LSBit
  2461. set). Following this pointer brings us to the ``User``. A portable trick
  2462. ensures that the first bytes of ``User`` (if interpreted as a pointer) never has
  2463. the LSBit set. (Portability is relying on the fact that all known compilers
  2464. place the ``vptr`` in the first word of the instances.)
  2465. .. _polymorphism:
  2466. Designing Type Hiercharies and Polymorphic Interfaces
  2467. -----------------------------------------------------
  2468. There are two different design patterns that tend to result in the use of
  2469. virtual dispatch for methods in a type hierarchy in C++ programs. The first is
  2470. a genuine type hierarchy where different types in the hierarchy model
  2471. a specific subset of the functionality and semantics, and these types nest
  2472. strictly within each other. Good examples of this can be seen in the ``Value``
  2473. or ``Type`` type hierarchies.
  2474. A second is the desire to dispatch dynamically across a collection of
  2475. polymorphic interface implementations. This latter use case can be modeled with
  2476. virtual dispatch and inheritance by defining an abstract interface base class
  2477. which all implementations derive from and override. However, this
  2478. implementation strategy forces an **"is-a"** relationship to exist that is not
  2479. actually meaningful. There is often not some nested hierarchy of useful
  2480. generalizations which code might interact with and move up and down. Instead,
  2481. there is a singular interface which is dispatched across a range of
  2482. implementations.
  2483. The preferred implementation strategy for the second use case is that of
  2484. generic programming (sometimes called "compile-time duck typing" or "static
  2485. polymorphism"). For example, a template over some type parameter ``T`` can be
  2486. instantiated across any particular implementation that conforms to the
  2487. interface or *concept*. A good example here is the highly generic properties of
  2488. any type which models a node in a directed graph. LLVM models these primarily
  2489. through templates and generic programming. Such templates include the
  2490. ``LoopInfoBase`` and ``DominatorTreeBase``. When this type of polymorphism
  2491. truly needs **dynamic** dispatch you can generalize it using a technique
  2492. called *concept-based polymorphism*. This pattern emulates the interfaces and
  2493. behaviors of templates using a very limited form of virtual dispatch for type
  2494. erasure inside its implementation. You can find examples of this technique in
  2495. the ``PassManager.h`` system, and there is a more detailed introduction to it
  2496. by Sean Parent in several of his talks and papers:
  2497. #. `Inheritance Is The Base Class of Evil
  2498. <http://channel9.msdn.com/Events/GoingNative/2013/Inheritance-Is-The-Base-Class-of-Evil>`_
  2499. - The GoingNative 2013 talk describing this technique, and probably the best
  2500. place to start.
  2501. #. `Value Semantics and Concepts-based Polymorphism
  2502. <http://www.youtube.com/watch?v=_BpMYeUFXv8>`_ - The C++Now! 2012 talk
  2503. describing this technique in more detail.
  2504. #. `Sean Parent's Papers and Presentations
  2505. <http://github.com/sean-parent/sean-parent.github.com/wiki/Papers-and-Presentations>`_
  2506. - A Github project full of links to slides, video, and sometimes code.
  2507. When deciding between creating a type hierarchy (with either tagged or virtual
  2508. dispatch) and using templates or concepts-based polymorphism, consider whether
  2509. there is some refinement of an abstract base class which is a semantically
  2510. meaningful type on an interface boundary. If anything more refined than the
  2511. root abstract interface is meaningless to talk about as a partial extension of
  2512. the semantic model, then your use case likely fits better with polymorphism and
  2513. you should avoid using virtual dispatch. However, there may be some exigent
  2514. circumstances that require one technique or the other to be used.
  2515. If you do need to introduce a type hierarchy, we prefer to use explicitly
  2516. closed type hierarchies with manual tagged dispatch and/or RTTI rather than the
  2517. open inheritance model and virtual dispatch that is more common in C++ code.
  2518. This is because LLVM rarely encourages library consumers to extend its core
  2519. types, and leverages the closed and tag-dispatched nature of its hierarchies to
  2520. generate significantly more efficient code. We have also found that a large
  2521. amount of our usage of type hierarchies fits better with tag-based pattern
  2522. matching rather than dynamic dispatch across a common interface. Within LLVM we
  2523. have built custom helpers to facilitate this design. See this document's
  2524. section on :ref:`isa and dyn_cast <isa>` and our :doc:`detailed document
  2525. <HowToSetUpLLVMStyleRTTI>` which describes how you can implement this
  2526. pattern for use with the LLVM helpers.
  2527. .. _abi_breaking_checks:
  2528. ABI Breaking Checks
  2529. -------------------
  2530. Checks and asserts that alter the LLVM C++ ABI are predicated on the
  2531. preprocessor symbol `LLVM_ENABLE_ABI_BREAKING_CHECKS` -- LLVM
  2532. libraries built with `LLVM_ENABLE_ABI_BREAKING_CHECKS` are not ABI
  2533. compatible LLVM libraries built without it defined. By default,
  2534. turning on assertions also turns on `LLVM_ENABLE_ABI_BREAKING_CHECKS`
  2535. so a default +Asserts build is not ABI compatible with a
  2536. default -Asserts build. Clients that want ABI compatibility
  2537. between +Asserts and -Asserts builds should use the CMake build system
  2538. to set `LLVM_ENABLE_ABI_BREAKING_CHECKS` independently
  2539. of `LLVM_ENABLE_ASSERTIONS`.
  2540. .. _coreclasses:
  2541. The Core LLVM Class Hierarchy Reference
  2542. =======================================
  2543. ``#include "llvm/IR/Type.h"``
  2544. header source: `Type.h <http://llvm.org/doxygen/Type_8h_source.html>`_
  2545. doxygen info: `Type Clases <http://llvm.org/doxygen/classllvm_1_1Type.html>`_
  2546. The Core LLVM classes are the primary means of representing the program being
  2547. inspected or transformed. The core LLVM classes are defined in header files in
  2548. the ``include/llvm/IR`` directory, and implemented in the ``lib/IR``
  2549. directory. It's worth noting that, for historical reasons, this library is
  2550. called ``libLLVMCore.so``, not ``libLLVMIR.so`` as you might expect.
  2551. .. _Type:
  2552. The Type class and Derived Types
  2553. --------------------------------
  2554. ``Type`` is a superclass of all type classes. Every ``Value`` has a ``Type``.
  2555. ``Type`` cannot be instantiated directly but only through its subclasses.
  2556. Certain primitive types (``VoidType``, ``LabelType``, ``FloatType`` and
  2557. ``DoubleType``) have hidden subclasses. They are hidden because they offer no
  2558. useful functionality beyond what the ``Type`` class offers except to distinguish
  2559. themselves from other subclasses of ``Type``.
  2560. All other types are subclasses of ``DerivedType``. Types can be named, but this
  2561. is not a requirement. There exists exactly one instance of a given shape at any
  2562. one time. This allows type equality to be performed with address equality of
  2563. the Type Instance. That is, given two ``Type*`` values, the types are identical
  2564. if the pointers are identical.
  2565. .. _m_Type:
  2566. Important Public Methods
  2567. ^^^^^^^^^^^^^^^^^^^^^^^^
  2568. * ``bool isIntegerTy() const``: Returns true for any integer type.
  2569. * ``bool isFloatingPointTy()``: Return true if this is one of the five
  2570. floating point types.
  2571. * ``bool isSized()``: Return true if the type has known size. Things
  2572. that don't have a size are abstract types, labels and void.
  2573. .. _derivedtypes:
  2574. Important Derived Types
  2575. ^^^^^^^^^^^^^^^^^^^^^^^
  2576. ``IntegerType``
  2577. Subclass of DerivedType that represents integer types of any bit width. Any
  2578. bit width between ``IntegerType::MIN_INT_BITS`` (1) and
  2579. ``IntegerType::MAX_INT_BITS`` (~8 million) can be represented.
  2580. * ``static const IntegerType* get(unsigned NumBits)``: get an integer
  2581. type of a specific bit width.
  2582. * ``unsigned getBitWidth() const``: Get the bit width of an integer type.
  2583. ``SequentialType``
  2584. This is subclassed by ArrayType and VectorType.
  2585. * ``const Type * getElementType() const``: Returns the type of each
  2586. of the elements in the sequential type.
  2587. * ``uint64_t getNumElements() const``: Returns the number of elements
  2588. in the sequential type.
  2589. ``ArrayType``
  2590. This is a subclass of SequentialType and defines the interface for array
  2591. types.
  2592. ``PointerType``
  2593. Subclass of Type for pointer types.
  2594. ``VectorType``
  2595. Subclass of SequentialType for vector types. A vector type is similar to an
  2596. ArrayType but is distinguished because it is a first class type whereas
  2597. ArrayType is not. Vector types are used for vector operations and are usually
  2598. small vectors of an integer or floating point type.
  2599. ``StructType``
  2600. Subclass of DerivedTypes for struct types.
  2601. .. _FunctionType:
  2602. ``FunctionType``
  2603. Subclass of DerivedTypes for function types.
  2604. * ``bool isVarArg() const``: Returns true if it's a vararg function.
  2605. * ``const Type * getReturnType() const``: Returns the return type of the
  2606. function.
  2607. * ``const Type * getParamType (unsigned i)``: Returns the type of the ith
  2608. parameter.
  2609. * ``const unsigned getNumParams() const``: Returns the number of formal
  2610. parameters.
  2611. .. _Module:
  2612. The ``Module`` class
  2613. --------------------
  2614. ``#include "llvm/IR/Module.h"``
  2615. header source: `Module.h <http://llvm.org/doxygen/Module_8h_source.html>`_
  2616. doxygen info: `Module Class <http://llvm.org/doxygen/classllvm_1_1Module.html>`_
  2617. The ``Module`` class represents the top level structure present in LLVM
  2618. programs. An LLVM module is effectively either a translation unit of the
  2619. original program or a combination of several translation units merged by the
  2620. linker. The ``Module`` class keeps track of a list of :ref:`Function
  2621. <c_Function>`\ s, a list of GlobalVariable_\ s, and a SymbolTable_.
  2622. Additionally, it contains a few helpful member functions that try to make common
  2623. operations easy.
  2624. .. _m_Module:
  2625. Important Public Members of the ``Module`` class
  2626. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  2627. * ``Module::Module(std::string name = "")``
  2628. Constructing a Module_ is easy. You can optionally provide a name for it
  2629. (probably based on the name of the translation unit).
  2630. * | ``Module::iterator`` - Typedef for function list iterator
  2631. | ``Module::const_iterator`` - Typedef for const_iterator.
  2632. | ``begin()``, ``end()``, ``size()``, ``empty()``
  2633. These are forwarding methods that make it easy to access the contents of a
  2634. ``Module`` object's :ref:`Function <c_Function>` list.
  2635. * ``Module::FunctionListType &getFunctionList()``
  2636. Returns the list of :ref:`Function <c_Function>`\ s. This is necessary to use
  2637. when you need to update the list or perform a complex action that doesn't have
  2638. a forwarding method.
  2639. ----------------
  2640. * | ``Module::global_iterator`` - Typedef for global variable list iterator
  2641. | ``Module::const_global_iterator`` - Typedef for const_iterator.
  2642. | ``global_begin()``, ``global_end()``, ``global_size()``, ``global_empty()``
  2643. These are forwarding methods that make it easy to access the contents of a
  2644. ``Module`` object's GlobalVariable_ list.
  2645. * ``Module::GlobalListType &getGlobalList()``
  2646. Returns the list of GlobalVariable_\ s. This is necessary to use when you
  2647. need to update the list or perform a complex action that doesn't have a
  2648. forwarding method.
  2649. ----------------
  2650. * ``SymbolTable *getSymbolTable()``
  2651. Return a reference to the SymbolTable_ for this ``Module``.
  2652. ----------------
  2653. * ``Function *getFunction(StringRef Name) const``
  2654. Look up the specified function in the ``Module`` SymbolTable_. If it does not
  2655. exist, return ``null``.
  2656. * ``FunctionCallee getOrInsertFunction(const std::string &Name,
  2657. const FunctionType *T)``
  2658. Look up the specified function in the ``Module`` SymbolTable_. If
  2659. it does not exist, add an external declaration for the function and
  2660. return it. Note that the function signature already present may not
  2661. match the requested signature. Thus, in order to enable the common
  2662. usage of passing the result directly to EmitCall, the return type is
  2663. a struct of ``{FunctionType *T, Constant *FunctionPtr}``, rather
  2664. than simply the ``Function*`` with potentially an unexpected
  2665. signature.
  2666. * ``std::string getTypeName(const Type *Ty)``
  2667. If there is at least one entry in the SymbolTable_ for the specified Type_,
  2668. return it. Otherwise return the empty string.
  2669. * ``bool addTypeName(const std::string &Name, const Type *Ty)``
  2670. Insert an entry in the SymbolTable_ mapping ``Name`` to ``Ty``. If there is
  2671. already an entry for this name, true is returned and the SymbolTable_ is not
  2672. modified.
  2673. .. _Value:
  2674. The ``Value`` class
  2675. -------------------
  2676. ``#include "llvm/IR/Value.h"``
  2677. header source: `Value.h <http://llvm.org/doxygen/Value_8h_source.html>`_
  2678. doxygen info: `Value Class <http://llvm.org/doxygen/classllvm_1_1Value.html>`_
  2679. The ``Value`` class is the most important class in the LLVM Source base. It
  2680. represents a typed value that may be used (among other things) as an operand to
  2681. an instruction. There are many different types of ``Value``\ s, such as
  2682. Constant_\ s, Argument_\ s. Even Instruction_\ s and :ref:`Function
  2683. <c_Function>`\ s are ``Value``\ s.
  2684. A particular ``Value`` may be used many times in the LLVM representation for a
  2685. program. For example, an incoming argument to a function (represented with an
  2686. instance of the Argument_ class) is "used" by every instruction in the function
  2687. that references the argument. To keep track of this relationship, the ``Value``
  2688. class keeps a list of all of the ``User``\ s that is using it (the User_ class
  2689. is a base class for all nodes in the LLVM graph that can refer to ``Value``\ s).
  2690. This use list is how LLVM represents def-use information in the program, and is
  2691. accessible through the ``use_*`` methods, shown below.
  2692. Because LLVM is a typed representation, every LLVM ``Value`` is typed, and this
  2693. Type_ is available through the ``getType()`` method. In addition, all LLVM
  2694. values can be named. The "name" of the ``Value`` is a symbolic string printed
  2695. in the LLVM code:
  2696. .. code-block:: llvm
  2697. %foo = add i32 1, 2
  2698. .. _nameWarning:
  2699. The name of this instruction is "foo". **NOTE** that the name of any value may
  2700. be missing (an empty string), so names should **ONLY** be used for debugging
  2701. (making the source code easier to read, debugging printouts), they should not be
  2702. used to keep track of values or map between them. For this purpose, use a
  2703. ``std::map`` of pointers to the ``Value`` itself instead.
  2704. One important aspect of LLVM is that there is no distinction between an SSA
  2705. variable and the operation that produces it. Because of this, any reference to
  2706. the value produced by an instruction (or the value available as an incoming
  2707. argument, for example) is represented as a direct pointer to the instance of the
  2708. class that represents this value. Although this may take some getting used to,
  2709. it simplifies the representation and makes it easier to manipulate.
  2710. .. _m_Value:
  2711. Important Public Members of the ``Value`` class
  2712. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  2713. * | ``Value::use_iterator`` - Typedef for iterator over the use-list
  2714. | ``Value::const_use_iterator`` - Typedef for const_iterator over the
  2715. use-list
  2716. | ``unsigned use_size()`` - Returns the number of users of the value.
  2717. | ``bool use_empty()`` - Returns true if there are no users.
  2718. | ``use_iterator use_begin()`` - Get an iterator to the start of the
  2719. use-list.
  2720. | ``use_iterator use_end()`` - Get an iterator to the end of the use-list.
  2721. | ``User *use_back()`` - Returns the last element in the list.
  2722. These methods are the interface to access the def-use information in LLVM.
  2723. As with all other iterators in LLVM, the naming conventions follow the
  2724. conventions defined by the STL_.
  2725. * ``Type *getType() const``
  2726. This method returns the Type of the Value.
  2727. * | ``bool hasName() const``
  2728. | ``std::string getName() const``
  2729. | ``void setName(const std::string &Name)``
  2730. This family of methods is used to access and assign a name to a ``Value``, be
  2731. aware of the :ref:`precaution above <nameWarning>`.
  2732. * ``void replaceAllUsesWith(Value *V)``
  2733. This method traverses the use list of a ``Value`` changing all User_\ s of the
  2734. current value to refer to "``V``" instead. For example, if you detect that an
  2735. instruction always produces a constant value (for example through constant
  2736. folding), you can replace all uses of the instruction with the constant like
  2737. this:
  2738. .. code-block:: c++
  2739. Inst->replaceAllUsesWith(ConstVal);
  2740. .. _User:
  2741. The ``User`` class
  2742. ------------------
  2743. ``#include "llvm/IR/User.h"``
  2744. header source: `User.h <http://llvm.org/doxygen/User_8h_source.html>`_
  2745. doxygen info: `User Class <http://llvm.org/doxygen/classllvm_1_1User.html>`_
  2746. Superclass: Value_
  2747. The ``User`` class is the common base class of all LLVM nodes that may refer to
  2748. ``Value``\ s. It exposes a list of "Operands" that are all of the ``Value``\ s
  2749. that the User is referring to. The ``User`` class itself is a subclass of
  2750. ``Value``.
  2751. The operands of a ``User`` point directly to the LLVM ``Value`` that it refers
  2752. to. Because LLVM uses Static Single Assignment (SSA) form, there can only be
  2753. one definition referred to, allowing this direct connection. This connection
  2754. provides the use-def information in LLVM.
  2755. .. _m_User:
  2756. Important Public Members of the ``User`` class
  2757. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  2758. The ``User`` class exposes the operand list in two ways: through an index access
  2759. interface and through an iterator based interface.
  2760. * | ``Value *getOperand(unsigned i)``
  2761. | ``unsigned getNumOperands()``
  2762. These two methods expose the operands of the ``User`` in a convenient form for
  2763. direct access.
  2764. * | ``User::op_iterator`` - Typedef for iterator over the operand list
  2765. | ``op_iterator op_begin()`` - Get an iterator to the start of the operand
  2766. list.
  2767. | ``op_iterator op_end()`` - Get an iterator to the end of the operand list.
  2768. Together, these methods make up the iterator based interface to the operands
  2769. of a ``User``.
  2770. .. _Instruction:
  2771. The ``Instruction`` class
  2772. -------------------------
  2773. ``#include "llvm/IR/Instruction.h"``
  2774. header source: `Instruction.h
  2775. <http://llvm.org/doxygen/Instruction_8h_source.html>`_
  2776. doxygen info: `Instruction Class
  2777. <http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_
  2778. Superclasses: User_, Value_
  2779. The ``Instruction`` class is the common base class for all LLVM instructions.
  2780. It provides only a few methods, but is a very commonly used class. The primary
  2781. data tracked by the ``Instruction`` class itself is the opcode (instruction
  2782. type) and the parent BasicBlock_ the ``Instruction`` is embedded into. To
  2783. represent a specific type of instruction, one of many subclasses of
  2784. ``Instruction`` are used.
  2785. Because the ``Instruction`` class subclasses the User_ class, its operands can
  2786. be accessed in the same way as for other ``User``\ s (with the
  2787. ``getOperand()``/``getNumOperands()`` and ``op_begin()``/``op_end()`` methods).
  2788. An important file for the ``Instruction`` class is the ``llvm/Instruction.def``
  2789. file. This file contains some meta-data about the various different types of
  2790. instructions in LLVM. It describes the enum values that are used as opcodes
  2791. (for example ``Instruction::Add`` and ``Instruction::ICmp``), as well as the
  2792. concrete sub-classes of ``Instruction`` that implement the instruction (for
  2793. example BinaryOperator_ and CmpInst_). Unfortunately, the use of macros in this
  2794. file confuses doxygen, so these enum values don't show up correctly in the
  2795. `doxygen output <http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_.
  2796. .. _s_Instruction:
  2797. Important Subclasses of the ``Instruction`` class
  2798. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  2799. .. _BinaryOperator:
  2800. * ``BinaryOperator``
  2801. This subclasses represents all two operand instructions whose operands must be
  2802. the same type, except for the comparison instructions.
  2803. .. _CastInst:
  2804. * ``CastInst``
  2805. This subclass is the parent of the 12 casting instructions. It provides
  2806. common operations on cast instructions.
  2807. .. _CmpInst:
  2808. * ``CmpInst``
  2809. This subclass represents the two comparison instructions,
  2810. `ICmpInst <LangRef.html#i_icmp>`_ (integer opreands), and
  2811. `FCmpInst <LangRef.html#i_fcmp>`_ (floating point operands).
  2812. .. _m_Instruction:
  2813. Important Public Members of the ``Instruction`` class
  2814. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  2815. * ``BasicBlock *getParent()``
  2816. Returns the BasicBlock_ that this
  2817. ``Instruction`` is embedded into.
  2818. * ``bool mayWriteToMemory()``
  2819. Returns true if the instruction writes to memory, i.e. it is a ``call``,
  2820. ``free``, ``invoke``, or ``store``.
  2821. * ``unsigned getOpcode()``
  2822. Returns the opcode for the ``Instruction``.
  2823. * ``Instruction *clone() const``
  2824. Returns another instance of the specified instruction, identical in all ways
  2825. to the original except that the instruction has no parent (i.e. it's not
  2826. embedded into a BasicBlock_), and it has no name.
  2827. .. _Constant:
  2828. The ``Constant`` class and subclasses
  2829. -------------------------------------
  2830. Constant represents a base class for different types of constants. It is
  2831. subclassed by ConstantInt, ConstantArray, etc. for representing the various
  2832. types of Constants. GlobalValue_ is also a subclass, which represents the
  2833. address of a global variable or function.
  2834. .. _s_Constant:
  2835. Important Subclasses of Constant
  2836. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  2837. * ConstantInt : This subclass of Constant represents an integer constant of
  2838. any width.
  2839. * ``const APInt& getValue() const``: Returns the underlying
  2840. value of this constant, an APInt value.
  2841. * ``int64_t getSExtValue() const``: Converts the underlying APInt value to an
  2842. int64_t via sign extension. If the value (not the bit width) of the APInt
  2843. is too large to fit in an int64_t, an assertion will result. For this
  2844. reason, use of this method is discouraged.
  2845. * ``uint64_t getZExtValue() const``: Converts the underlying APInt value
  2846. to a uint64_t via zero extension. IF the value (not the bit width) of the
  2847. APInt is too large to fit in a uint64_t, an assertion will result. For this
  2848. reason, use of this method is discouraged.
  2849. * ``static ConstantInt* get(const APInt& Val)``: Returns the ConstantInt
  2850. object that represents the value provided by ``Val``. The type is implied
  2851. as the IntegerType that corresponds to the bit width of ``Val``.
  2852. * ``static ConstantInt* get(const Type *Ty, uint64_t Val)``: Returns the
  2853. ConstantInt object that represents the value provided by ``Val`` for integer
  2854. type ``Ty``.
  2855. * ConstantFP : This class represents a floating point constant.
  2856. * ``double getValue() const``: Returns the underlying value of this constant.
  2857. * ConstantArray : This represents a constant array.
  2858. * ``const std::vector<Use> &getValues() const``: Returns a vector of
  2859. component constants that makeup this array.
  2860. * ConstantStruct : This represents a constant struct.
  2861. * ``const std::vector<Use> &getValues() const``: Returns a vector of
  2862. component constants that makeup this array.
  2863. * GlobalValue : This represents either a global variable or a function. In
  2864. either case, the value is a constant fixed address (after linking).
  2865. .. _GlobalValue:
  2866. The ``GlobalValue`` class
  2867. -------------------------
  2868. ``#include "llvm/IR/GlobalValue.h"``
  2869. header source: `GlobalValue.h
  2870. <http://llvm.org/doxygen/GlobalValue_8h_source.html>`_
  2871. doxygen info: `GlobalValue Class
  2872. <http://llvm.org/doxygen/classllvm_1_1GlobalValue.html>`_
  2873. Superclasses: Constant_, User_, Value_
  2874. Global values ( GlobalVariable_\ s or :ref:`Function <c_Function>`\ s) are the
  2875. only LLVM values that are visible in the bodies of all :ref:`Function
  2876. <c_Function>`\ s. Because they are visible at global scope, they are also
  2877. subject to linking with other globals defined in different translation units.
  2878. To control the linking process, ``GlobalValue``\ s know their linkage rules.
  2879. Specifically, ``GlobalValue``\ s know whether they have internal or external
  2880. linkage, as defined by the ``LinkageTypes`` enumeration.
  2881. If a ``GlobalValue`` has internal linkage (equivalent to being ``static`` in C),
  2882. it is not visible to code outside the current translation unit, and does not
  2883. participate in linking. If it has external linkage, it is visible to external
  2884. code, and does participate in linking. In addition to linkage information,
  2885. ``GlobalValue``\ s keep track of which Module_ they are currently part of.
  2886. Because ``GlobalValue``\ s are memory objects, they are always referred to by
  2887. their **address**. As such, the Type_ of a global is always a pointer to its
  2888. contents. It is important to remember this when using the ``GetElementPtrInst``
  2889. instruction because this pointer must be dereferenced first. For example, if
  2890. you have a ``GlobalVariable`` (a subclass of ``GlobalValue)`` that is an array
  2891. of 24 ints, type ``[24 x i32]``, then the ``GlobalVariable`` is a pointer to
  2892. that array. Although the address of the first element of this array and the
  2893. value of the ``GlobalVariable`` are the same, they have different types. The
  2894. ``GlobalVariable``'s type is ``[24 x i32]``. The first element's type is
  2895. ``i32.`` Because of this, accessing a global value requires you to dereference
  2896. the pointer with ``GetElementPtrInst`` first, then its elements can be accessed.
  2897. This is explained in the `LLVM Language Reference Manual
  2898. <LangRef.html#globalvars>`_.
  2899. .. _m_GlobalValue:
  2900. Important Public Members of the ``GlobalValue`` class
  2901. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  2902. * | ``bool hasInternalLinkage() const``
  2903. | ``bool hasExternalLinkage() const``
  2904. | ``void setInternalLinkage(bool HasInternalLinkage)``
  2905. These methods manipulate the linkage characteristics of the ``GlobalValue``.
  2906. * ``Module *getParent()``
  2907. This returns the Module_ that the
  2908. GlobalValue is currently embedded into.
  2909. .. _c_Function:
  2910. The ``Function`` class
  2911. ----------------------
  2912. ``#include "llvm/IR/Function.h"``
  2913. header source: `Function.h <http://llvm.org/doxygen/Function_8h_source.html>`_
  2914. doxygen info: `Function Class
  2915. <http://llvm.org/doxygen/classllvm_1_1Function.html>`_
  2916. Superclasses: GlobalValue_, Constant_, User_, Value_
  2917. The ``Function`` class represents a single procedure in LLVM. It is actually
  2918. one of the more complex classes in the LLVM hierarchy because it must keep track
  2919. of a large amount of data. The ``Function`` class keeps track of a list of
  2920. BasicBlock_\ s, a list of formal Argument_\ s, and a SymbolTable_.
  2921. The list of BasicBlock_\ s is the most commonly used part of ``Function``
  2922. objects. The list imposes an implicit ordering of the blocks in the function,
  2923. which indicate how the code will be laid out by the backend. Additionally, the
  2924. first BasicBlock_ is the implicit entry node for the ``Function``. It is not
  2925. legal in LLVM to explicitly branch to this initial block. There are no implicit
  2926. exit nodes, and in fact there may be multiple exit nodes from a single
  2927. ``Function``. If the BasicBlock_ list is empty, this indicates that the
  2928. ``Function`` is actually a function declaration: the actual body of the function
  2929. hasn't been linked in yet.
  2930. In addition to a list of BasicBlock_\ s, the ``Function`` class also keeps track
  2931. of the list of formal Argument_\ s that the function receives. This container
  2932. manages the lifetime of the Argument_ nodes, just like the BasicBlock_ list does
  2933. for the BasicBlock_\ s.
  2934. The SymbolTable_ is a very rarely used LLVM feature that is only used when you
  2935. have to look up a value by name. Aside from that, the SymbolTable_ is used
  2936. internally to make sure that there are not conflicts between the names of
  2937. Instruction_\ s, BasicBlock_\ s, or Argument_\ s in the function body.
  2938. Note that ``Function`` is a GlobalValue_ and therefore also a Constant_. The
  2939. value of the function is its address (after linking) which is guaranteed to be
  2940. constant.
  2941. .. _m_Function:
  2942. Important Public Members of the ``Function``
  2943. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  2944. * ``Function(const FunctionType *Ty, LinkageTypes Linkage,
  2945. const std::string &N = "", Module* Parent = 0)``
  2946. Constructor used when you need to create new ``Function``\ s to add the
  2947. program. The constructor must specify the type of the function to create and
  2948. what type of linkage the function should have. The FunctionType_ argument
  2949. specifies the formal arguments and return value for the function. The same
  2950. FunctionType_ value can be used to create multiple functions. The ``Parent``
  2951. argument specifies the Module in which the function is defined. If this
  2952. argument is provided, the function will automatically be inserted into that
  2953. module's list of functions.
  2954. * ``bool isDeclaration()``
  2955. Return whether or not the ``Function`` has a body defined. If the function is
  2956. "external", it does not have a body, and thus must be resolved by linking with
  2957. a function defined in a different translation unit.
  2958. * | ``Function::iterator`` - Typedef for basic block list iterator
  2959. | ``Function::const_iterator`` - Typedef for const_iterator.
  2960. | ``begin()``, ``end()``, ``size()``, ``empty()``
  2961. These are forwarding methods that make it easy to access the contents of a
  2962. ``Function`` object's BasicBlock_ list.
  2963. * ``Function::BasicBlockListType &getBasicBlockList()``
  2964. Returns the list of BasicBlock_\ s. This is necessary to use when you need to
  2965. update the list or perform a complex action that doesn't have a forwarding
  2966. method.
  2967. * | ``Function::arg_iterator`` - Typedef for the argument list iterator
  2968. | ``Function::const_arg_iterator`` - Typedef for const_iterator.
  2969. | ``arg_begin()``, ``arg_end()``, ``arg_size()``, ``arg_empty()``
  2970. These are forwarding methods that make it easy to access the contents of a
  2971. ``Function`` object's Argument_ list.
  2972. * ``Function::ArgumentListType &getArgumentList()``
  2973. Returns the list of Argument_. This is necessary to use when you need to
  2974. update the list or perform a complex action that doesn't have a forwarding
  2975. method.
  2976. * ``BasicBlock &getEntryBlock()``
  2977. Returns the entry ``BasicBlock`` for the function. Because the entry block
  2978. for the function is always the first block, this returns the first block of
  2979. the ``Function``.
  2980. * | ``Type *getReturnType()``
  2981. | ``FunctionType *getFunctionType()``
  2982. This traverses the Type_ of the ``Function`` and returns the return type of
  2983. the function, or the FunctionType_ of the actual function.
  2984. * ``SymbolTable *getSymbolTable()``
  2985. Return a pointer to the SymbolTable_ for this ``Function``.
  2986. .. _GlobalVariable:
  2987. The ``GlobalVariable`` class
  2988. ----------------------------
  2989. ``#include "llvm/IR/GlobalVariable.h"``
  2990. header source: `GlobalVariable.h
  2991. <http://llvm.org/doxygen/GlobalVariable_8h_source.html>`_
  2992. doxygen info: `GlobalVariable Class
  2993. <http://llvm.org/doxygen/classllvm_1_1GlobalVariable.html>`_
  2994. Superclasses: GlobalValue_, Constant_, User_, Value_
  2995. Global variables are represented with the (surprise surprise) ``GlobalVariable``
  2996. class. Like functions, ``GlobalVariable``\ s are also subclasses of
  2997. GlobalValue_, and as such are always referenced by their address (global values
  2998. must live in memory, so their "name" refers to their constant address). See
  2999. GlobalValue_ for more on this. Global variables may have an initial value
  3000. (which must be a Constant_), and if they have an initializer, they may be marked
  3001. as "constant" themselves (indicating that their contents never change at
  3002. runtime).
  3003. .. _m_GlobalVariable:
  3004. Important Public Members of the ``GlobalVariable`` class
  3005. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  3006. * ``GlobalVariable(const Type *Ty, bool isConstant, LinkageTypes &Linkage,
  3007. Constant *Initializer = 0, const std::string &Name = "", Module* Parent = 0)``
  3008. Create a new global variable of the specified type. If ``isConstant`` is true
  3009. then the global variable will be marked as unchanging for the program. The
  3010. Linkage parameter specifies the type of linkage (internal, external, weak,
  3011. linkonce, appending) for the variable. If the linkage is InternalLinkage,
  3012. WeakAnyLinkage, WeakODRLinkage, LinkOnceAnyLinkage or LinkOnceODRLinkage, then
  3013. the resultant global variable will have internal linkage. AppendingLinkage
  3014. concatenates together all instances (in different translation units) of the
  3015. variable into a single variable but is only applicable to arrays. See the
  3016. `LLVM Language Reference <LangRef.html#modulestructure>`_ for further details
  3017. on linkage types. Optionally an initializer, a name, and the module to put
  3018. the variable into may be specified for the global variable as well.
  3019. * ``bool isConstant() const``
  3020. Returns true if this is a global variable that is known not to be modified at
  3021. runtime.
  3022. * ``bool hasInitializer()``
  3023. Returns true if this ``GlobalVariable`` has an intializer.
  3024. * ``Constant *getInitializer()``
  3025. Returns the initial value for a ``GlobalVariable``. It is not legal to call
  3026. this method if there is no initializer.
  3027. .. _BasicBlock:
  3028. The ``BasicBlock`` class
  3029. ------------------------
  3030. ``#include "llvm/IR/BasicBlock.h"``
  3031. header source: `BasicBlock.h
  3032. <http://llvm.org/doxygen/BasicBlock_8h_source.html>`_
  3033. doxygen info: `BasicBlock Class
  3034. <http://llvm.org/doxygen/classllvm_1_1BasicBlock.html>`_
  3035. Superclass: Value_
  3036. This class represents a single entry single exit section of the code, commonly
  3037. known as a basic block by the compiler community. The ``BasicBlock`` class
  3038. maintains a list of Instruction_\ s, which form the body of the block. Matching
  3039. the language definition, the last element of this list of instructions is always
  3040. a terminator instruction.
  3041. In addition to tracking the list of instructions that make up the block, the
  3042. ``BasicBlock`` class also keeps track of the :ref:`Function <c_Function>` that
  3043. it is embedded into.
  3044. Note that ``BasicBlock``\ s themselves are Value_\ s, because they are
  3045. referenced by instructions like branches and can go in the switch tables.
  3046. ``BasicBlock``\ s have type ``label``.
  3047. .. _m_BasicBlock:
  3048. Important Public Members of the ``BasicBlock`` class
  3049. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  3050. * ``BasicBlock(const std::string &Name = "", Function *Parent = 0)``
  3051. The ``BasicBlock`` constructor is used to create new basic blocks for
  3052. insertion into a function. The constructor optionally takes a name for the
  3053. new block, and a :ref:`Function <c_Function>` to insert it into. If the
  3054. ``Parent`` parameter is specified, the new ``BasicBlock`` is automatically
  3055. inserted at the end of the specified :ref:`Function <c_Function>`, if not
  3056. specified, the BasicBlock must be manually inserted into the :ref:`Function
  3057. <c_Function>`.
  3058. * | ``BasicBlock::iterator`` - Typedef for instruction list iterator
  3059. | ``BasicBlock::const_iterator`` - Typedef for const_iterator.
  3060. | ``begin()``, ``end()``, ``front()``, ``back()``,
  3061. ``size()``, ``empty()``
  3062. STL-style functions for accessing the instruction list.
  3063. These methods and typedefs are forwarding functions that have the same
  3064. semantics as the standard library methods of the same names. These methods
  3065. expose the underlying instruction list of a basic block in a way that is easy
  3066. to manipulate. To get the full complement of container operations (including
  3067. operations to update the list), you must use the ``getInstList()`` method.
  3068. * ``BasicBlock::InstListType &getInstList()``
  3069. This method is used to get access to the underlying container that actually
  3070. holds the Instructions. This method must be used when there isn't a
  3071. forwarding function in the ``BasicBlock`` class for the operation that you
  3072. would like to perform. Because there are no forwarding functions for
  3073. "updating" operations, you need to use this if you want to update the contents
  3074. of a ``BasicBlock``.
  3075. * ``Function *getParent()``
  3076. Returns a pointer to :ref:`Function <c_Function>` the block is embedded into,
  3077. or a null pointer if it is homeless.
  3078. * ``Instruction *getTerminator()``
  3079. Returns a pointer to the terminator instruction that appears at the end of the
  3080. ``BasicBlock``. If there is no terminator instruction, or if the last
  3081. instruction in the block is not a terminator, then a null pointer is returned.
  3082. .. _Argument:
  3083. The ``Argument`` class
  3084. ----------------------
  3085. This subclass of Value defines the interface for incoming formal arguments to a
  3086. function. A Function maintains a list of its formal arguments. An argument has
  3087. a pointer to the parent Function.