123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396 |
- Inlining
- ========
- There are several options that control which calls the analyzer will consider for
- inlining. The major one is ``-analyzer-config ipa``:
- * ``analyzer-config ipa=none`` - All inlining is disabled. This is the only mode
- available in LLVM 3.1 and earlier and in Xcode 4.3 and earlier.
- * ``analyzer-config ipa=basic-inlining`` - Turns on inlining for C functions, C++
- static member functions, and blocks -- essentially, the calls that behave
- like simple C function calls. This is essentially the mode used in
- Xcode 4.4.
- * ``analyzer-config ipa=inlining`` - Turns on inlining when we can confidently find
- the function/method body corresponding to the call. (C functions, static
- functions, devirtualized C++ methods, Objective-C class methods, Objective-C
- instance methods when ExprEngine is confident about the dynamic type of the
- instance).
- * ``analyzer-config ipa=dynamic`` - Inline instance methods for which the type is
- determined at runtime and we are not 100% sure that our type info is
- correct. For virtual calls, inline the most plausible definition.
- * ``analyzer-config ipa=dynamic-bifurcate`` - Same as -analyzer-config ipa=dynamic,
- but the path is split. We inline on one branch and do not inline on the
- other. This mode does not drop the coverage in cases when the parent class
- has code that is only exercised when some of its methods are overridden.
- Currently, ``-analyzer-config ipa=dynamic-bifurcate`` is the default mode.
- While ``-analyzer-config ipa`` determines in general how aggressively the analyzer
- will try to inline functions, several additional options control which types of
- functions can inlined, in an all-or-nothing way. These options use the
- analyzer's configuration table, so they are all specified as follows:
- ``-analyzer-config OPTION=VALUE``
- c++-inlining
- ------------
- This option controls which C++ member functions may be inlined.
- ``-analyzer-config c++-inlining=[none | methods | constructors | destructors]``
- Each of these modes implies that all the previous member function kinds will be
- inlined as well; it doesn't make sense to inline destructors without inlining
- constructors, for example.
- The default c++-inlining mode is 'destructors', meaning that all member
- functions with visible definitions will be considered for inlining. In some
- cases the analyzer may still choose not to inline the function.
- Note that under 'constructors', constructors for types with non-trivial
- destructors will not be inlined. Additionally, no C++ member functions will be
- inlined under -analyzer-config ipa=none or -analyzer-config ipa=basic-inlining,
- regardless of the setting of the c++-inlining mode.
- c++-template-inlining
- ^^^^^^^^^^^^^^^^^^^^^
- This option controls whether C++ templated functions may be inlined.
- ``-analyzer-config c++-template-inlining=[true | false]``
- Currently, template functions are considered for inlining by default.
- The motivation behind this option is that very generic code can be a source
- of false positives, either by considering paths that the caller considers
- impossible (by some unstated precondition), or by inlining some but not all
- of a deep implementation of a function.
- c++-stdlib-inlining
- ^^^^^^^^^^^^^^^^^^^
- This option controls whether functions from the C++ standard library, including
- methods of the container classes in the Standard Template Library, should be
- considered for inlining.
- ``-analyzer-config c++-stdlib-inlining=[true | false]``
- Currently, C++ standard library functions are considered for inlining by
- default.
- The standard library functions and the STL in particular are used ubiquitously
- enough that our tolerance for false positives is even lower here. A false
- positive due to poor modeling of the STL leads to a poor user experience, since
- most users would not be comfortable adding assertions to system headers in order
- to silence analyzer warnings.
- c++-container-inlining
- ^^^^^^^^^^^^^^^^^^^^^^
- This option controls whether constructors and destructors of "container" types
- should be considered for inlining.
- ``-analyzer-config c++-container-inlining=[true | false]``
- Currently, these constructors and destructors are NOT considered for inlining
- by default.
- The current implementation of this setting checks whether a type has a member
- named 'iterator' or a member named 'begin'; these names are idiomatic in C++,
- with the latter specified in the C++11 standard. The analyzer currently does a
- fairly poor job of modeling certain data structure invariants of container-like
- objects. For example, these three expressions should be equivalent:
- .. code-block:: cpp
-
- std::distance(c.begin(), c.end()) == 0
- c.begin() == c.end()
- c.empty()
- Many of these issues are avoided if containers always have unknown, symbolic
- state, which is what happens when their constructors are treated as opaque.
- In the future, we may decide specific containers are "safe" to model through
- inlining, or choose to model them directly using checkers instead.
- Basics of Implementation
- ------------------------
- The low-level mechanism of inlining a function is handled in
- ExprEngine::inlineCall and ExprEngine::processCallExit.
- If the conditions are right for inlining, a CallEnter node is created and added
- to the analysis work list. The CallEnter node marks the change to a new
- LocationContext representing the called function, and its state includes the
- contents of the new stack frame. When the CallEnter node is actually processed,
- its single successor will be a edge to the first CFG block in the function.
- Exiting an inlined function is a bit more work, fortunately broken up into
- reasonable steps:
- 1. The CoreEngine realizes we're at the end of an inlined call and generates a
- CallExitBegin node.
- 2. ExprEngine takes over (in processCallExit) and finds the return value of the
- function, if it has one. This is bound to the expression that triggered the
- call. (In the case of calls without origin expressions, such as destructors,
- this step is skipped.)
- 3. Dead symbols and bindings are cleaned out from the state, including any local
- bindings.
- 4. A CallExitEnd node is generated, which marks the transition back to the
- caller's LocationContext.
- 5. Custom post-call checks are processed and the final nodes are pushed back
- onto the work list, so that evaluation of the caller can continue.
- Retry Without Inlining
- ^^^^^^^^^^^^^^^^^^^^^^
- In some cases, we would like to retry analysis without inlining a particular
- call.
- Currently, we use this technique to recover coverage in case we stop
- analyzing a path due to exceeding the maximum block count inside an inlined
- function.
- When this situation is detected, we walk up the path to find the first node
- before inlining was started and enqueue it on the WorkList with a special
- ReplayWithoutInlining bit added to it (ExprEngine::replayWithoutInlining). The
- path is then re-analyzed from that point without inlining that particular call.
- Deciding When to Inline
- ^^^^^^^^^^^^^^^^^^^^^^^
- In general, the analyzer attempts to inline as much as possible, since it
- provides a better summary of what actually happens in the program. There are
- some cases, however, where the analyzer chooses not to inline:
- - If there is no definition available for the called function or method. In
- this case, there is no opportunity to inline.
- - If the CFG cannot be constructed for a called function, or the liveness
- cannot be computed. These are prerequisites for analyzing a function body,
- with or without inlining.
- - If the LocationContext chain for a given ExplodedNode reaches a maximum cutoff
- depth. This prevents unbounded analysis due to infinite recursion, but also
- serves as a useful cutoff for performance reasons.
- - If the function is variadic. This is not a hard limitation, but an engineering
- limitation.
- Tracked by: <rdar://problem/12147064> Support inlining of variadic functions
- - In C++, constructors are not inlined unless the destructor call will be
- processed by the ExprEngine. Thus, if the CFG was built without nodes for
- implicit destructors, or if the destructors for the given object are not
- represented in the CFG, the constructor will not be inlined. (As an exception,
- constructors for objects with trivial constructors can still be inlined.)
- See "C++ Caveats" below.
- - In C++, ExprEngine does not inline custom implementations of operator 'new'
- or operator 'delete', nor does it inline the constructors and destructors
- associated with these. See "C++ Caveats" below.
- - Calls resulting in "dynamic dispatch" are specially handled. See more below.
- - The FunctionSummaries map stores additional information about declarations,
- some of which is collected at runtime based on previous analyses.
- We do not inline functions which were not profitable to inline in a different
- context (for example, if the maximum block count was exceeded; see
- "Retry Without Inlining").
- Dynamic Calls and Devirtualization
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- "Dynamic" calls are those that are resolved at runtime, such as C++ virtual
- method calls and Objective-C message sends. Due to the path-sensitive nature of
- the analysis, the analyzer may be able to reason about the dynamic type of the
- object whose method is being called and thus "devirtualize" the call.
- This path-sensitive devirtualization occurs when the analyzer can determine what
- method would actually be called at runtime. This is possible when the type
- information is constrained enough for a simulated C++/Objective-C object that
- the analyzer can make such a decision.
- DynamicTypeInfo
- ^^^^^^^^^^^^^^^
- As the analyzer analyzes a path, it may accrue information to refine the
- knowledge about the type of an object. This can then be used to make better
- decisions about the target method of a call.
- Such type information is tracked as DynamicTypeInfo. This is path-sensitive
- data that is stored in ProgramState, which defines a mapping from MemRegions to
- an (optional) DynamicTypeInfo.
- If no DynamicTypeInfo has been explicitly set for a MemRegion, it will be lazily
- inferred from the region's type or associated symbol. Information from symbolic
- regions is weaker than from true typed regions.
- EXAMPLE: A C++ object declared "A obj" is known to have the class 'A', but a
- reference "A &ref" may dynamically be a subclass of 'A'.
- The DynamicTypePropagation checker gathers and propagates DynamicTypeInfo,
- updating it as information is observed along a path that can refine that type
- information for a region.
- WARNING: Not all of the existing analyzer code has been retrofitted to use
- DynamicTypeInfo, nor is it universally appropriate. In particular,
- DynamicTypeInfo always applies to a region with all casts stripped
- off, but sometimes the information provided by casts can be useful.
- RuntimeDefinition
- ^^^^^^^^^^^^^^^^^
- The basis of devirtualization is CallEvent's getRuntimeDefinition() method,
- which returns a RuntimeDefinition object. When asked to provide a definition,
- the CallEvents for dynamic calls will use the DynamicTypeInfo in their
- ProgramState to attempt to devirtualize the call. In the case of no dynamic
- dispatch, or perfectly constrained devirtualization, the resulting
- RuntimeDefinition contains a Decl corresponding to the definition of the called
- function, and RuntimeDefinition::mayHaveOtherDefinitions will return FALSE.
- In the case of dynamic dispatch where our information is not perfect, CallEvent
- can make a guess, but RuntimeDefinition::mayHaveOtherDefinitions will return
- TRUE. The RuntimeDefinition object will then also include a MemRegion
- corresponding to the object being called (i.e., the "receiver" in Objective-C
- parlance), which ExprEngine uses to decide whether or not the call should be
- inlined.
- Inlining Dynamic Calls
- ^^^^^^^^^^^^^^^^^^^^^^
- The -analyzer-config ipa option has five different modes: none, basic-inlining,
- inlining, dynamic, and dynamic-bifurcate. Under -analyzer-config ipa=dynamic,
- all dynamic calls are inlined, whether we are certain or not that this will
- actually be the definition used at runtime. Under -analyzer-config ipa=inlining,
- only "near-perfect" devirtualized calls are inlined*, and other dynamic calls
- are evaluated conservatively (as if no definition were available).
- * Currently, no Objective-C messages are not inlined under
- -analyzer-config ipa=inlining, even if we are reasonably confident of the type
- of the receiver. We plan to enable this once we have tested our heuristics
- more thoroughly.
- The last option, -analyzer-config ipa=dynamic-bifurcate, behaves similarly to
- "dynamic", but performs a conservative invalidation in the general virtual case
- in *addition* to inlining. The details of this are discussed below.
- As stated above, -analyzer-config ipa=basic-inlining does not inline any C++
- member functions or Objective-C method calls, even if they are non-virtual or
- can be safely devirtualized.
- Bifurcation
- ^^^^^^^^^^^
- ExprEngine::BifurcateCall implements the ``-analyzer-config ipa=dynamic-bifurcate``
- mode.
- When a call is made on an object with imprecise dynamic type information
- (RuntimeDefinition::mayHaveOtherDefinitions() evaluates to TRUE), ExprEngine
- bifurcates the path and marks the object's region (retrieved from the
- RuntimeDefinition object) with a path-sensitive "mode" in the ProgramState.
- Currently, there are 2 modes:
- * ``DynamicDispatchModeInlined`` - Models the case where the dynamic type information
- of the receiver (MemoryRegion) is assumed to be perfectly constrained so
- that a given definition of a method is expected to be the code actually
- called. When this mode is set, ExprEngine uses the Decl from
- RuntimeDefinition to inline any dynamically dispatched call sent to this
- receiver because the function definition is considered to be fully resolved.
- * ``DynamicDispatchModeConservative`` - Models the case where the dynamic type
- information is assumed to be incorrect, for example, implies that the method
- definition is overridden in a subclass. In such cases, ExprEngine does not
- inline the methods sent to the receiver (MemoryRegion), even if a candidate
- definition is available. This mode is conservative about simulating the
- effects of a call.
- Going forward along the symbolic execution path, ExprEngine consults the mode
- of the receiver's MemRegion to make decisions on whether the calls should be
- inlined or not, which ensures that there is at most one split per region.
- At a high level, "bifurcation mode" allows for increased semantic coverage in
- cases where the parent method contains code which is only executed when the
- class is subclassed. The disadvantages of this mode are a (considerable?)
- performance hit and the possibility of false positives on the path where the
- conservative mode is used.
- Objective-C Message Heuristics
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- ExprEngine relies on a set of heuristics to partition the set of Objective-C
- method calls into those that require bifurcation and those that do not. Below
- are the cases when the DynamicTypeInfo of the object is considered precise
- (cannot be a subclass):
- - If the object was created with +alloc or +new and initialized with an -init
- method.
- - If the calls are property accesses using dot syntax. This is based on the
- assumption that children rarely override properties, or do so in an
- essentially compatible way.
- - If the class interface is declared inside the main source file. In this case
- it is unlikely that it will be subclassed.
- - If the method is not declared outside of main source file, either by the
- receiver's class or by any superclasses.
- C++ Caveats
- ^^^^^^^^^^^
- C++11 [class.cdtor]p4 describes how the vtable of an object is modified as it is
- being constructed or destructed; that is, the type of the object depends on
- which base constructors have been completed. This is tracked using
- DynamicTypeInfo in the DynamicTypePropagation checker.
- There are several limitations in the current implementation:
- * Temporaries are poorly modeled right now because we're not confident in the
- placement of their destructors in the CFG. We currently won't inline their
- constructors unless the destructor is trivial, and don't process their
- destructors at all, not even to invalidate the region.
- * 'new' is poorly modeled due to some nasty CFG/design issues. This is tracked
- in PR12014. 'delete' is not modeled at all.
- * Arrays of objects are modeled very poorly right now. ExprEngine currently
- only simulates the first constructor and first destructor. Because of this,
- ExprEngine does not inline any constructors or destructors for arrays.
- CallEvent
- ^^^^^^^^^
- A CallEvent represents a specific call to a function, method, or other body of
- code. It is path-sensitive, containing both the current state (ProgramStateRef)
- and stack space (LocationContext), and provides uniform access to the argument
- values and return type of a call, no matter how the call is written in the
- source or what sort of code body is being invoked.
- NOTE: For those familiar with Cocoa, CallEvent is roughly equivalent to
- NSInvocation.
- CallEvent should be used whenever there is logic dealing with function calls
- that does not care how the call occurred.
- Examples include checking that arguments satisfy preconditions (such as
- __attribute__((nonnull))), and attempting to inline a call.
- CallEvents are reference-counted objects managed by a CallEventManager. While
- there is no inherent issue with persisting them (say, in a ProgramState's GDM),
- they are intended for short-lived use, and can be recreated from CFGElements or
- non-top-level StackFrameContexts fairly easily.
|