RegionStore.rst 8.5 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183
  1. ============
  2. Region Store
  3. ============
  4. The analyzer "Store" represents the contents of memory regions. It is an opaque
  5. functional data structure stored in each ``ProgramState``; the only class that
  6. can modify the store is its associated StoreManager.
  7. Currently (Feb. 2013), the only StoreManager implementation being used is
  8. ``RegionStoreManager``. This store records bindings to memory regions using a
  9. "base region + offset" key. (This allows ``*p`` and ``p[0]`` to map to the same
  10. location, among other benefits.)
  11. Regions are grouped into "clusters", which roughly correspond to "regions with
  12. the same base region". This allows certain operations to be more efficient,
  13. such as invalidation.
  14. Regions that do not have a known offset use a special "symbolic" offset. These
  15. keys store both the original region, and the "concrete offset region" -- the
  16. last region whose offset is entirely concrete. (For example, in the expression
  17. ``foo.bar[1][i].baz``, the concrete offset region is the array ``foo.bar[1]``,
  18. since that has a known offset from the start of the top-level ``foo`` struct.)
  19. Binding Invalidation
  20. --------------------
  21. Supporting both concrete and symbolic offsets makes things a bit tricky. Here's
  22. an example:
  23. .. code-block:: cpp
  24. foo[0] = 0;
  25. foo[1] = 1;
  26. foo[i] = i;
  27. After the third assignment, nothing can be said about the value of ``foo[0]``,
  28. because ``foo[i]`` may have overwritten it! Thus, *binding to a region with a
  29. symbolic offset invalidates the entire concrete offset region.* We know
  30. ``foo[i]`` is somewhere within ``foo``, so we don't have to invalidate
  31. anything else, but we do have to be conservative about all other bindings within
  32. ``foo``.
  33. Continuing the example:
  34. .. code-block:: cpp
  35. foo[i] = i;
  36. foo[0] = 0;
  37. After this latest assignment, nothing can be said about the value of ``foo[i]``,
  38. because ``foo[0]`` may have overwritten it! *Binding to a region R with a
  39. concrete offset invalidates any symbolic offset bindings whose concrete offset
  40. region is a super-region **or** sub-region of R.* All we know about ``foo[i]``
  41. is that it is somewhere within ``foo``, so changing *anything* within ``foo``
  42. might change ``foo[i]``, and changing *all* of ``foo`` (or its base region) will
  43. *definitely* change ``foo[i]``.
  44. This logic could be improved by using the current constraints on ``i``, at the
  45. cost of speed. The latter case could also be improved by matching region kinds,
  46. i.e. changing ``foo[0].a`` is unlikely to affect ``foo[i].b``, no matter what
  47. ``i`` is.
  48. For more detail, read through ``RegionStoreManager::removeSubRegionBindings`` in
  49. RegionStore.cpp.
  50. ObjCIvarRegions
  51. ---------------
  52. Objective-C instance variables require a bit of special handling. Like struct
  53. fields, they are not base regions, and when their parent object region is
  54. invalidated, all the instance variables must be invalidated as well. However,
  55. they have no concrete compile-time offsets (in the modern, "non-fragile"
  56. runtime), and so cannot easily be represented as an offset from the start of
  57. the object in the analyzer. Moreover, this means that invalidating a single
  58. instance variable should *not* invalidate the rest of the object, since unlike
  59. struct fields or array elements there is no way to perform pointer arithmetic
  60. to access another instance variable.
  61. Consequently, although the base region of an ObjCIvarRegion is the entire
  62. object, RegionStore offsets are computed from the start of the instance
  63. variable. Thus it is not valid to assume that all bindings with non-symbolic
  64. offsets start from the base region!
  65. Region Invalidation
  66. -------------------
  67. Unlike binding invalidation, region invalidation occurs when the entire
  68. contents of a region may have changed---say, because it has been passed to a
  69. function the analyzer can model, like memcpy, or because its address has
  70. escaped, usually as an argument to an opaque function call. In these cases we
  71. need to throw away not just all bindings within the region itself, but within
  72. its entire cluster, since neighboring regions may be accessed via pointer
  73. arithmetic.
  74. Region invalidation typically does even more than this, however. Because it
  75. usually represents the complete escape of a region from the analyzer's model,
  76. its *contents* must also be transitively invalidated. (For example, if a region
  77. ``p`` of type ``int **`` is invalidated, the contents of ``*p`` and ``**p`` may
  78. have changed as well.) The algorithm that traverses this transitive closure of
  79. accessible regions is known as ClusterAnalysis, and is also used for finding
  80. all live bindings in the store (in order to throw away the dead ones). The name
  81. "ClusterAnalysis" predates the cluster-based organization of bindings, but
  82. refers to the same concept: during invalidation and liveness analysis, all
  83. bindings within a cluster must be treated in the same way for a conservative
  84. model of program behavior.
  85. Default Bindings
  86. ----------------
  87. Most bindings in RegionStore are simple scalar values -- integers and pointers.
  88. These are known as "Direct" bindings. However, RegionStore supports a second
  89. type of binding called a "Default" binding. These are used to provide values to
  90. all the elements of an aggregate type (struct or array) without having to
  91. explicitly specify a binding for each individual element.
  92. When there is no Direct binding for a particular region, the store manager
  93. looks at each super-region in turn to see if there is a Default binding. If so,
  94. this value is used as the value of the original region. The search ends when
  95. the base region is reached, at which point the RegionStore will pick an
  96. appropriate default value for the region (usually a symbolic value, but
  97. sometimes zero, for static data, or "uninitialized", for stack variables).
  98. .. code-block:: cpp
  99. int manyInts[10];
  100. manyInts[1] = 42; // Creates a Direct binding for manyInts[1].
  101. print(manyInts[1]); // Retrieves the Direct binding for manyInts[1];
  102. print(manyInts[0]); // There is no Direct binding for manyInts[0].
  103. // Is there a Default binding for the entire array?
  104. // There is not, but it is a stack variable, so we use
  105. // "uninitialized" as the default value (and emit a
  106. // diagnostic!).
  107. NOTE: The fact that bindings are stored as a base region plus an offset limits
  108. the Default Binding strategy, because in C aggregates can contain other
  109. aggregates. In the current implementation of RegionStore, there is no way to
  110. distinguish a Default binding for an entire aggregate from a Default binding
  111. for the sub-aggregate at offset 0.
  112. Lazy Bindings (LazyCompoundVal)
  113. -------------------------------
  114. RegionStore implements an optimization for copying aggregates (structs and
  115. arrays) called "lazy bindings", implemented using a special SVal called
  116. LazyCompoundVal. When the store is asked for the "binding" for an entire
  117. aggregate (i.e. for an lvalue-to-rvalue conversion), it returns a
  118. LazyCompoundVal instead. When this value is then stored into a variable, it is
  119. bound as a Default value. This makes copying arrays and structs much cheaper
  120. than if they had required memberwise access.
  121. Under the hood, a LazyCompoundVal is implemented as a uniqued pair of (region,
  122. store), representing "the value of the region during this 'snapshot' of the
  123. store". This has important implications for any sort of liveness or
  124. reachability analysis, which must take the bindings in the old store into
  125. account.
  126. Retrieving a value from a lazy binding happens in the same way as any other
  127. Default binding: since there is no direct binding, the store manager falls back
  128. to super-regions to look for an appropriate default binding. LazyCompoundVal
  129. differs from a normal default binding, however, in that it contains several
  130. different values, instead of one value that will appear several times. Because
  131. of this, the store manager has to reconstruct the subregion chain on top of the
  132. LazyCompoundVal region, and look up *that* region in the previous store.
  133. Here's a concrete example:
  134. .. code-block:: cpp
  135. CGPoint p;
  136. p.x = 42; // A Direct binding is made to the FieldRegion 'p.x'.
  137. CGPoint p2 = p; // A LazyCompoundVal is created for 'p', along with a
  138. // snapshot of the current store state. This value is then
  139. // used as a Default binding for the VarRegion 'p2'.
  140. return p2.x; // The binding for FieldRegion 'p2.x' is requested.
  141. // There is no Direct binding, so we look for a Default
  142. // binding to 'p2' and find the LCV.
  143. // Because it's a LCV, we look at our requested region
  144. // and see that it's the '.x' field. We ask for the value
  145. // of 'p.x' within the snapshot, and get back 42.