|
@@ -0,0 +1,441 @@
|
|
|
|
+.. _transformation-metadata:
|
|
|
|
+
|
|
|
|
+============================
|
|
|
|
+Code Transformation Metadata
|
|
|
|
+============================
|
|
|
|
+
|
|
|
|
+.. contents::
|
|
|
|
+ :local:
|
|
|
|
+
|
|
|
|
+Overview
|
|
|
|
+========
|
|
|
|
+
|
|
|
|
+LLVM transformation passes can be controlled by attaching metadata to
|
|
|
|
+the code to transform. By default, transformation passes use heuristics
|
|
|
|
+to determine whether or not to perform transformations, and when doing
|
|
|
|
+so, other details of how the transformations are applied (e.g., which
|
|
|
|
+vectorization factor to select).
|
|
|
|
+Unless the optimizer is otherwise directed, transformations are applied
|
|
|
|
+conservatively. This conservatism generally allows the optimizer to
|
|
|
|
+avoid unprofitable transformations, but in practice, this results in the
|
|
|
|
+optimizer not applying transformations that would be highly profitable.
|
|
|
|
+
|
|
|
|
+Frontends can give additional hints to LLVM passes on which
|
|
|
|
+transformations they should apply. This can be additional knowledge that
|
|
|
|
+cannot be derived from the emitted IR, or directives passed from the
|
|
|
|
+user/programmer. OpenMP pragmas are an example of the latter.
|
|
|
|
+
|
|
|
|
+If any such metadata is dropped from the program, the code's semantics
|
|
|
|
+must not change.
|
|
|
|
+
|
|
|
|
+Metadata on Loops
|
|
|
|
+=================
|
|
|
|
+
|
|
|
|
+Attributes can be attached to loops as described in :ref:`llvm.loop`.
|
|
|
|
+Attributes can describe properties of the loop, disable transformations,
|
|
|
|
+force specific transformations and set transformation options.
|
|
|
|
+
|
|
|
|
+Because metadata nodes are immutable (with the exception of
|
|
|
|
+``MDNode::replaceOperandWith`` which is dangerous to use on uniqued
|
|
|
|
+metadata), in order to add or remove a loop attributes, a new ``MDNode``
|
|
|
|
+must be created and assigned as the new ``llvm.loop`` metadata. Any
|
|
|
|
+connection between the old ``MDNode`` and the loop is lost. The
|
|
|
|
+``llvm.loop`` node is also used as LoopID (``Loop::getLoopID()``), i.e.
|
|
|
|
+the loop effectively gets a new identifier. For instance,
|
|
|
|
+``llvm.mem.parallel_loop_access`` references the LoopID. Therefore, if
|
|
|
|
+the parallel access property is to be preserved after adding/removing
|
|
|
|
+loop attributes, any ``llvm.mem.parallel_loop_access`` reference must be
|
|
|
|
+updated to the new LoopID.
|
|
|
|
+
|
|
|
|
+Transformation Metadata Structure
|
|
|
|
+=================================
|
|
|
|
+
|
|
|
|
+Some attributes describe code transformations (unrolling, vectorizing,
|
|
|
|
+loop distribution, etc.). They can either be a hint to the optimizer
|
|
|
|
+that a transformation might be beneficial, instruction to use a specific
|
|
|
|
+option, , or convey a specific request from the user (such as
|
|
|
|
+``#pragma clang loop`` or ``#pragma omp simd``).
|
|
|
|
+
|
|
|
|
+If a transformation is forced but cannot be carried-out for any reason,
|
|
|
|
+an optimization-missed warning must be emitted. Semantic information
|
|
|
|
+such as a transformation being safe (e.g.
|
|
|
|
+``llvm.mem.parallel_loop_access``) can be unused by the optimizer
|
|
|
|
+without generating a warning.
|
|
|
|
+
|
|
|
|
+Unless explicitly disabled, any optimization pass may heuristically
|
|
|
|
+determine whether a transformation is beneficial and apply it. If
|
|
|
|
+metadata for another transformation was specified, applying a different
|
|
|
|
+transformation before it might be inadvertent due to being applied on a
|
|
|
|
+different loop or the loop not existing anymore. To avoid having to
|
|
|
|
+explicitly disable an unknown number of passes, the attribute
|
|
|
|
+``llvm.loop.disable_nonforced`` disables all optional, high-level,
|
|
|
|
+restructuring transformations.
|
|
|
|
+
|
|
|
|
+The following example avoids the loop being altered before being
|
|
|
|
+vectorized, for instance being unrolled.
|
|
|
|
+
|
|
|
|
+.. code-block:: llvm
|
|
|
|
+
|
|
|
|
+ br i1 %exitcond, label %for.exit, label %for.header, !llvm.loop !0
|
|
|
|
+ ...
|
|
|
|
+ !0 = distinct !{!0, !1, !2}
|
|
|
|
+ !1 = !{!"llvm.loop.vectorize.enable", i1 true}
|
|
|
|
+ !2 = !{!"llvm.loop.disable_nonforced"}
|
|
|
|
+
|
|
|
|
+After a transformation is applied, follow-up attributes are set on the
|
|
|
|
+transformed and/or new loop(s). This allows additional attributes
|
|
|
|
+including followup-transformations to be specified. Specifying multiple
|
|
|
|
+transformations in the same metadata node is possible for compatibility
|
|
|
|
+reasons, but their execution order is undefined. For instance, when
|
|
|
|
+``llvm.loop.vectorize.enable`` and ``llvm.loop.unroll.enable`` are
|
|
|
|
+specified at the same time, unrolling may occur either before or after
|
|
|
|
+vectorization.
|
|
|
|
+
|
|
|
|
+As an example, the following instructs a loop to be vectorized and only
|
|
|
|
+then unrolled.
|
|
|
|
+
|
|
|
|
+.. code-block:: llvm
|
|
|
|
+
|
|
|
|
+ !0 = distinct !{!0, !1, !2, !3}
|
|
|
|
+ !1 = !{!"llvm.loop.vectorize.enable", i1 true}
|
|
|
|
+ !2 = !{!"llvm.loop.disable_nonforced"}
|
|
|
|
+ !3 = !{!"llvm.loop.vectorize.followup_vectorized", !{"llvm.loop.unroll.enable"}}
|
|
|
|
+
|
|
|
|
+If, and only if, no followup is specified, the pass may add attributes itself.
|
|
|
|
+For instance, the vectorizer adds a ``llvm.loop.isvectorized`` attribute and
|
|
|
|
+all attributes from the original loop excluding its loop vectorizer
|
|
|
|
+attributes. To avoid this, an empty followup attribute can be used, e.g.
|
|
|
|
+
|
|
|
|
+.. code-block:: llvm
|
|
|
|
+
|
|
|
|
+ !3 = !{!"llvm.loop.vectorize.followup_vectorized"}
|
|
|
|
+
|
|
|
|
+The followup attributes of a transformation that cannot be applied will
|
|
|
|
+never be added to a loop and are therefore effectively ignored. This means
|
|
|
|
+that any followup-transformation in such attributes requires that its
|
|
|
|
+prior transformations are applied before the followup-transformation.
|
|
|
|
+The user should receive a warning about the first transformation in the
|
|
|
|
+transformation chain that could not be applied if it a forced
|
|
|
|
+transformation. All following transformations are skipped.
|
|
|
|
+
|
|
|
|
+Pass-Specific Transformation Metadata
|
|
|
|
+=====================================
|
|
|
|
+
|
|
|
|
+Transformation options are specific to each transformation. In the
|
|
|
|
+following, we present the model for each LLVM loop optimization pass and
|
|
|
|
+the metadata to influence them.
|
|
|
|
+
|
|
|
|
+Loop Vectorization and Interleaving
|
|
|
|
+-----------------------------------
|
|
|
|
+
|
|
|
|
+Loop vectorization and interleaving is interpreted as a single
|
|
|
|
+transformation. It is interpreted as forced if
|
|
|
|
+``!{"llvm.loop.vectorize.enable", i1 true}`` is set.
|
|
|
|
+
|
|
|
|
+Assuming the pre-vectorization loop is
|
|
|
|
+
|
|
|
|
+.. code-block:: c
|
|
|
|
+
|
|
|
|
+ for (int i = 0; i < n; i+=1) // original loop
|
|
|
|
+ Stmt(i);
|
|
|
|
+
|
|
|
|
+then the code after vectorization will be approximately (assuming an
|
|
|
|
+SIMD width of 4):
|
|
|
|
+
|
|
|
|
+.. code-block:: c
|
|
|
|
+
|
|
|
|
+ int i = 0;
|
|
|
|
+ if (rtc) {
|
|
|
|
+ for (; i + 3 < n; i+=4) // vectorized/interleaved loop
|
|
|
|
+ Stmt(i:i+3);
|
|
|
|
+ }
|
|
|
|
+ for (; i < n; i+=1) // epilogue loop
|
|
|
|
+ Stmt(i);
|
|
|
|
+
|
|
|
|
+where ``rtc`` is a generated runtime check.
|
|
|
|
+
|
|
|
|
+``llvm.loop.vectorize.followup_vectorized`` will set the attributes for
|
|
|
|
+the vectorized loop. If not specified, ``llvm.loop.isvectorized`` is
|
|
|
|
+combined with the original loop's attributes to avoid it being
|
|
|
|
+vectorized multiple times.
|
|
|
|
+
|
|
|
|
+``llvm.loop.vectorize.followup_epilogue`` will set the attributes for
|
|
|
|
+the remainder loop. If not specified, it will have the original loop's
|
|
|
|
+attributes combined with ``llvm.loop.isvectorized`` and
|
|
|
|
+``llvm.loop.unroll.runtime.disable`` (unless the original loop already
|
|
|
|
+has unroll metadata).
|
|
|
|
+
|
|
|
|
+The attributes specified by ``llvm.loop.vectorize.followup_all`` are
|
|
|
|
+added to both loops.
|
|
|
|
+
|
|
|
|
+When using a follow-up attribute, it replaces any automatically deduced
|
|
|
|
+attributes for the generated loop in question. Therefore it is
|
|
|
|
+recommended to add ``llvm.loop.isvectorized`` to
|
|
|
|
+``llvm.loop.vectorize.followup_all`` which avoids that the loop
|
|
|
|
+vectorizer tries to optimize the loops again.
|
|
|
|
+
|
|
|
|
+Loop Unrolling
|
|
|
|
+--------------
|
|
|
|
+
|
|
|
|
+Unrolling is interpreted as forced any ``!{!"llvm.loop.unroll.enable"}``
|
|
|
|
+metadata or option (``llvm.loop.unroll.count``, ``llvm.loop.unroll.full``)
|
|
|
|
+is present. Unrolling can be full unrolling, partial unrolling of a loop
|
|
|
|
+with constant trip count or runtime unrolling of a loop with a trip
|
|
|
|
+count unknown at compile-time.
|
|
|
|
+
|
|
|
|
+If the loop has been unrolled fully, there is no followup-loop. For
|
|
|
|
+partial/runtime unrolling, the original loop of
|
|
|
|
+
|
|
|
|
+.. code-block:: c
|
|
|
|
+
|
|
|
|
+ for (int i = 0; i < n; i+=1) // original loop
|
|
|
|
+ Stmt(i);
|
|
|
|
+
|
|
|
|
+is transformed into (using an unroll factor of 4):
|
|
|
|
+
|
|
|
|
+.. code-block:: c
|
|
|
|
+
|
|
|
|
+ int i = 0;
|
|
|
|
+ for (; i + 3 < n; i+=4) // unrolled loop
|
|
|
|
+ Stmt(i);
|
|
|
|
+ Stmt(i+1);
|
|
|
|
+ Stmt(i+2);
|
|
|
|
+ Stmt(i+3);
|
|
|
|
+ }
|
|
|
|
+ for (; i < n; i+=1) // remainder loop
|
|
|
|
+ Stmt(i);
|
|
|
|
+
|
|
|
|
+``llvm.loop.unroll.followup_unrolled`` will set the loop attributes of
|
|
|
|
+the unrolled loop. If not specified, the attributes of the original loop
|
|
|
|
+without the ``llvm.loop.unroll.*`` attributes are copied and
|
|
|
|
+``llvm.loop.unroll.disable`` added to it.
|
|
|
|
+
|
|
|
|
+``llvm.loop.unroll.followup_remainder`` defines the attributes of the
|
|
|
|
+remainder loop. If not specified the remainder loop will have no
|
|
|
|
+attributes. The remainder loop might not be present due to being fully
|
|
|
|
+unrolled in which case this attribute has no effect.
|
|
|
|
+
|
|
|
|
+Attributes defined in ``llvm.loop.unroll.followup_all`` are added to the
|
|
|
|
+unrolled and remainder loops.
|
|
|
|
+
|
|
|
|
+To avoid that the partially unrolled loop is unrolled again, it is
|
|
|
|
+recommended to add ``llvm.loop.unroll.disable`` to
|
|
|
|
+``llvm.loop.unroll.followup_all``. If no follow-up attribute specified
|
|
|
|
+for a generated loop, it is added automatically.
|
|
|
|
+
|
|
|
|
+Unroll-And-Jam
|
|
|
|
+--------------
|
|
|
|
+
|
|
|
|
+Unroll-and-jam uses the following transformation model (here with an
|
|
|
|
+unroll factor if 2). Currently, it does not support a fallback version
|
|
|
|
+when the transformation is unsafe.
|
|
|
|
+
|
|
|
|
+.. code-block:: c
|
|
|
|
+
|
|
|
|
+ for (int i = 0; i < n; i+=1) { // original outer loop
|
|
|
|
+ Fore(i);
|
|
|
|
+ for (int j = 0; j < m; j+=1) // original inner loop
|
|
|
|
+ SubLoop(i, j);
|
|
|
|
+ Aft(i);
|
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+.. code-block:: c
|
|
|
|
+
|
|
|
|
+ int i = 0;
|
|
|
|
+ for (; i + 1 < n; i+=2) { // unrolled outer loop
|
|
|
|
+ Fore(i);
|
|
|
|
+ Fore(i+1);
|
|
|
|
+ for (int j = 0; j < m; j+=1) { // unrolled inner loop
|
|
|
|
+ SubLoop(i, j);
|
|
|
|
+ SubLoop(i+1, j);
|
|
|
|
+ }
|
|
|
|
+ Aft(i);
|
|
|
|
+ Aft(i+1);
|
|
|
|
+ }
|
|
|
|
+ for (; i < n; i+=1) { // remainder outer loop
|
|
|
|
+ Fore(i);
|
|
|
|
+ for (int j = 0; j < m; j+=1) // remainder inner loop
|
|
|
|
+ SubLoop(i, j);
|
|
|
|
+ Aft(i);
|
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+``llvm.loop.unroll_and_jam.followup_outer`` will set the loop attributes
|
|
|
|
+of the unrolled outer loop. If not specified, the attributes of the
|
|
|
|
+original outer loop without the ``llvm.loop.unroll.*`` attributes are
|
|
|
|
+copied and ``llvm.loop.unroll.disable`` added to it.
|
|
|
|
+
|
|
|
|
+``llvm.loop.unroll_and_jam.followup_inner`` will set the loop attributes
|
|
|
|
+of the unrolled inner loop. If not specified, the attributes of the
|
|
|
|
+original inner loop are used unchanged.
|
|
|
|
+
|
|
|
|
+``llvm.loop.unroll_and_jam.followup_remainder_outer`` sets the loop
|
|
|
|
+attributes of the outer remainder loop. If not specified it will not
|
|
|
|
+have any attributes. The remainder loop might not be present due to
|
|
|
|
+being fully unrolled.
|
|
|
|
+
|
|
|
|
+``llvm.loop.unroll_and_jam.followup_remainder_inner`` sets the loop
|
|
|
|
+attributes of the inner remainder loop. If not specified it will have
|
|
|
|
+the attributes of the original inner loop. It the outer remainder loop
|
|
|
|
+is unrolled, the inner remainder loop might be present multiple times.
|
|
|
|
+
|
|
|
|
+Attributes defined in ``llvm.loop.unroll_and_jam.followup_all`` are
|
|
|
|
+added to all of the aforementioned output loops.
|
|
|
|
+
|
|
|
|
+To avoid that the unrolled loop is unrolled again, it is
|
|
|
|
+recommended to add ``llvm.loop.unroll.disable`` to
|
|
|
|
+``llvm.loop.unroll_and_jam.followup_all``. It suppresses unroll-and-jam
|
|
|
|
+as well as an additional inner loop unrolling. If no follow-up
|
|
|
|
+attribute specified for a generated loop, it is added automatically.
|
|
|
|
+
|
|
|
|
+Loop Distribution
|
|
|
|
+-----------------
|
|
|
|
+
|
|
|
|
+The LoopDistribution pass tries to separate vectorizable parts of a loop
|
|
|
|
+from the non-vectorizable part (which otherwise would make the entire
|
|
|
|
+loop non-vectorizable). Conceptually, it transforms a loop such as
|
|
|
|
+
|
|
|
|
+.. code-block:: c
|
|
|
|
+
|
|
|
|
+ for (int i = 1; i < n; i+=1) { // original loop
|
|
|
|
+ A[i] = i;
|
|
|
|
+ B[i] = 2 + B[i];
|
|
|
|
+ C[i] = 3 + C[i - 1];
|
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+into the following code:
|
|
|
|
+
|
|
|
|
+.. code-block:: c
|
|
|
|
+
|
|
|
|
+ if (rtc) {
|
|
|
|
+ for (int i = 1; i < n; i+=1) // coincident loop
|
|
|
|
+ A[i] = i;
|
|
|
|
+ for (int i = 1; i < n; i+=1) // coincident loop
|
|
|
|
+ B[i] = 2 + B[i];
|
|
|
|
+ for (int i = 1; i < n; i+=1) // sequential loop
|
|
|
|
+ C[i] = 3 + C[i - 1];
|
|
|
|
+ } else {
|
|
|
|
+ for (int i = 1; i < n; i+=1) { // fallback loop
|
|
|
|
+ A[i] = i;
|
|
|
|
+ B[i] = 2 + B[i];
|
|
|
|
+ C[i] = 3 + C[i - 1];
|
|
|
|
+ }
|
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+where ``rtc`` is a generated runtime check.
|
|
|
|
+
|
|
|
|
+``llvm.loop.distribute.followup_coincident`` sets the loop attributes of
|
|
|
|
+all loops without loop-carried dependencies (i.e. vectorizable loops).
|
|
|
|
+There might be more than one such loops. If not defined, the loops will
|
|
|
|
+inherit the original loop's attributes.
|
|
|
|
+
|
|
|
|
+``llvm.loop.distribute.followup_sequential`` sets the loop attributes of the
|
|
|
|
+loop with potentially unsafe dependencies. There should be at most one
|
|
|
|
+such loop. If not defined, the loop will inherit the original loop's
|
|
|
|
+attributes.
|
|
|
|
+
|
|
|
|
+``llvm.loop.distribute.followup_fallback`` defines the loop attributes
|
|
|
|
+for the fallback loop, which is a copy of the original loop for when
|
|
|
|
+loop versioning is required. If undefined, the fallback loop inherits
|
|
|
|
+all attributes from the original loop.
|
|
|
|
+
|
|
|
|
+Attributes defined in ``llvm.loop.distribute.followup_all`` are added to
|
|
|
|
+all of the aforementioned output loops.
|
|
|
|
+
|
|
|
|
+It is recommended to add ``llvm.loop.disable_nonforced`` to
|
|
|
|
+``llvm.loop.distribute.followup_fallback``. This avoids that the
|
|
|
|
+fallback version (which is likely never executed) is further optimzed
|
|
|
|
+which would increase the code size.
|
|
|
|
+
|
|
|
|
+Versioning LICM
|
|
|
|
+---------------
|
|
|
|
+
|
|
|
|
+The pass hoists code out of loops that are only loop-invariant when
|
|
|
|
+dynamic conditions apply. For instance, it transforms the loop
|
|
|
|
+
|
|
|
|
+.. code-block:: c
|
|
|
|
+
|
|
|
|
+ for (int i = 0; i < n; i+=1) // original loop
|
|
|
|
+ A[i] = B[0];
|
|
|
|
+
|
|
|
|
+into:
|
|
|
|
+
|
|
|
|
+.. code-block:: c
|
|
|
|
+
|
|
|
|
+ if (rtc) {
|
|
|
|
+ auto b = B[0];
|
|
|
|
+ for (int i = 0; i < n; i+=1) // versioned loop
|
|
|
|
+ A[i] = b;
|
|
|
|
+ } else {
|
|
|
|
+ for (int i = 0; i < n; i+=1) // unversioned loop
|
|
|
|
+ A[i] = B[0];
|
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+The runtime condition (``rtc``) checks that the array ``A`` and the
|
|
|
|
+element `B[0]` do not alias.
|
|
|
|
+
|
|
|
|
+Currently, this transformation does not support followup-attributes.
|
|
|
|
+
|
|
|
|
+Loop Interchange
|
|
|
|
+----------------
|
|
|
|
+
|
|
|
|
+Currently, the ``LoopInterchange`` pass does not use any metadata.
|
|
|
|
+
|
|
|
|
+Ambiguous Transformation Order
|
|
|
|
+==============================
|
|
|
|
+
|
|
|
|
+If there multiple transformations defined, the order in which they are
|
|
|
|
+executed depends on the order in LLVM's pass pipeline, which is subject
|
|
|
|
+to change. The default optimization pipeline (anything higher than
|
|
|
|
+``-O0``) has the following order.
|
|
|
|
+
|
|
|
|
+When using the legacy pass manager:
|
|
|
|
+
|
|
|
|
+ - LoopInterchange (if enabled)
|
|
|
|
+ - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
|
|
|
|
+ - VersioningLICM (if enabled)
|
|
|
|
+ - LoopDistribute
|
|
|
|
+ - LoopVectorizer
|
|
|
|
+ - LoopUnrollAndJam (if enabled)
|
|
|
|
+ - LoopUnroll (partial and runtime unrolling)
|
|
|
|
+
|
|
|
|
+When using the legacy pass manager with LTO:
|
|
|
|
+
|
|
|
|
+ - LoopInterchange (if enabled)
|
|
|
|
+ - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
|
|
|
|
+ - LoopVectorizer
|
|
|
|
+ - LoopUnroll (partial and runtime unrolling)
|
|
|
|
+
|
|
|
|
+When using the new pass manager:
|
|
|
|
+
|
|
|
|
+ - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
|
|
|
|
+ - LoopDistribute
|
|
|
|
+ - LoopVectorizer
|
|
|
|
+ - LoopUnrollAndJam (if enabled)
|
|
|
|
+ - LoopUnroll (partial and runtime unrolling)
|
|
|
|
+
|
|
|
|
+Leftover Transformations
|
|
|
|
+========================
|
|
|
|
+
|
|
|
|
+Forced transformations that have not been applied after the last
|
|
|
|
+transformation pass should be reported to the user. The transformation
|
|
|
|
+passes themselves cannot be responsible for this reporting because they
|
|
|
|
+might not be in the pipeline, there might be multiple passes able to
|
|
|
|
+apply a transformation (e.g. ``LoopInterchange`` and Polly) or a
|
|
|
|
+transformation attribute may be 'hidden' inside another passes' followup
|
|
|
|
+attribute.
|
|
|
|
+
|
|
|
|
+The pass ``-transform-warning`` (``WarnMissedTransformationsPass``)
|
|
|
|
+emits such warnings. It should be placed after the last transformation
|
|
|
|
+pass.
|
|
|
|
+
|
|
|
|
+The current pass pipeline has a fixed order in which transformations
|
|
|
|
+passes are executed. A transformation can be in the followup of a pass
|
|
|
|
+that is executed later and thus leftover. For instance, a loop nest
|
|
|
|
+cannot be distributed and then interchanged with the current pass
|
|
|
|
+pipeline. The loop distribution will execute, but there is no loop
|
|
|
|
+interchange pass following such that any loop interchange metadata will
|
|
|
|
+be ignored. The ``-transform-warning`` should emit a warning in this
|
|
|
|
+case.
|
|
|
|
+
|
|
|
|
+Future versions of LLVM may fix this by executing transformations using
|
|
|
|
+a dynamic ordering.
|