@@ -34,9 +34,7 @@ the purpose of scheduling instructions (and therefore not described by the
 scheduling model), but are very important for this tool.
 
 A few examples of details that are missing in scheduling models are:
- - Maximum number of instructions retired per cycle.
  - Actual dispatch width (it often differs from the issue width).
- - Number of temporary registers available for renaming.
  - Number of read/write ports in the register file(s).
  - Length of the load/store queue in the LSUnit.
 
@@ -387,17 +385,17 @@ An instruction can be dispatched if:
  - There are enough temporary registers to do register renaming
  - Schedulers are not full.
 
-Scheduling models don't describe register files, and therefore the tool doesn't
-know if there is more than one register file, and how many temporaries are
-available for register renaming.
+Since r329067, scheduling models can optionally specify which register files
+are available on the processor. Class DispatchUnit (see Dispatch.h) uses that
+information to initialize register file descriptors.
 
-By default, the tool (optimistically) assumes a single register file with an
-unbounded number of temporary registers. Users can limit the number of
-temporary registers available for register renaming using flag
-`-register-file-size=<N>`, where N is the number of temporaries. A value of
-zero for N means 'unbounded'. Knowing how many temporaries are available for
-register renaming, the tool can predict dispatch stalls caused by the lack of
-temporaries.
+By default, if the model doesn't describe register files, the tool
+(optimistically) assumes a single register file with an unbounded number of
+temporary registers. Users can limit the number of temporary registers that are
+globally available for register renaming using the flag
+`-register-file-size=<N>`, where N is the number of temporaries. A value of
+zero for N means 'unbounded'. Knowing how many temporaries are available, the
+tool can predict dispatch stalls caused by a lack of temporaries.
 
 The number of reorder buffer entries consumed by an instruction depends on the
 number of micro-opcodes it specifies in the target scheduling model (see field
@@ -667,25 +665,6 @@ instructions are not evaluated, and therefore control flow is not affected.
 However, the tool still queries the processor scheduling model to obtain latency
 information for instructions that affect the control flow.
 
-Possible extensions to the scheduling model
---------------------------------------------
-Section "Instruction Dispatch" explained how the tool doesn't know about the
-register files, and temporaries available in each register file for register
-renaming purposes.
-
-The LLVM scheduling model could be extended to better describe register files.
-Ideally, scheduling model should be able to define:
- - The size of each register file
- - How many temporary registers are available for register renaming
- - How register classes map to register files
-
-The scheduling model doesn't specify the retire throughput (i.e. how many
-instructions can be retired every cycle). Users can specify flag
-`-max-retire-per-cycle=<uint>` to limit how many instructions the retire control
-unit can retire every cycle. Ideally, every processor should be able to specify
-the retire throughput (for example, by adding an extra field to the scheduling
-model tablegen class).
-
 Known limitations on X86 processors
 -----------------------------------
 
@@ -867,8 +846,6 @@ analysis.
 Future work
 -----------
  * Address limitations (described in section "Known limitations").
- * Integrate extra description in the processor models, and make it opt-in for
-   the targets (see section "Possible extensions to the scheduling model").
  * Let processors specify the selection strategy for processor resource groups
    and resources with multiple units. The tool currently uses a round-robin
    selector to pick the next resource to use.
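The Future work item above says the tool currently picks the next unit of a processor resource group, or of a resource with multiple units, using a round-robin selector. The C++ sketch below illustrates that policy only; the class name `RoundRobinSelector`, its methods, and the rule that busy units are skipped are assumptions made for illustration, not llvm-mca's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical round-robin selector over the units of one resource group.
// This is NOT llvm-mca's real class; it only illustrates the selection policy.
class RoundRobinSelector {
public:
  explicit RoundRobinSelector(std::size_t NumUnits)
      : Busy(NumUnits, false), Next(0) {}

  // Returns the index of the next free unit, starting the search at the unit
  // after the one chosen last time. Returns std::nullopt if all units are busy
  // (or if the group has no units at all).
  std::optional<std::size_t> select() {
    const std::size_t N = Busy.size();
    for (std::size_t I = 0; I < N; ++I) {
      const std::size_t Candidate = (Next + I) % N;
      if (!Busy[Candidate]) {
        Next = (Candidate + 1) % N; // Resume after the chosen unit next time.
        return Candidate;
      }
    }
    return std::nullopt;
  }

  // Assumption: the caller marks units busy/free as instructions issue and
  // complete; skipping busy units is part of this sketch, not the README.
  void setBusy(std::size_t Unit, bool IsBusy) {
    assert(Unit < Busy.size() && "unit index out of range");
    Busy[Unit] = IsBusy;
  }

private:
  std::vector<bool> Busy; // Busy[i] is true while unit i is in use.
  std::size_t Next;       // Unit index where the next search starts.
};
```

For example, with three free units, successive calls to `select()` return 0, 1, 2 and then 0 again; marking unit 1 busy at that point makes the next call skip it and return 2. A per-processor strategy, as the Future work item suggests, could replace this fixed policy.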
@@ -877,8 +854,11 @@ Future work
  * Address design issues identified in section "Known design problems".
  * Define a standard interface for "Views". This would let users customize the
    performance report generated by the tool.
- * Simplify the Backend interface.
 
 When interfaces are mature/stable:
  * Move the logic into a library. This will enable a number of other
    interesting use cases.
+
+Work is currently tracked on https://bugs.llvm.org. llvm-mca bugs are tagged
+with the prefix [llvm-mca]. Searching for that tag returns the full list of
+open bugs.
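The register-file paragraph added by this patch says that, when the model doesn't describe register files, the tool assumes a single register file whose temporaries can be capped with `-register-file-size=<N>` (0 meaning unbounded), and uses that budget to predict dispatch stalls. The C++ sketch below models only that accounting. The class `TempRegisterPool`, its methods, and the rules that each register definition needs one temporary and that temporaries are released at retirement are hypothetical simplifications, not the DispatchUnit or register file code in llvm-mca.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a single register file with a bounded pool of
// temporaries for renaming. A size of 0 means 'unbounded', mirroring the
// meaning of -register-file-size=0 described in the README.
class TempRegisterPool {
public:
  explicit TempRegisterPool(std::uint32_t Size) : Size(Size), InUse(0) {}

  // Assumption: each register definition of an instruction needs one
  // temporary. Returns false (i.e. a dispatch stall) if the pool cannot
  // rename all of the instruction's definitions right now.
  bool canDispatch(std::uint32_t NumDefs) const {
    if (Size == 0)
      return true; // Unbounded register file: renaming never stalls.
    return InUse + NumDefs <= Size;
  }

  // Takes temporaries for an instruction that has been dispatched.
  void allocate(std::uint32_t NumDefs) {
    assert(canDispatch(NumDefs) && "dispatch should have stalled");
    InUse += NumDefs;
  }

  // Assumption: temporaries are returned when the instruction retires.
  void release(std::uint32_t NumDefs) {
    assert(NumDefs <= InUse && "releasing more temporaries than allocated");
    InUse -= NumDefs;
  }

private:
  std::uint32_t Size;  // Number of temporaries; 0 means unbounded.
  std::uint32_t InUse; // Temporaries held by in-flight instructions.
};
```

With a pool of two temporaries, an instruction defining two registers dispatches, and the next instruction with a register definition stalls until a retirement releases a temporary. With a size of 0, no stall caused by a lack of temporaries is ever reported.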