Эх сурвалжийг харах

AMDGPU/SI: Insert wait states required after v_readfirstlane on SI

Summary:
We will be able to handle this case much better once the hazard recognizer
is finished, but this conservative implementation  fixes a hang with the piglit
test:

spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra

Reviewers: arsenm, nhaehnle

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18988

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266105 91177308-0d34-0410-b5e6-96231b3b80d8
Tom Stellard 9 жил өмнө
parent
commit
cb6c943dc2

+ 6 - 0
lib/Target/AMDGPU/SIInsertWaits.cpp

@@ -601,6 +601,12 @@ bool SIInsertWaits::runOnMachineFunction(MachineFunction &MF) {
         insertDPPWaitStates(I);
       }
 
+      // Insert required wait states for SMRD reading an SGPR written by a VALU
+      // instruction.
+      if (ST.getGeneration() <= AMDGPUSubtarget::SOUTHERN_ISLANDS &&
+          I->getOpcode() == AMDGPU::V_READFIRSTLANE_B32)
+        TII->insertWaitStates(MBB, std::next(I), 4);
+
       // Wait for everything before a barrier.
       if (I->getOpcode() == AMDGPU::S_BARRIER)
         Changes |= insertWait(MBB, I, LastIssued);

+ 1 - 0
test/CodeGen/AMDGPU/missing-store.ll

@@ -10,6 +10,7 @@
 ; SI: buffer_store_dword
 ; SI: v_readfirstlane_b32 s[[PTR_LO:[0-9]+]], v{{[0-9]+}}
 ; SI: v_readfirstlane_b32 s[[PTR_HI:[0-9]+]], v{{[0-9]+}}
+; SI-NEXT: s_nop
 ; SI: s_load_dword s{{[0-9]+}}, s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}
 ; SI: buffer_store_dword
 ; SI: s_endpgm

+ 1 - 0
test/CodeGen/AMDGPU/salu-to-valu.ll

@@ -56,6 +56,7 @@ done:                                             ; preds = %loop
 ; SI: s_movk_i32 [[OFFSET:s[0-9]+]], 0x2ee0
 ; GCN: v_readfirstlane_b32 s[[PTR_LO:[0-9]+]], v{{[0-9]+}}
 ; GCN: v_readfirstlane_b32 s[[PTR_HI:[0-9]+]], v{{[0-9]+}}
+; SI-NEXT: s_nop
 ; SI: s_load_dword [[OUT:s[0-9]+]], s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, [[OFFSET]]
 ; CI: s_load_dword [[OUT:s[0-9]+]], s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, 0xbb8
 ; GCN: v_mov_b32_e32 [[V_OUT:v[0-9]+]], [[OUT]]