-
Notifications
You must be signed in to change notification settings - Fork 15.6k
[acc] add RegionBranchOpInterface to acc.loop #172940
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-mlir-openacc @llvm/pr-subscribers-mlir Author: Susan Tan (ス-ザン タン) (SusanTan) ChangesAdd RegionBranchOpInterface to acc.loop so that dataflow analysis can propagate properly. Full diff: https://github.com/llvm/llvm-project/pull/172940.diff 3 Files Affected:
diff --git a/mlir/include/mlir/Dialect/OpenACC/OpenACCOps.td b/mlir/include/mlir/Dialect/OpenACC/OpenACCOps.td
index 349dc8bb858b5..14902730ba165 100644
--- a/mlir/include/mlir/Dialect/OpenACC/OpenACCOps.td
+++ b/mlir/include/mlir/Dialect/OpenACC/OpenACCOps.td
@@ -2576,6 +2576,7 @@ def OpenACC_LoopOp
RecursiveMemoryEffects,
DeclareOpInterfaceMethods<ComputeRegionOpInterface>,
DeclareOpInterfaceMethods<LoopLikeOpInterface>,
+ DeclareOpInterfaceMethods<RegionBranchOpInterface>,
MemoryEffects<[MemWrite<OpenACC_ConstructResource>]>]> {
let summary = "loop construct";
diff --git a/mlir/lib/Dialect/OpenACC/IR/OpenACC.cpp b/mlir/lib/Dialect/OpenACC/IR/OpenACC.cpp
index 3dea621003a75..3109efbdb2f66 100644
--- a/mlir/lib/Dialect/OpenACC/IR/OpenACC.cpp
+++ b/mlir/lib/Dialect/OpenACC/IR/OpenACC.cpp
@@ -391,7 +391,7 @@ void OpenACCDialect::initialize() {
//===----------------------------------------------------------------------===//
// RegionBranchOpInterface for acc.kernels / acc.parallel / acc.serial /
-// acc.kernel_environment / acc.data / acc.host_data
+// acc.kernel_environment / acc.data / acc.host_data / acc.loop
//===----------------------------------------------------------------------===//
/// Generic helper for single-region OpenACC ops that execute their body once
@@ -444,6 +444,12 @@ void HostDataOp::getSuccessorRegions(
regions);
}
+void LoopOp::getSuccessorRegions(RegionBranchPoint point,
+ SmallVectorImpl<RegionSuccessor> ®ions) {
+ getSingleRegionOpSuccessorRegions(getOperation(), getRegion(), point,
+ regions);
+}
+
//===----------------------------------------------------------------------===//
// device_type support helpers
//===----------------------------------------------------------------------===//
diff --git a/mlir/test/Dialect/OpenACC/region-branchop-interface.mlir b/mlir/test/Dialect/OpenACC/region-branchop-interface.mlir
index a97fdb7d78ad1..b2d4688ab5e9c 100644
--- a/mlir/test/Dialect/OpenACC/region-branchop-interface.mlir
+++ b/mlir/test/Dialect/OpenACC/region-branchop-interface.mlir
@@ -142,4 +142,51 @@ func.func @last_mod_openacc_host_data(%arg0: memref<f32>, %mapped: memref<f32>)
return %arg0 : memref<f32>
}
+// -----
+
+// CHECK-LABEL: test_tag: acc_loop_before:
+// CHECK: operand #0
+// CHECK-NEXT: - pre
+// CHECK-LABEL: test_tag: acc_loop_inside:
+// CHECK: operand #0
+// CHECK-NEXT: - loop_region
+// CHECK-LABEL: test_tag: acc_loop_after:
+// CHECK: operand #0
+// CHECK-NEXT: - loop_region
+// CHECK-LABEL: test_tag: acc_loop_post:
+// CHECK: operand #0
+// CHECK-NEXT: - post_loop
+// CHECK-LABEL: test_tag: acc_loop_return:
+// CHECK: operand #0
+// CHECK-NEXT: - post_loop
+func.func @last_mod_openacc_loop(%arg0: memref<f32>) -> memref<f32> {
+ %zero = arith.constant 0.0 : f32
+ // Store before the loop.
+ memref.store %zero, %arg0[] {tag_name = "pre"} : memref<f32>
+ memref.load %arg0[] {tag = "acc_loop_before"} : memref<f32>
+
+ %one = arith.constant 1.0 : f32
+ %c1_i32 = arith.constant 1 : i32
+ %c10_i32 = arith.constant 10 : i32
+
+ // Single-region loop: store and load inside.
+ acc.loop control(%iv : i32) = (%c1_i32 : i32) to (%c10_i32 : i32)
+ step (%c1_i32 : i32) {
+ // Store inside the loop.
+ memref.store %one, %arg0[] {tag_name = "loop_region"} : memref<f32>
+ memref.load %arg0[] {tag = "acc_loop_inside"} : memref<f32>
+ acc.yield
+ } attributes {auto_ = [#acc.device_type<none>],
+ inclusiveUpperbound = array<i1: true>}
+
+ // After the loop, the last writer is still the store in the loop.
+ memref.load %arg0[] {tag = "acc_loop_after"} : memref<f32>
+
+ // Store after the loop.
+ memref.store %zero, %arg0[] {tag_name = "post_loop"} : memref<f32>
+ memref.load %arg0[] {tag = "acc_loop_post"} : memref<f32>
+
+ // At return, the last writer should be the post-loop store.
+ return {tag = "acc_loop_return"} %arg0 : memref<f32>
+}
|
|
@llvm/pr-subscribers-openacc Author: Susan Tan (ス-ザン タン) (SusanTan) ChangesAdd RegionBranchOpInterface to acc.loop so that dataflow analysis can propagate properly. Full diff: https://github.com/llvm/llvm-project/pull/172940.diff 3 Files Affected:
diff --git a/mlir/include/mlir/Dialect/OpenACC/OpenACCOps.td b/mlir/include/mlir/Dialect/OpenACC/OpenACCOps.td
index 349dc8bb858b5..14902730ba165 100644
--- a/mlir/include/mlir/Dialect/OpenACC/OpenACCOps.td
+++ b/mlir/include/mlir/Dialect/OpenACC/OpenACCOps.td
@@ -2576,6 +2576,7 @@ def OpenACC_LoopOp
RecursiveMemoryEffects,
DeclareOpInterfaceMethods<ComputeRegionOpInterface>,
DeclareOpInterfaceMethods<LoopLikeOpInterface>,
+ DeclareOpInterfaceMethods<RegionBranchOpInterface>,
MemoryEffects<[MemWrite<OpenACC_ConstructResource>]>]> {
let summary = "loop construct";
diff --git a/mlir/lib/Dialect/OpenACC/IR/OpenACC.cpp b/mlir/lib/Dialect/OpenACC/IR/OpenACC.cpp
index 3dea621003a75..3109efbdb2f66 100644
--- a/mlir/lib/Dialect/OpenACC/IR/OpenACC.cpp
+++ b/mlir/lib/Dialect/OpenACC/IR/OpenACC.cpp
@@ -391,7 +391,7 @@ void OpenACCDialect::initialize() {
//===----------------------------------------------------------------------===//
// RegionBranchOpInterface for acc.kernels / acc.parallel / acc.serial /
-// acc.kernel_environment / acc.data / acc.host_data
+// acc.kernel_environment / acc.data / acc.host_data / acc.loop
//===----------------------------------------------------------------------===//
/// Generic helper for single-region OpenACC ops that execute their body once
@@ -444,6 +444,12 @@ void HostDataOp::getSuccessorRegions(
regions);
}
+void LoopOp::getSuccessorRegions(RegionBranchPoint point,
+ SmallVectorImpl<RegionSuccessor> ®ions) {
+ getSingleRegionOpSuccessorRegions(getOperation(), getRegion(), point,
+ regions);
+}
+
//===----------------------------------------------------------------------===//
// device_type support helpers
//===----------------------------------------------------------------------===//
diff --git a/mlir/test/Dialect/OpenACC/region-branchop-interface.mlir b/mlir/test/Dialect/OpenACC/region-branchop-interface.mlir
index a97fdb7d78ad1..b2d4688ab5e9c 100644
--- a/mlir/test/Dialect/OpenACC/region-branchop-interface.mlir
+++ b/mlir/test/Dialect/OpenACC/region-branchop-interface.mlir
@@ -142,4 +142,51 @@ func.func @last_mod_openacc_host_data(%arg0: memref<f32>, %mapped: memref<f32>)
return %arg0 : memref<f32>
}
+// -----
+
+// CHECK-LABEL: test_tag: acc_loop_before:
+// CHECK: operand #0
+// CHECK-NEXT: - pre
+// CHECK-LABEL: test_tag: acc_loop_inside:
+// CHECK: operand #0
+// CHECK-NEXT: - loop_region
+// CHECK-LABEL: test_tag: acc_loop_after:
+// CHECK: operand #0
+// CHECK-NEXT: - loop_region
+// CHECK-LABEL: test_tag: acc_loop_post:
+// CHECK: operand #0
+// CHECK-NEXT: - post_loop
+// CHECK-LABEL: test_tag: acc_loop_return:
+// CHECK: operand #0
+// CHECK-NEXT: - post_loop
+func.func @last_mod_openacc_loop(%arg0: memref<f32>) -> memref<f32> {
+ %zero = arith.constant 0.0 : f32
+ // Store before the loop.
+ memref.store %zero, %arg0[] {tag_name = "pre"} : memref<f32>
+ memref.load %arg0[] {tag = "acc_loop_before"} : memref<f32>
+
+ %one = arith.constant 1.0 : f32
+ %c1_i32 = arith.constant 1 : i32
+ %c10_i32 = arith.constant 10 : i32
+
+ // Single-region loop: store and load inside.
+ acc.loop control(%iv : i32) = (%c1_i32 : i32) to (%c10_i32 : i32)
+ step (%c1_i32 : i32) {
+ // Store inside the loop.
+ memref.store %one, %arg0[] {tag_name = "loop_region"} : memref<f32>
+ memref.load %arg0[] {tag = "acc_loop_inside"} : memref<f32>
+ acc.yield
+ } attributes {auto_ = [#acc.device_type<none>],
+ inclusiveUpperbound = array<i1: true>}
+
+ // After the loop, the last writer is still the store in the loop.
+ memref.load %arg0[] {tag = "acc_loop_after"} : memref<f32>
+
+ // Store after the loop.
+ memref.store %zero, %arg0[] {tag_name = "post_loop"} : memref<f32>
+ memref.load %arg0[] {tag = "acc_loop_post"} : memref<f32>
+
+ // At return, the last writer should be the post-loop store.
+ return {tag = "acc_loop_return"} %arg0 : memref<f32>
+}
|
razvanlupusoru
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Susan! Would you mind figuring out if there is any missing functionality since it seems something is different compared to scf.for?
|
|
||
| void LoopOp::getSuccessorRegions(RegionBranchPoint point, | ||
| SmallVectorImpl<RegionSuccessor> ®ions) { | ||
| getSingleRegionOpSuccessorRegions(getOperation(), getRegion(), point, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this is correct since it doesn't capture the backward branch to self.
For example, scf.for does the following:
void ForOp::getSuccessorRegions(RegionBranchPoint point,
SmallVectorImpl<RegionSuccessor> ®ions) {
// Both the operation itself and the region may be branching into the body or
// back into the operation itself. It is possible for loop not to enter the
// body.
regions.push_back(RegionSuccessor(&getRegion(), getRegionIterArgs()));
regions.push_back(RegionSuccessor(getOperation(), getResults()));
}
Can you please investigate what is needed to make it correct? And then see what the test needs to exercise this (namely, without this functionality, the test should fail).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes you are right, i'm not implementing it as a loop currently. changing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to post here what we've discussed offline: acc.loop can function as both structured (like scf.for) and unstructured. When it's structured, the region should have a back edge to itself. When it's unstructured, it should model as if the region only execute once, and leave the back edges to be modeled within the loop using cf.cond_br
| memref.load %arg0[] {tag = "acc_loop_inside"} : memref<f32> | ||
| acc.yield | ||
| } attributes {auto_ = [#acc.device_type<none>], | ||
| inclusiveUpperbound = array<i1: true>} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inclusideUpperbound is not needed for this test.
razvanlupusoru
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
vzakhari
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, Susan!
Add RegionBranchOpInterface to acc.loop so that dataflow analysis can propagate properly.