[KYUUBI #7180][LINEAGE] Subquery in the project should always drill down to get the lineage relationships #7181
base: master
…wn to get the column lineage relationships
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #7181 +/- ##
========================================
Coverage 0.00% 0.00%
========================================
Files 701 698 -3
Lines 43565 45029 +1464
Branches 5911 6250 +339
========================================
- Misses 43565 45029 +1464
Thanks for the PR! This PR is being closed due to inactivity. This isn't a judgement on the merit of the PR in any way. If this is still an issue with the latest version of Kyuubi, please reopen it and ask a committer to remove the Stale tag! Thank you for using Kyuubi!
@pan3793 Hi, could you help review this when you have time? We have tested many other SQL statements and did not find any further problems.
@yabola can you rebase it, since master has moved forward a lot?
val references =
  if (exp.references.nonEmpty) exp.references
  else {
  if (exp.references.isEmpty || exp.child.isInstanceOf[ScalarSubquery]) {
Is it enough to check only whether exp.child is a ScalarSubquery? Do we need to recursively check the children of exp?
For example, could you verify this SQL?
select if((select sum(a) from table0 where table1.b = table0.b) > 100, 1, 2) as aa, b from table1
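To make the concern concrete, here is a minimal, self-contained sketch. It does not use Spark's classes: `Expr`, `Lit`, `ScalarSub`, `GreaterThan`, `If`, `Alias` and both helper functions are hypothetical stand-ins for the Catalyst expressions involved, so it only illustrates the shape of the problem and says nothing about how the patch should be written.

```scala
// Toy expression tree; every name below is hypothetical, not Spark's API.
sealed trait Expr { def children: Seq[Expr] }
final case class Lit(value: Int) extends Expr { val children: Seq[Expr] = Nil }
final case class ScalarSub(sql: String) extends Expr { val children: Seq[Expr] = Nil }
final case class GreaterThan(left: Expr, right: Expr) extends Expr {
  val children: Seq[Expr] = Seq(left, right)
}
final case class If(cond: Expr, onTrue: Expr, onFalse: Expr) extends Expr {
  val children: Seq[Expr] = Seq(cond, onTrue, onFalse)
}
final case class Alias(child: Expr, name: String) extends Expr {
  val children: Seq[Expr] = Seq(child)
}

object NestedSubquerySketch {
  // Shallow check: looks only at the alias's direct child.
  def directChildIsSubquery(a: Alias): Boolean = a.child.isInstanceOf[ScalarSub]

  // Recursive check: true if a subquery appears anywhere in the expression tree.
  def containsSubquery(e: Expr): Boolean = e match {
    case _: ScalarSub => true
    case other        => other.children.exists(containsSubquery)
  }

  // Models: if((select sum(a) ...) > 100, 1, 2) as aa
  val aa: Alias = Alias(
    If(
      GreaterThan(
        ScalarSub("select sum(a) from table0 where table1.b = table0.b"),
        Lit(100)),
      Lit(1),
      Lit(2)),
    "aa")

  def main(args: Array[String]): Unit = {
    println(directChildIsSubquery(aa)) // false: the direct child is If, not the subquery
    println(containsSubquery(aa))      // true: the subquery sits two levels down
  }
}
```

In this toy model the shallow check misses the subquery wrapped in `if(...)`, which is the case the SQL above is meant to exercise. In Catalyst, tree traversal helpers such as `find` or `collect` on the expression might serve the same purpose as the recursive function here.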
.foldLeft(ListMap[Attribute, AttributeSet]())(mergeColumnsLineage).values
.foldLeft(AttributeSet.empty)(_ ++ _)
.map(attr => attr.withQualifier(attr.qualifier :+ SUBQUERY_COLUMN_IDENTIFIER))
AttributeSet(attrRefs)
Additionally, non-subquery references of exp may be ignored here.
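A small sketch of what I mean, again with a toy model rather than Spark's classes. `Expr`, `Attr`, `Add`, `ScalarSub` and `split` are hypothetical names, and the example expression (a subquery added to a plain column) is an assumption chosen only to show both kinds of references in one expression.

```scala
// Toy model; all names are hypothetical, not Spark's API.
sealed trait Expr
final case class Attr(table: String, name: String) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr
// `output` stands for the expression projected by the subquery's plan.
final case class ScalarSub(output: Expr) extends Expr

object MixedReferencesSketch {
  // Returns (columns referenced directly, columns reached through a subquery).
  // Keeping both groups ensures neither is dropped when they are merged later.
  def split(e: Expr): (Set[Attr], Set[Attr]) = e match {
    case a: Attr => (Set(a), Set.empty)
    case Add(l, r) =>
      val (leftDirect, leftSub) = split(l)
      val (rightDirect, rightSub) = split(r)
      (leftDirect ++ rightDirect, leftSub ++ rightSub)
    case ScalarSub(out) =>
      val (direct, nested) = split(out)
      (Set.empty, direct ++ nested)
  }

  def main(args: Array[String]): Unit = {
    // Models: (select a from table0 ...) + table1.a
    val expr = Add(ScalarSub(Attr("table0", "a")), Attr("table1", "a"))
    val (direct, viaSubquery) = split(expr)
    println(direct)      // contains only table1.a
    println(viaSubquery) // contains only table0.a
  }
}
```

If only the subquery-derived set were returned for such an expression, `table1.a` would be lost from the lineage, which is the situation the comment above warns about.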
Why are the changes needed?
The following SQL statements produce an incorrect column lineage result:
create table table0(a int, b string, c string)
create table table1(a int, b string, c string)
select (select sum(a) from table0 where table1.b = table0.b) as aa, b from table1
The root cause:
According to apache/spark#32687, the references of a subquery expression are defined as its outer attribute references. The lineage extraction should therefore always drill down into the subquery plan to obtain the corresponding column lineage relationships.
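The root cause can be illustrated with a minimal sketch that does not depend on Spark. `Expr`, `Attr`, `Sum`, `ScalarSub`, `references` and `lineage` are hypothetical stand-ins, written under the assumption stated above that a subquery expression reports only its outer attribute references. Which columns the real patch reports (for example, whether predicate columns are included) is not covered here.

```scala
// Toy model, not Spark's API: all names here are hypothetical.
sealed trait Expr
final case class Attr(table: String, name: String) extends Expr
final case class Sum(child: Expr) extends Expr
// `output` is the expression projected by the inner plan;
// `outerRefs` are the outer-query attributes used inside the subquery.
final case class ScalarSub(output: Expr, outerRefs: Set[Attr]) extends Expr

object RootCauseSketch {
  // Mirrors the behaviour described above: a subquery reports only outer references.
  def references(e: Expr): Set[Attr] = e match {
    case a: Attr             => Set(a)
    case Sum(c)              => references(c)
    case ScalarSub(_, outer) => outer
  }

  // Drilling down: follow the subquery's own output to its source columns.
  def lineage(e: Expr): Set[Attr] = e match {
    case a: Attr           => Set(a)
    case Sum(c)            => lineage(c)
    case ScalarSub(out, _) => lineage(out)
  }

  // Models: (select sum(a) from table0 where table1.b = table0.b)
  val sub: ScalarSub = ScalarSub(Sum(Attr("table0", "a")), Set(Attr("table1", "b")))

  def main(args: Array[String]): Unit = {
    println(references(sub)) // only table1.b, the outer reference
    println(lineage(sub))    // only table0.a, the column feeding the subquery output
  }
}
```

In this model, relying on `references` alone would attribute `aa` to `table1.b`, while drilling down reaches `table0.a`.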
How was this patch tested?
Added a new unit test.
Was this patch authored or co-authored using generative AI tooling?
no