Skip to content

Conversation

@yabola
Copy link
Contributor

@yabola yabola commented Sep 1, 2025

Why are the changes needed?

The following SQL statement will get the wrong column lineage result:
create table table0(a int, b string, c string)
create table table1(a int, b string, c string)
select (select sum(a) from table0 where table1.b = table0.b) as aa, b from table1

The root cause:
From apache/spark#32687 , we can know the references for a subquery expression are defined as outer attribute references. So we should always drill down to get the corresponding column lineage relationship for the subquery plan.

How was this patch tested?

add new ut

Was this patch authored or co-authored using generative AI tooling?

no

…wn to get the column lineage relationships
@yabola yabola changed the title [KYUUBI #7180][LINEAGE] Subquery in the project should drill down to get the column lineage relationships [KYUUBI #7180][LINEAGE] Subquery in the project should always drill down to get the lineage relationships Sep 1, 2025
@codecov-commenter
Copy link

codecov-commenter commented Sep 1, 2025

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (f7e10e6) to head (4afb6a6).
⚠️ Report is 70 commits behind head on master.

Files with missing lines Patch % Lines
...in/lineage/helper/SparkSQLLineageParseHelper.scala 0.00% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master   #7181     +/-   ##
========================================
  Coverage    0.00%   0.00%             
========================================
  Files         701     698      -3     
  Lines       43565   45029   +1464     
  Branches     5911    6250    +339     
========================================
- Misses      43565   45029   +1464     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@yabola
Copy link
Contributor Author

yabola commented Sep 4, 2025

@wForget @pan3793 please help review when you have time, thank you~

@github-actions
Copy link

github-actions bot commented Jan 1, 2026

Thanks for the PR! This PR is being closed due to inactivity. This isn't a judgement on the merit of the PR in any way. If this is still an issue with the latest version of Kyuubi, please reopen it and ask a committer to remove the Stale tag!

Thank you for using Kyuubi!

@github-actions github-actions bot added the Stale label Jan 1, 2026
@github-actions github-actions bot closed this Jan 1, 2026
@yabola
Copy link
Contributor Author

yabola commented Jan 7, 2026

@pan3793 Hi~ could you help review this when you have time? We had tested much other sql and didn't find more problems.

@pan3793 pan3793 reopened this Jan 7, 2026
@pan3793 pan3793 removed the Stale label Jan 7, 2026
@pan3793
Copy link
Member

pan3793 commented Jan 7, 2026

@yabola can you rebase it since master moves forward a lot.

val references =
if (exp.references.nonEmpty) exp.references
else {
if (exp.references.isEmpty || exp.child.isInstanceOf[ScalarSubquery]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it enough to simply check if exp.child is a ScalarSubquery? Do we need to recursively check the children of exp?

For example, could you verify this sql?

select if((select sum(a) from table0 where table1.b = table0.b) > 100, 1, 2) as aa, b from table1

.foldLeft(ListMap[Attribute, AttributeSet]())(mergeColumnsLineage).values
.foldLeft(AttributeSet.empty)(_ ++ _)
.map(attr => attr.withQualifier(attr.qualifier :+ SUBQUERY_COLUMN_IDENTIFIER))
AttributeSet(attrRefs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, non-subquery references of exp may be ignored here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants