-
-
Notifications
You must be signed in to change notification settings - Fork 138
Select expression #112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Select expression #112
Conversation
Before/After:
```scala
case class Foo(a: String, b: Int)
case class Bar(foo: Foo, c: Double)
val ds = TypedDataset.create(spark.createDataset(Seq.empty[Bar]))
// After:
val a = ds.select(ds(_.foo.a))
val b = ds.select(ds(_.foo.b))
val c = ds.select(ds(_.c))
// Before:
val a = ??? // no equivalent
val b = ??? // no equivalent
val c = ds.select(ds('c))
```
I am proposing this change because:
1. It typechecks before any macros are invoked
2. It makes editors like IntelliJ happy (because of #1)
3. It seems more idiomatic to me than using `Symbol`s
4. It allows specifying columns of nested structures, which is
impossible with the current syntax
The downside is that a macro is used. But, a macro is used (indirectly,
through Witness.apply) for the existing syntax anyway. Also, that syntax
uses implicit conversions, which the proposed syntax doesn't. The macro
introduced for the new syntax is reasonably uncomplicated.
I changed `apply` and left `col` intact, so that both syntaxes would be
available. If this change is distasteful (understandable due to the BC
break), it could be renamed to something other than `apply`. However, I
really think this syntax is strictly more powerful than what's currently
used for `apply`, and should be the default behavior.
Macro all the things! A new macro for `selectExpr[B](T => B)`. It's like `map`, only it only works if the provided function can be converted to a projection of some column expressions over the original dataset - so it avoids SerDe. It won't work all the time, but (hopefully) gives you good compiler errors when it doesn't work. Currently it drops down into `DataFrame` to create the projection plan, and unifies it back into `TypedDataset` (via `TypedDataset.create` over `Dataset`). There are a couple of hairy pieces that could be cleaned up (i.e. it's not clear what to do about SQL functions and UDFs, since they don't operate on the actual type). But it's already at a point where it's pretty useful.
Codecov Report
@@ Coverage Diff @@
## master #112 +/- ##
==========================================
+ Coverage 90.71% 92.54% +1.83%
==========================================
Files 25 26 +1
Lines 506 751 +245
Branches 8 18 +10
==========================================
+ Hits 459 695 +236
- Misses 47 56 +9
Continue to review full report at Codecov.
|
|
Codecov is such a stickler! I'll have to go through and annotate the macro branches that are not currently expected to be reached. |
|
|
||
| // spark has scalatest and scalactic as a runtime dependency | ||
| // which can mess things up if you use a different version in your project | ||
| val exclusions = Seq(ExclusionRule("org.scalatest"), ExclusionRule("org.scalactic")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this fixs the IntelliJ issue with ScalaTest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep!
| } | ||
|
|
||
| test("selectExpr with a binary operation between a column and a literal") { | ||
| val ds = TypedDataset.create(Seq(X2(10, 20), X2(20, 30))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks so cooool!
|
Looks awesome. Regarding function support. Yes, we can make UDFs out of them, but then it wouldn't be super efficient because many of them have a native implementation in Spark. However, what we can do is also to support specialization - macro will replace functions native to Spark with their analogs that we define in frameless. |
sealed trait BinaryFunc[A, R] {
def col[T](a: TypedColumn[T, A]): TypedColumn[T, R]
}
object dsl {
implicit def f1[A, B](s: BinaryFunc[A, B]): A => B = macro errorCompileTimeOnly
}
object col_dsl {
implicit def f1[Z, A, B](s: BinaryFunc[A, B]): TypedColumn[Z, A] => TypedColumn[Z, B] = s.eval(_)
} |
* Added Array TypedEncoder derivation * Primitive optimization on Vector TypedEncoder * Added unary function rewrites for `.length` on Array and Vector
| "array", | ||
| ScalaReflection.dataTypeFor[Array[AnyRef]] | ||
| ) | ||
| val arrayData = Option(underlying.sourceDataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoids boxing the primitives in the array
| } | ||
| } | ||
|
|
||
| implicit def arrayEncoder[A]( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New encoder for Array, could be a separate PR but I needed it for this branch
| case class LiteralExpr(tpe: Type, tree: Tree) extends Expr // Literal expression | ||
|
|
||
| // manually added tree (temporary until functions can be implemented; needed for operator substitution) | ||
| case class TrustMeBro(tpe: Type, tree: Tree) extends Expr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can get rid of this once we figure out the best way to support function application
| } | ||
|
|
||
| // A binary operator - the operator must exist on org.apache.spark.sql.Column | ||
| case Apply(sel @ Select(This(lhsE), op: TermName), List(This(rhsE))) => Some { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe typechecking the operation against a TypedColumn tree would be better here? But we would have to implement all the operators from Column in TypedColumn and I think a lot are missing
|
|
||
| val companion = Option(result.companion).filterNot(_ == NoType).getOrElse { | ||
| try { | ||
| c.typecheck(Ident(result.typeSymbol.name.toTermName)).tpe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for the case when the case class is nested inside a class or function - you can't directly get its companion. But I think that cases like that won't work anyway (breaks Spark's codegen), so maybe this won't be needed.
|
|
||
|
|
||
| val datasetCol = c.typecheck( | ||
| q"new $ColumnName($selectorStr).as[$B]($TypedExpressionEncoder.apply[$B]($encoder))" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this to use ColumnName so that it doesn't refer to c.prefix - that means it's friendly to chaining like
ds.groupBy(_.a).agg(sum(ds(_.b)), count(ds(_.b)).selectExpr(x => (x._1, x._2 / x._3))And the ds references in agg won't hurt anything because they are free of ds itself. The downside is you could use a column from a different dataset and it would compile (and even work if the other dataset has a same-named column).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeremyrsmith is it possible to get rid of the "ds()" requirement all together?
ds.groupBy(_.a).aggExpr(x => (sum(x.b)), count(x.b)) ).selectExpr(x => (x._1, x._2 / x._3))It will make the syntax a bit more consistent. Maybe it's a lot of work? what do you think?
`selectExpr` now supports functions. The scheme I've come up with has a mirror of each Spark column function, but with a type signature that describes the actual arguments and result type of the function. These mirrors (in frameless.functions.quoted) are annotated with a variant of QuotedFunc, which specifies which Spark function they should rewrite to. The macro picks up these annotations and allows the function call to be rewritten to the native spark function once the column expressions are refolded. Roughly half of functions are mirrored at this point; need to finish porting aggregate and misc functions.
|
I'm I correct saying that this allow to write .selectExpr(x => x.a * x.b)instead of .select(x(_.a) * x(_.b))(assuming #110.)? If so I'm not convinced it is worth the additional complexity. Also I personally don't like that it looks so similar to |
|
@OlivierBlanvillain you're correct - it allows things like ds.selectExpr {
x => Foo(x.a * x.b, pow(x.c, 2) * x.d)
}
ds.selectExpr(x => (min(x.a), max(x.a)))and so forth. It is very similar to The motivation is:
For example, to talk about complexity, currently this: ds.selectExpr(x => x.a * x.b * x.c * x.e)expands to this: TypedDataset.create[Double](
ds.dataset.toDF().select(
new org.apache.spark.sql.ColumnName("a")
.*(new org.apache.spark.sql.ColumnName("b"))
.*(new org.apache.spark.sql.ColumnName("c"))
.*(new org.apache.spark.sql.ColumnName("d"))
).as[Double](
TypedExpressionEncoder.apply[Double](frameless.TypedEncoder.doubleEncoder)
)
)(frameless.TypedEncoder.doubleEncoder)while this: ds.select(ds.col('a) * ds.col('b) * ds.col('c) * ds.col('d))expands to this: ds.select[Double](ds.col[Double](shapeless.Witness.mkWitness[shapeless.tag.@@[Symbol,String("a")]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]].asInstanceOf[shapeless.tag.@@[Symbol,String("a")]]))(TypedColumn.this.Exists.deriveRecord[frameless.TreeExpansionTest.Foo, this.Out, Symbol with shapeless.tag.Tagged[String("a")], Double](shapeless.LabelledGeneric.materializeProduct[frameless.TreeExpansionTest.Foo, shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]], shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]]], this.Out]((({
final class $anon extends _root_.shapeless.DefaultSymbolicLabelling[frameless.TreeExpansionTest.Foo] {
type Out = shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]];
def apply(): shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]] = shapeless.::.apply[shapeless.tag.@@[Symbol,String("a")], shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]]]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("b")], shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("c")], shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("d")], shapeless.HNil.type](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]], shapeless.HNil))))
};
new $anon()
}): shapeless.DefaultSymbolicLabelling.Aux[frameless.TreeExpansionTest.Foo, shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]]]), {
final class anon$macro$22 extends _root_.shapeless.Generic[frameless.TreeExpansionTest.Foo] {
type Repr = shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.HNil]]]];
def to(p: frameless.TreeExpansionTest.Foo): anon$macro$22.this.Repr = (p match {
case TreeExpansionTest.this.Foo((pat$macro$18 @ _), (pat$macro$19 @ _), (pat$macro$20 @ _), (pat$macro$21 @ _)) => shapeless.::.apply[Double, shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil.type]]]](pat$macro$18, shapeless.::.apply[Double, shapeless.::[Double,shapeless.::[Double,shapeless.HNil.type]]](pat$macro$19, shapeless.::.apply[Double, shapeless.::[Double,shapeless.HNil.type]](pat$macro$20, shapeless.::.apply[Double, shapeless.HNil.type](pat$macro$21, shapeless.HNil))))
}).asInstanceOf[anon$macro$22.this.Repr];
def from(p: anon$macro$22.this.Repr): frameless.TreeExpansionTest.Foo = p match {
case shapeless.::((pat$macro$18 @ _), shapeless.::((pat$macro$19 @ _), shapeless.::((pat$macro$20 @ _), shapeless.::((pat$macro$21 @ _), shapeless.HNil)))) => TreeExpansionTest.this.Foo.apply(pat$macro$18, pat$macro$19, pat$macro$20, pat$macro$21)
}
};
((new anon$macro$22()): shapeless.Generic.Aux[frameless.TreeExpansionTest.Foo, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.HNil]]]]])
}, hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("a")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]], shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("b")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]], shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("c")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil], shapeless.::[Double,shapeless.HNil]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("d")], Double, shapeless.HNil, shapeless.HNil](hlist.this.ZipWithKeys.hnilZipWithKeys, shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("d")]](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("d")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("c")]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("c")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("b")]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("b")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("a")]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("a")]])), scala.Predef.$conforms[this.Out]), new shapeless.ops.record.UnsafeSelector(0).asInstanceOf[shapeless.ops.record.Selector.Aux[shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("a")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("b")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("c")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("d")],Double],shapeless.HNil]]]], Symbol with shapeless.tag.Tagged[String("a")], Double]]), frameless.TypedEncoder.doubleEncoder).*(ds.col[Double](shapeless.Witness.mkWitness[shapeless.tag.@@[Symbol,String("b")]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]].asInstanceOf[shapeless.tag.@@[Symbol,String("b")]]))(TypedColumn.this.Exists.deriveRecord[frameless.TreeExpansionTest.Foo, this.Out, Symbol with shapeless.tag.Tagged[String("b")], Double](shapeless.LabelledGeneric.materializeProduct[frameless.TreeExpansionTest.Foo, shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]], shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]]], this.Out]((({
final class $anon extends _root_.shapeless.DefaultSymbolicLabelling[frameless.TreeExpansionTest.Foo] {
type Out = shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]];
def apply(): shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]] = shapeless.::.apply[shapeless.tag.@@[Symbol,String("a")], shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]]]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("b")], shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("c")], shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("d")], shapeless.HNil.type](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]], shapeless.HNil))))
};
new $anon()
}): shapeless.DefaultSymbolicLabelling.Aux[frameless.TreeExpansionTest.Foo, shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]]]), {
final class anon$macro$27 extends _root_.shapeless.Generic[frameless.TreeExpansionTest.Foo] {
type Repr = shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.HNil]]]];
def to(p: frameless.TreeExpansionTest.Foo): anon$macro$27.this.Repr = (p match {
case TreeExpansionTest.this.Foo((pat$macro$23 @ _), (pat$macro$24 @ _), (pat$macro$25 @ _), (pat$macro$26 @ _)) => shapeless.::.apply[Double, shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil.type]]]](pat$macro$23, shapeless.::.apply[Double, shapeless.::[Double,shapeless.::[Double,shapeless.HNil.type]]](pat$macro$24, shapeless.::.apply[Double, shapeless.::[Double,shapeless.HNil.type]](pat$macro$25, shapeless.::.apply[Double, shapeless.HNil.type](pat$macro$26, shapeless.HNil))))
}).asInstanceOf[anon$macro$27.this.Repr];
def from(p: anon$macro$27.this.Repr): frameless.TreeExpansionTest.Foo = p match {
case shapeless.::((pat$macro$23 @ _), shapeless.::((pat$macro$24 @ _), shapeless.::((pat$macro$25 @ _), shapeless.::((pat$macro$26 @ _), shapeless.HNil)))) => TreeExpansionTest.this.Foo.apply(pat$macro$23, pat$macro$24, pat$macro$25, pat$macro$26)
}
};
((new anon$macro$27()): shapeless.Generic.Aux[frameless.TreeExpansionTest.Foo, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.HNil]]]]])
}, hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("a")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]], shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("b")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]], shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("c")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil], shapeless.::[Double,shapeless.HNil]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("d")], Double, shapeless.HNil, shapeless.HNil](hlist.this.ZipWithKeys.hnilZipWithKeys, shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("d")]](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("d")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("c")]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("c")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("b")]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("b")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("a")]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("a")]])), scala.Predef.$conforms[this.Out]), new shapeless.ops.record.UnsafeSelector(1).asInstanceOf[shapeless.ops.record.Selector.Aux[shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("a")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("b")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("c")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("d")],Double],shapeless.HNil]]]], Symbol with shapeless.tag.Tagged[String("b")], Double]]), frameless.TypedEncoder.doubleEncoder))(frameless.CatalystNumeric.doubleNumeric).*(ds.col[Double](shapeless.Witness.mkWitness[shapeless.tag.@@[Symbol,String("c")]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]].asInstanceOf[shapeless.tag.@@[Symbol,String("c")]]))(TypedColumn.this.Exists.deriveRecord[frameless.TreeExpansionTest.Foo, this.Out, Symbol with shapeless.tag.Tagged[String("c")], Double](shapeless.LabelledGeneric.materializeProduct[frameless.TreeExpansionTest.Foo, shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]], shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]]], this.Out]((({
final class $anon extends _root_.shapeless.DefaultSymbolicLabelling[frameless.TreeExpansionTest.Foo] {
type Out = shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]];
def apply(): shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]] = shapeless.::.apply[shapeless.tag.@@[Symbol,String("a")], shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]]]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("b")], shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("c")], shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("d")], shapeless.HNil.type](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]], shapeless.HNil))))
};
new $anon()
}): shapeless.DefaultSymbolicLabelling.Aux[frameless.TreeExpansionTest.Foo, shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]]]), {
final class anon$macro$37 extends _root_.shapeless.Generic[frameless.TreeExpansionTest.Foo] {
type Repr = shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.HNil]]]];
def to(p: frameless.TreeExpansionTest.Foo): anon$macro$37.this.Repr = (p match {
case TreeExpansionTest.this.Foo((pat$macro$33 @ _), (pat$macro$34 @ _), (pat$macro$35 @ _), (pat$macro$36 @ _)) => shapeless.::.apply[Double, shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil.type]]]](pat$macro$33, shapeless.::.apply[Double, shapeless.::[Double,shapeless.::[Double,shapeless.HNil.type]]](pat$macro$34, shapeless.::.apply[Double, shapeless.::[Double,shapeless.HNil.type]](pat$macro$35, shapeless.::.apply[Double, shapeless.HNil.type](pat$macro$36, shapeless.HNil))))
}).asInstanceOf[anon$macro$37.this.Repr];
def from(p: anon$macro$37.this.Repr): frameless.TreeExpansionTest.Foo = p match {
case shapeless.::((pat$macro$33 @ _), shapeless.::((pat$macro$34 @ _), shapeless.::((pat$macro$35 @ _), shapeless.::((pat$macro$36 @ _), shapeless.HNil)))) => TreeExpansionTest.this.Foo.apply(pat$macro$33, pat$macro$34, pat$macro$35, pat$macro$36)
}
};
((new anon$macro$37()): shapeless.Generic.Aux[frameless.TreeExpansionTest.Foo, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.HNil]]]]])
}, hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("a")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]], shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("b")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]], shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("c")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil], shapeless.::[Double,shapeless.HNil]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("d")], Double, shapeless.HNil, shapeless.HNil](hlist.this.ZipWithKeys.hnilZipWithKeys, shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("d")]](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("d")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("c")]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("c")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("b")]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("b")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("a")]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("a")]])), scala.Predef.$conforms[this.Out]), new shapeless.ops.record.UnsafeSelector(2).asInstanceOf[shapeless.ops.record.Selector.Aux[shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("a")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("b")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("c")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("d")],Double],shapeless.HNil]]]], Symbol with shapeless.tag.Tagged[String("c")], Double]]), frameless.TypedEncoder.doubleEncoder))(frameless.CatalystNumeric.doubleNumeric).*(ds.col[Double](shapeless.Witness.mkWitness[shapeless.tag.@@[Symbol,String("d")]](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]].asInstanceOf[shapeless.tag.@@[Symbol,String("d")]]))(TypedColumn.this.Exists.deriveRecord[frameless.TreeExpansionTest.Foo, this.Out, Symbol with shapeless.tag.Tagged[String("d")], Double](shapeless.LabelledGeneric.materializeProduct[frameless.TreeExpansionTest.Foo, shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]], shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]]], this.Out]((({
final class $anon extends _root_.shapeless.DefaultSymbolicLabelling[frameless.TreeExpansionTest.Foo] {
type Out = shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]];
def apply(): shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]] = shapeless.::.apply[shapeless.tag.@@[Symbol,String("a")], shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]]]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("b")], shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("c")], shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("d")], shapeless.HNil.type](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]], shapeless.HNil))))
};
new $anon()
}): shapeless.DefaultSymbolicLabelling.Aux[frameless.TreeExpansionTest.Foo, shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]]]), {
final class anon$macro$47 extends _root_.shapeless.Generic[frameless.TreeExpansionTest.Foo] {
type Repr = shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.HNil]]]];
def to(p: frameless.TreeExpansionTest.Foo): anon$macro$47.this.Repr = (p match {
case TreeExpansionTest.this.Foo((pat$macro$43 @ _), (pat$macro$44 @ _), (pat$macro$45 @ _), (pat$macro$46 @ _)) => shapeless.::.apply[Double, shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil.type]]]](pat$macro$43, shapeless.::.apply[Double, shapeless.::[Double,shapeless.::[Double,shapeless.HNil.type]]](pat$macro$44, shapeless.::.apply[Double, shapeless.::[Double,shapeless.HNil.type]](pat$macro$45, shapeless.::.apply[Double, shapeless.HNil.type](pat$macro$46, shapeless.HNil))))
}).asInstanceOf[anon$macro$47.this.Repr];
def from(p: anon$macro$47.this.Repr): frameless.TreeExpansionTest.Foo = p match {
case shapeless.::((pat$macro$43 @ _), shapeless.::((pat$macro$44 @ _), shapeless.::((pat$macro$45 @ _), shapeless.::((pat$macro$46 @ _), shapeless.HNil)))) => TreeExpansionTest.this.Foo.apply(pat$macro$43, pat$macro$44, pat$macro$45, pat$macro$46)
}
};
((new anon$macro$47()): shapeless.Generic.Aux[frameless.TreeExpansionTest.Foo, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.HNil]]]]])
}, hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("a")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]], shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("b")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]], shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("c")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil], shapeless.::[Double,shapeless.HNil]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("d")], Double, shapeless.HNil, shapeless.HNil](hlist.this.ZipWithKeys.hnilZipWithKeys, shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("d")]](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("d")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("c")]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("c")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("b")]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("b")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("a")]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("a")]])), scala.Predef.$conforms[this.Out]), new shapeless.ops.record.UnsafeSelector(3).asInstanceOf[shapeless.ops.record.Selector.Aux[shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("a")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("b")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("c")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("d")],Double],shapeless.HNil]]]], Symbol with shapeless.tag.Tagged[String("d")], Double]]), frameless.TypedEncoder.doubleEncoder))(frameless.CatalystNumeric.doubleNumeric))(frameless.TypedEncoder.doubleEncoder)Which would you rather have in your class files? :) edit: the printout above scrolls horizontally for about 20 miles as well, so make sure you check out the whole thing... |
|
Sure, the byte code is ugly, but this huge thing will get executed only once to generate the same query plan, and results in the same execution by Spark. So performance is not a sufficient reason to hate on this byte code. Now with this change, the user's code is 👍, but frameless code is 👎, so IMO the real question is whether we are willing to make this trade. Going back to your points, do you actually have evidences for 3 and 4? It seams like after #110 the IDE support will be just fine, even with all our implicits (presentation compiler should be able to resolve those). Same for compilation speed, are the super simple implicits involved in |
|
To be fair, most of those implicits are for I still think this syntax is an improvement. In my next commit I'll have ported every single function and aggregate function from I get that you don't like the macro code, and I agree that for many purposes it's better to prefer typeclasses to macros. But I think a clean, idiomatic syntax option will be better for users. 🤷♂️ |
|
@OlivierBlanvillain let me add this to the discussion: The expression that @jeremyrsmith suggests (which we discussed together in another PR) solves the problem where you don't have a reference to the typeddataset you want to refer to. Let me give an example: case class Foo(i: Int, b: Boolean)
case class Bar(b: Boolean)
val ds: TypedDataset[Foo] = ???
ds.project[Bar].select( ???ds(???) ) // we can no longer use ds here, since ds refers a Foo dataset not the Bar dataset.
// So we would have to store the projection in a separate val
val x = ds.project[Bar]
x.select(x('b))
// do-able but verbose
// Now with the new syntax
ds.project[Bar].selectExpr(x => x.b) // awesomeness :)
// Even shorter
ds.project[Bar].selectExpr(_.b) |
|
Can't we achieve the same with #110 and implicit class C[T](ds: Dataset[T]) {
def selectExpr[Out](f: Dataset[T] => TypeColumn[T, Out]): Dataset[Out] =
ds.select(f(ds))
}? |
|
Er, why was this deleted? I would much rather have a user friendly map like macro then.. not have that |
|
@Voltir I think the objection is around depending both on shapeless as well as complex macro code. The macros here involve hundreds of lines of code, which present a maintenance burden. Also, it diverges the API into methods that use shapeless-derived evidence and methods that short-circuit similar things directly using macros, and the logic might not line up in some cases. That could be confusing to users. |
|
Hm, I think what I am really looking forward to is #110 - poor intellij is struggling right now, just with composing a join or two.. That being said - I love macro magic, especially when it leads to clean user interfaces and apis. Would there be value in having a framless-macros library that could opt in to "potentially confusing" behavior? |
Builds on and supersedes #110.
Macro all the things!
A new macro for
selectExpr[B](T => B). It's likemap, but it only works if the provided function can be converted to a projection of some column expressions over the original dataset - so it avoids SerDe.It won't work all the time, but (hopefully) gives you good compiler errors when it doesn't work.
Currently it drops down into
DataFrameto create the projection plan, and unifies it back intoTypedDataset(viaTypedDataset.createoverDataset).There are a couple of hairy pieces that could be cleaned up (i.e. it's not clear what to do about SQL functions and UDFs, since they don't operate on the actual type). But it's already at a point where it's pretty useful (see new tests for examples of usage).