Skip to content

Conversation

@jeremyrsmith
Copy link
Contributor

Builds on and supersedes #110.

Macro all the things!

A new macro for selectExpr[B](T => B). It's like map, but it only works if the provided function can be converted to a projection of some column expressions over the original dataset - so it avoids SerDe.

It won't work all the time, but (hopefully) gives you good compiler errors when it doesn't work.

Currently it drops down into DataFrame to create the projection plan, and unifies it back into TypedDataset (via TypedDataset.create over Dataset).

There are a couple of hairy pieces that could be cleaned up (i.e. it's not clear what to do about SQL functions and UDFs, since they don't operate on the actual type). But it's already at a point where it's pretty useful (see new tests for examples of usage).

Before/After:

```scala
case class Foo(a: String, b: Int)
case class Bar(foo: Foo, c: Double)

val ds = TypedDataset.create(spark.createDataset(Seq.empty[Bar]))

// After:
val a = ds.select(ds(_.foo.a))
val b = ds.select(ds(_.foo.b))
val c = ds.select(ds(_.c))

// Before:
val a = ??? // no equivalent
val b = ??? // no equivalent
val c = ds.select(ds('c))
```

I am proposing this change because:

1. It typechecks before any macros are invoked
2. It makes editors like IntelliJ happy (because of #1)
3. It seems more idiomatic to me than using `Symbol`s
4. It allows specifying columns of nested structures, which is
   impossible with the current syntax

The downside is that a macro is used. But, a macro is used (indirectly,
through Witness.apply) for the existing syntax anyway. Also, that syntax
uses implicit conversions, which the proposed syntax doesn't. The macro
introduced for the new syntax is reasonably uncomplicated.

I changed `apply` and left `col` intact, so that both syntaxes would be
available. If this change is distasteful (understandable due to the BC
break), it could be renamed to something other than `apply`. However, I
really think this syntax is strictly more powerful than what's currently
used for `apply`, and should be the default behavior.
* Added failure test case for unusable function
* Added scoverage tags for known-unreachable branches
Macro all the things!

A new macro for `selectExpr[B](T => B)`. It's like `map`, only it only
works if the provided function can be converted to a projection of some
column expressions over the original dataset - so it avoids SerDe.

It won't work all the time, but (hopefully) gives you good compiler
errors when it doesn't work.

Currently it drops down into `DataFrame` to create the projection plan,
and unifies it back into `TypedDataset` (via `TypedDataset.create` over
`Dataset`).

There are a couple of hairy pieces that could be cleaned up (i.e. it's
not clear what to do about SQL functions and UDFs, since they don't
operate on the actual type). But it's already at a point where it's
pretty useful.
@codecov-io
Copy link

codecov-io commented Feb 20, 2017

Codecov Report

Merging #112 into master will increase coverage by 1.83%.
The diff coverage is 96.56%.

@@            Coverage Diff             @@
##           master     #112      +/-   ##
==========================================
+ Coverage   90.71%   92.54%   +1.83%     
==========================================
  Files          25       26       +1     
  Lines         506      751     +245     
  Branches        8       18      +10     
==========================================
+ Hits          459      695     +236     
- Misses         47       56       +9
Impacted Files Coverage Δ
...ataset/src/main/scala/frameless/TypedDataset.scala 94.93% <ø> (-0.07%)
dataset/src/main/scala/frameless/TypedColumn.scala 92.85% <ø> (ø)
...ataset/src/main/scala/frameless/TypedEncoder.scala 97.77% <100%> (+1.22%)
...src/main/scala/frameless/macros/ColumnMacros.scala 94.54% <94.54%> (ø)
...set/src/main/scala/frameless/FramelessSyntax.scala 66.66% <ø> (+16.66%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 312f7ee...aa1e55b. Read the comment docs.

@jeremyrsmith
Copy link
Contributor Author

Codecov is such a stickler! I'll have to go through and annotate the macro branches that are not currently expected to be reached.


// spark has scalatest and scalactic as a runtime dependency
// which can mess things up if you use a different version in your project
val exclusions = Seq(ExclusionRule("org.scalatest"), ExclusionRule("org.scalactic"))
Copy link
Contributor

@imarios imarios Feb 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this fixs the IntelliJ issue with ScalaTest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

}

test("selectExpr with a binary operation between a column and a literal") {
val ds = TypedDataset.create(Seq(X2(10, 20), X2(20, 30)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks so cooool!

@kanterov
Copy link
Contributor

Looks awesome.

Regarding function support. Yes, we can make UDFs out of them, but then it wouldn't be super efficient because many of them have a native implementation in Spark. However, what we can do is also to support specialization - macro will replace functions native to Spark with their analogs that we define in frameless.

@kanterov
Copy link
Contributor

kanterov commented Feb 20, 2017

sealed trait BinaryFunc[A, R] {
  def col[T](a: TypedColumn[T, A]): TypedColumn[T, R]
}

object dsl {
  implicit def f1[A, B](s: BinaryFunc[A, B]): A => B = macro errorCompileTimeOnly
}

object col_dsl {
  implicit def f1[Z, A, B](s: BinaryFunc[A, B]): TypedColumn[Z, A] => TypedColumn[Z, B] = s.eval(_)
}

* Added Array TypedEncoder derivation
* Primitive optimization on Vector TypedEncoder
* Added unary function rewrites for `.length` on Array and Vector
"array",
ScalaReflection.dataTypeFor[Array[AnyRef]]
)
val arrayData = Option(underlying.sourceDataType)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoids boxing the primitives in the array

}
}

implicit def arrayEncoder[A](
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New encoder for Array, could be a separate PR but I needed it for this branch

case class LiteralExpr(tpe: Type, tree: Tree) extends Expr // Literal expression

// manually added tree (temporary until functions can be implemented; needed for operator substitution)
case class TrustMeBro(tpe: Type, tree: Tree) extends Expr
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can get rid of this once we figure out the best way to support function application

}

// A binary operator - the operator must exist on org.apache.spark.sql.Column
case Apply(sel @ Select(This(lhsE), op: TermName), List(This(rhsE))) => Some {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe typechecking the operation against a TypedColumn tree would be better here? But we would have to implement all the operators from Column in TypedColumn and I think a lot are missing


val companion = Option(result.companion).filterNot(_ == NoType).getOrElse {
try {
c.typecheck(Ident(result.typeSymbol.name.toTermName)).tpe
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the case when the case class is nested inside a class or function - you can't directly get its companion. But I think that cases like that won't work anyway (breaks Spark's codegen), so maybe this won't be needed.



val datasetCol = c.typecheck(
q"new $ColumnName($selectorStr).as[$B]($TypedExpressionEncoder.apply[$B]($encoder))"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to use ColumnName so that it doesn't refer to c.prefix - that means it's friendly to chaining like

ds.groupBy(_.a).agg(sum(ds(_.b)), count(ds(_.b)).selectExpr(x => (x._1, x._2 / x._3))

And the ds references in agg won't hurt anything because they are free of ds itself. The downside is you could use a column from a different dataset and it would compile (and even work if the other dataset has a same-named column).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeremyrsmith is it possible to get rid of the "ds()" requirement all together?

ds.groupBy(_.a).aggExpr(x => (sum(x.b)), count(x.b)) ).selectExpr(x => (x._1, x._2 / x._3))

It will make the syntax a bit more consistent. Maybe it's a lot of work? what do you think?

`selectExpr` now supports functions. The scheme I've come up with has a
mirror of each Spark column function, but with a type signature that
describes the actual arguments and result type of the function. These
mirrors (in frameless.functions.quoted) are annotated with a variant of
QuotedFunc, which specifies which Spark function they should rewrite to.
The macro picks up these annotations and allows the function call to be
rewritten to the native spark function once the column expressions are
refolded.

Roughly half of functions are mirrored at this point; need to finish
porting aggregate and misc functions.
@OlivierBlanvillain
Copy link
Contributor

I'm I correct saying that this allow to write

.selectExpr(x => x.a * x.b)

instead of

.select(x(_.a) * x(_.b))

(assuming #110.)?

If so I'm not convinced it is worth the additional complexity. Also I personally don't like that it looks so similar to .map(x => x.a * x.b) while going thought a completely different machinery...

@jeremyrsmith
Copy link
Contributor Author

jeremyrsmith commented Feb 22, 2017

@OlivierBlanvillain you're correct - it allows things like

ds.selectExpr {
  x => Foo(x.a * x.b, pow(x.c, 2) * x.d)
}

ds.selectExpr(x => (min(x.a), max(x.a)))

and so forth. It is very similar to map, except that it does not require a serialization and deserialization. It expands to Dataset column instructions instead.

The motivation is:

  1. Clean, idiomatic syntax
  2. Piggybacking on native typechecking (i.e. if you write a bad expression, your editor will mark it red right away and you'll get the familiar standard error about it - before any macros are even invoked)
  3. Because of (2), editor support will be vastly improved (there is no complicated implicit expansion to do in order to get proper syntax checking; vanilla typechecking does the job)
  4. Smaller bytecode generated, with fewer implicits (and in many cases dramatically faster compile times)

For example, to talk about complexity, currently this:

ds.selectExpr(x => x.a * x.b * x.c * x.e)

expands to this:

TypedDataset.create[Double](
  ds.dataset.toDF().select(
    new org.apache.spark.sql.ColumnName("a")
      .*(new org.apache.spark.sql.ColumnName("b"))
      .*(new org.apache.spark.sql.ColumnName("c"))
      .*(new org.apache.spark.sql.ColumnName("d"))
    ).as[Double](
      TypedExpressionEncoder.apply[Double](frameless.TypedEncoder.doubleEncoder)
    )
)(frameless.TypedEncoder.doubleEncoder)

while this:

ds.select(ds.col('a) * ds.col('b) * ds.col('c) * ds.col('d))

expands to this:

ds.select[Double](ds.col[Double](shapeless.Witness.mkWitness[shapeless.tag.@@[Symbol,String("a")]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]].asInstanceOf[shapeless.tag.@@[Symbol,String("a")]]))(TypedColumn.this.Exists.deriveRecord[frameless.TreeExpansionTest.Foo, this.Out, Symbol with shapeless.tag.Tagged[String("a")], Double](shapeless.LabelledGeneric.materializeProduct[frameless.TreeExpansionTest.Foo, shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]], shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]]], this.Out]((({
  final class $anon extends _root_.shapeless.DefaultSymbolicLabelling[frameless.TreeExpansionTest.Foo] {
    type Out = shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]];
    def apply(): shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]] = shapeless.::.apply[shapeless.tag.@@[Symbol,String("a")], shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]]]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("b")], shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("c")], shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("d")], shapeless.HNil.type](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]], shapeless.HNil))))
  };
  new $anon()
}): shapeless.DefaultSymbolicLabelling.Aux[frameless.TreeExpansionTest.Foo, shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]]]), {
  final class anon$macro$22 extends _root_.shapeless.Generic[frameless.TreeExpansionTest.Foo] {
    type Repr = shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.HNil]]]];
    def to(p: frameless.TreeExpansionTest.Foo): anon$macro$22.this.Repr = (p match {
  case TreeExpansionTest.this.Foo((pat$macro$18 @ _), (pat$macro$19 @ _), (pat$macro$20 @ _), (pat$macro$21 @ _)) => shapeless.::.apply[Double, shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil.type]]]](pat$macro$18, shapeless.::.apply[Double, shapeless.::[Double,shapeless.::[Double,shapeless.HNil.type]]](pat$macro$19, shapeless.::.apply[Double, shapeless.::[Double,shapeless.HNil.type]](pat$macro$20, shapeless.::.apply[Double, shapeless.HNil.type](pat$macro$21, shapeless.HNil))))
}).asInstanceOf[anon$macro$22.this.Repr];
    def from(p: anon$macro$22.this.Repr): frameless.TreeExpansionTest.Foo = p match {
      case shapeless.::((pat$macro$18 @ _), shapeless.::((pat$macro$19 @ _), shapeless.::((pat$macro$20 @ _), shapeless.::((pat$macro$21 @ _), shapeless.HNil)))) => TreeExpansionTest.this.Foo.apply(pat$macro$18, pat$macro$19, pat$macro$20, pat$macro$21)
    }
  };
  ((new anon$macro$22()): shapeless.Generic.Aux[frameless.TreeExpansionTest.Foo, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.HNil]]]]])
}, hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("a")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]], shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("b")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]], shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("c")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil], shapeless.::[Double,shapeless.HNil]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("d")], Double, shapeless.HNil, shapeless.HNil](hlist.this.ZipWithKeys.hnilZipWithKeys, shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("d")]](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("d")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("c")]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("c")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("b")]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("b")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("a")]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("a")]])), scala.Predef.$conforms[this.Out]), new shapeless.ops.record.UnsafeSelector(0).asInstanceOf[shapeless.ops.record.Selector.Aux[shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("a")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("b")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("c")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("d")],Double],shapeless.HNil]]]], Symbol with shapeless.tag.Tagged[String("a")], Double]]), frameless.TypedEncoder.doubleEncoder).*(ds.col[Double](shapeless.Witness.mkWitness[shapeless.tag.@@[Symbol,String("b")]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]].asInstanceOf[shapeless.tag.@@[Symbol,String("b")]]))(TypedColumn.this.Exists.deriveRecord[frameless.TreeExpansionTest.Foo, this.Out, Symbol with shapeless.tag.Tagged[String("b")], Double](shapeless.LabelledGeneric.materializeProduct[frameless.TreeExpansionTest.Foo, shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]], shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]]], this.Out]((({
  final class $anon extends _root_.shapeless.DefaultSymbolicLabelling[frameless.TreeExpansionTest.Foo] {
    type Out = shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]];
    def apply(): shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]] = shapeless.::.apply[shapeless.tag.@@[Symbol,String("a")], shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]]]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("b")], shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("c")], shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("d")], shapeless.HNil.type](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]], shapeless.HNil))))
  };
  new $anon()
}): shapeless.DefaultSymbolicLabelling.Aux[frameless.TreeExpansionTest.Foo, shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]]]), {
  final class anon$macro$27 extends _root_.shapeless.Generic[frameless.TreeExpansionTest.Foo] {
    type Repr = shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.HNil]]]];
    def to(p: frameless.TreeExpansionTest.Foo): anon$macro$27.this.Repr = (p match {
  case TreeExpansionTest.this.Foo((pat$macro$23 @ _), (pat$macro$24 @ _), (pat$macro$25 @ _), (pat$macro$26 @ _)) => shapeless.::.apply[Double, shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil.type]]]](pat$macro$23, shapeless.::.apply[Double, shapeless.::[Double,shapeless.::[Double,shapeless.HNil.type]]](pat$macro$24, shapeless.::.apply[Double, shapeless.::[Double,shapeless.HNil.type]](pat$macro$25, shapeless.::.apply[Double, shapeless.HNil.type](pat$macro$26, shapeless.HNil))))
}).asInstanceOf[anon$macro$27.this.Repr];
    def from(p: anon$macro$27.this.Repr): frameless.TreeExpansionTest.Foo = p match {
      case shapeless.::((pat$macro$23 @ _), shapeless.::((pat$macro$24 @ _), shapeless.::((pat$macro$25 @ _), shapeless.::((pat$macro$26 @ _), shapeless.HNil)))) => TreeExpansionTest.this.Foo.apply(pat$macro$23, pat$macro$24, pat$macro$25, pat$macro$26)
    }
  };
  ((new anon$macro$27()): shapeless.Generic.Aux[frameless.TreeExpansionTest.Foo, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.HNil]]]]])
}, hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("a")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]], shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("b")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]], shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("c")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil], shapeless.::[Double,shapeless.HNil]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("d")], Double, shapeless.HNil, shapeless.HNil](hlist.this.ZipWithKeys.hnilZipWithKeys, shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("d")]](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("d")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("c")]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("c")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("b")]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("b")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("a")]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("a")]])), scala.Predef.$conforms[this.Out]), new shapeless.ops.record.UnsafeSelector(1).asInstanceOf[shapeless.ops.record.Selector.Aux[shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("a")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("b")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("c")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("d")],Double],shapeless.HNil]]]], Symbol with shapeless.tag.Tagged[String("b")], Double]]), frameless.TypedEncoder.doubleEncoder))(frameless.CatalystNumeric.doubleNumeric).*(ds.col[Double](shapeless.Witness.mkWitness[shapeless.tag.@@[Symbol,String("c")]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]].asInstanceOf[shapeless.tag.@@[Symbol,String("c")]]))(TypedColumn.this.Exists.deriveRecord[frameless.TreeExpansionTest.Foo, this.Out, Symbol with shapeless.tag.Tagged[String("c")], Double](shapeless.LabelledGeneric.materializeProduct[frameless.TreeExpansionTest.Foo, shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]], shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]]], this.Out]((({
  final class $anon extends _root_.shapeless.DefaultSymbolicLabelling[frameless.TreeExpansionTest.Foo] {
    type Out = shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]];
    def apply(): shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]] = shapeless.::.apply[shapeless.tag.@@[Symbol,String("a")], shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]]]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("b")], shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("c")], shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("d")], shapeless.HNil.type](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]], shapeless.HNil))))
  };
  new $anon()
}): shapeless.DefaultSymbolicLabelling.Aux[frameless.TreeExpansionTest.Foo, shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]]]), {
  final class anon$macro$37 extends _root_.shapeless.Generic[frameless.TreeExpansionTest.Foo] {
    type Repr = shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.HNil]]]];
    def to(p: frameless.TreeExpansionTest.Foo): anon$macro$37.this.Repr = (p match {
  case TreeExpansionTest.this.Foo((pat$macro$33 @ _), (pat$macro$34 @ _), (pat$macro$35 @ _), (pat$macro$36 @ _)) => shapeless.::.apply[Double, shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil.type]]]](pat$macro$33, shapeless.::.apply[Double, shapeless.::[Double,shapeless.::[Double,shapeless.HNil.type]]](pat$macro$34, shapeless.::.apply[Double, shapeless.::[Double,shapeless.HNil.type]](pat$macro$35, shapeless.::.apply[Double, shapeless.HNil.type](pat$macro$36, shapeless.HNil))))
}).asInstanceOf[anon$macro$37.this.Repr];
    def from(p: anon$macro$37.this.Repr): frameless.TreeExpansionTest.Foo = p match {
      case shapeless.::((pat$macro$33 @ _), shapeless.::((pat$macro$34 @ _), shapeless.::((pat$macro$35 @ _), shapeless.::((pat$macro$36 @ _), shapeless.HNil)))) => TreeExpansionTest.this.Foo.apply(pat$macro$33, pat$macro$34, pat$macro$35, pat$macro$36)
    }
  };
  ((new anon$macro$37()): shapeless.Generic.Aux[frameless.TreeExpansionTest.Foo, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.HNil]]]]])
}, hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("a")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]], shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("b")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]], shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("c")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil], shapeless.::[Double,shapeless.HNil]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("d")], Double, shapeless.HNil, shapeless.HNil](hlist.this.ZipWithKeys.hnilZipWithKeys, shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("d")]](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("d")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("c")]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("c")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("b")]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("b")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("a")]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("a")]])), scala.Predef.$conforms[this.Out]), new shapeless.ops.record.UnsafeSelector(2).asInstanceOf[shapeless.ops.record.Selector.Aux[shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("a")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("b")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("c")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("d")],Double],shapeless.HNil]]]], Symbol with shapeless.tag.Tagged[String("c")], Double]]), frameless.TypedEncoder.doubleEncoder))(frameless.CatalystNumeric.doubleNumeric).*(ds.col[Double](shapeless.Witness.mkWitness[shapeless.tag.@@[Symbol,String("d")]](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]].asInstanceOf[shapeless.tag.@@[Symbol,String("d")]]))(TypedColumn.this.Exists.deriveRecord[frameless.TreeExpansionTest.Foo, this.Out, Symbol with shapeless.tag.Tagged[String("d")], Double](shapeless.LabelledGeneric.materializeProduct[frameless.TreeExpansionTest.Foo, shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]], shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]]], this.Out]((({
  final class $anon extends _root_.shapeless.DefaultSymbolicLabelling[frameless.TreeExpansionTest.Foo] {
    type Out = shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]];
    def apply(): shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]] = shapeless.::.apply[shapeless.tag.@@[Symbol,String("a")], shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]]]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("b")], shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("c")], shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil.type]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]], shapeless.::.apply[shapeless.tag.@@[Symbol,String("d")], shapeless.HNil.type](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]], shapeless.HNil))))
  };
  new $anon()
}): shapeless.DefaultSymbolicLabelling.Aux[frameless.TreeExpansionTest.Foo, shapeless.::[shapeless.tag.@@[Symbol,String("a")],shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]]]]), {
  final class anon$macro$47 extends _root_.shapeless.Generic[frameless.TreeExpansionTest.Foo] {
    type Repr = shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.HNil]]]];
    def to(p: frameless.TreeExpansionTest.Foo): anon$macro$47.this.Repr = (p match {
  case TreeExpansionTest.this.Foo((pat$macro$43 @ _), (pat$macro$44 @ _), (pat$macro$45 @ _), (pat$macro$46 @ _)) => shapeless.::.apply[Double, shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil.type]]]](pat$macro$43, shapeless.::.apply[Double, shapeless.::[Double,shapeless.::[Double,shapeless.HNil.type]]](pat$macro$44, shapeless.::.apply[Double, shapeless.::[Double,shapeless.HNil.type]](pat$macro$45, shapeless.::.apply[Double, shapeless.HNil.type](pat$macro$46, shapeless.HNil))))
}).asInstanceOf[anon$macro$47.this.Repr];
    def from(p: anon$macro$47.this.Repr): frameless.TreeExpansionTest.Foo = p match {
      case shapeless.::((pat$macro$43 @ _), shapeless.::((pat$macro$44 @ _), shapeless.::((pat$macro$45 @ _), shapeless.::((pat$macro$46 @ _), shapeless.HNil)))) => TreeExpansionTest.this.Foo.apply(pat$macro$43, pat$macro$44, pat$macro$45, pat$macro$46)
    }
  };
  ((new anon$macro$47()): shapeless.Generic.Aux[frameless.TreeExpansionTest.Foo, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.::[Double, shapeless.HNil]]]]])
}, hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("a")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("b")],shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]]], shapeless.::[Double,shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("b")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("c")],shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil]], shapeless.::[Double,shapeless.::[Double,shapeless.HNil]]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("c")], Double, shapeless.::[shapeless.tag.@@[Symbol,String("d")],shapeless.HNil], shapeless.::[Double,shapeless.HNil]](hlist.this.ZipWithKeys.hconsZipWithKeys[shapeless.tag.@@[Symbol,String("d")], Double, shapeless.HNil, shapeless.HNil](hlist.this.ZipWithKeys.hnilZipWithKeys, shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("d")]](scala.Symbol.apply("d").asInstanceOf[shapeless.tag.@@[Symbol,String("d")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("d")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("c")]](scala.Symbol.apply("c").asInstanceOf[shapeless.tag.@@[Symbol,String("c")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("c")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("b")]](scala.Symbol.apply("b").asInstanceOf[shapeless.tag.@@[Symbol,String("b")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("b")]])), shapeless.Witness.mkWitness[Symbol with shapeless.tag.Tagged[String("a")]](scala.Symbol.apply("a").asInstanceOf[shapeless.tag.@@[Symbol,String("a")]].asInstanceOf[Symbol with shapeless.tag.Tagged[String("a")]])), scala.Predef.$conforms[this.Out]), new shapeless.ops.record.UnsafeSelector(3).asInstanceOf[shapeless.ops.record.Selector.Aux[shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("a")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("b")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("c")],Double],shapeless.::[Double with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("d")],Double],shapeless.HNil]]]], Symbol with shapeless.tag.Tagged[String("d")], Double]]), frameless.TypedEncoder.doubleEncoder))(frameless.CatalystNumeric.doubleNumeric))(frameless.TypedEncoder.doubleEncoder)

Which would you rather have in your class files? :)

edit: the printout above scrolls horizontally for about 20 miles as well, so make sure you check out the whole thing...

@OlivierBlanvillain
Copy link
Contributor

OlivierBlanvillain commented Feb 22, 2017

Sure, the byte code is ugly, but this huge thing will get executed only once to generate the same query plan, and results in the same execution by Spark. So performance is not a sufficient reason to hate on this byte code.

Now with this change, the user's code is 👍, but frameless code is 👎, so IMO the real question is whether we are willing to make this trade.

Going back to your points, do you actually have evidences for 3 and 4? It seams like after #110 the IDE support will be just fine, even with all our implicits (presentation compiler should be able to resolve those). Same for compilation speed, are the super simple implicits involved in *, min and max really faster than recursive macros?

@jeremyrsmith
Copy link
Contributor Author

To be fair, most of those implicits are for Exists and will be solved by #110. You could also make CatalystNumeric and CatalystCast be @elidable and they would also go away (since it's just a typechecking mechanism and isn't needed at runtime).

I still think this syntax is an improvement. In my next commit I'll have ported every single function and aggregate function from org.apache.spark.sql.functions with a type signature that actually represents the function. This would make frameless way more useful right now.

I get that you don't like the macro code, and I agree that for many purposes it's better to prefer typeclasses to macros. But I think a clean, idiomatic syntax option will be better for users.

🤷‍♂️

@imarios
Copy link
Contributor

imarios commented Feb 23, 2017

@OlivierBlanvillain let me add this to the discussion: The expression that @jeremyrsmith suggests (which we discussed together in another PR) solves the problem where you don't have a reference to the typeddataset you want to refer to.

Let me give an example:

case class Foo(i: Int, b: Boolean)
case class Bar(b: Boolean)
val ds: TypedDataset[Foo] = ???
ds.project[Bar].select( ???ds(???) ) // we can no longer use ds here, since ds refers a Foo dataset not the Bar dataset. 

// So we would have to store the projection in a separate val
val x = ds.project[Bar]
x.select(x('b)) 
// do-able but verbose

// Now with the new syntax
ds.project[Bar].selectExpr(x => x.b) // awesomeness :) 

// Even shorter
ds.project[Bar].selectExpr(_.b)

@OlivierBlanvillain
Copy link
Contributor

OlivierBlanvillain commented Feb 23, 2017

Can't we achieve the same with #110 and

implicit class C[T](ds: Dataset[T]) {
  def selectExpr[Out](f: Dataset[T] => TypeColumn[T, Out]): Dataset[Out] =
    ds.select(f(ds))
}

?

@jeremyrsmith jeremyrsmith deleted the select-expression branch February 24, 2017 04:28
@Voltir
Copy link

Voltir commented Sep 21, 2017

Er, why was this deleted? I would much rather have a user friendly map like macro then.. not have that

@jeremyrsmith
Copy link
Contributor Author

@Voltir I think the objection is around depending both on shapeless as well as complex macro code. The macros here involve hundreds of lines of code, which present a maintenance burden. Also, it diverges the API into methods that use shapeless-derived evidence and methods that short-circuit similar things directly using macros, and the logic might not line up in some cases. That could be confusing to users.

@Voltir
Copy link

Voltir commented Sep 21, 2017

Hm, I think what I am really looking forward to is #110 - poor intellij is struggling right now, just with composing a join or two..

That being said - I love macro magic, especially when it leads to clean user interfaces and apis. Would there be value in having a framless-macros library that could opt in to "potentially confusing" behavior?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants