Rather than requiring unlabeled observations to be bundled with the training data, it makes far more sense to create an object from the labeled data and then call predict() on it. The predict() function would be applied to an unlabeled data set and would indicate which observations should be queried.
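A minimal sketch of what that interface could look like, using base R only. The class name `active_learner`, the `n_query` argument, and the choice of logistic regression with uncertainty sampling are all assumptions made for illustration, not an existing API in this package.

```r
# Hypothetical constructor: fits a logistic regression on the labeled data
# and wraps it in an S3 object. The class name is an assumption.
active_learner <- function(formula, data) {
  fit <- glm(formula, data = data, family = binomial())
  structure(list(fit = fit), class = "active_learner")
}

# predict() method: scores unlabeled observations and flags which to query.
# Uncertainty sampling (assumed here): query the observations whose
# predicted probability is closest to 0.5.
predict.active_learner <- function(object, newdata, n_query = 1, ...) {
  prob <- unname(predict(object$fit, newdata = newdata, type = "response"))
  uncertainty <- 1 - abs(2 * prob - 1)
  n_query <- min(n_query, length(prob))
  query <- rep(FALSE, length(prob))
  query[order(uncertainty, decreasing = TRUE)[seq_len(n_query)]] <- TRUE
  data.frame(prob = prob, uncertainty = uncertainty, query = query)
}

# Toy data: the labeled set is only used to fit; the unlabeled set is
# passed separately at prediction time.
labeled <- data.frame(
  x = c(-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2),
  y = c(0, 0, 0, 1, 0, 1, 1, 1)
)
unlabeled <- data.frame(x = c(-3, 0.1, 3))

learner <- active_learner(y ~ x, labeled)
result <- predict(learner, unlabeled, n_query = 1)
```

Here `result$query` marks the rows of `unlabeled` that the learner would ask an oracle to label, so the unlabeled data never has to be part of the fitted object.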
To facilitate this feature, explore the new functions:
- tidyr::nest()
- tidyr::unnest()
- purrr::map()
- etc.
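One way these functions might fit together, as a rough sketch: nest the labeled data into a list-column, fit one model per row with `map()` (which comes from purrr), store each model's predictions on the unlabeled data in another list-column, and flatten the result with `unnest()`. The `group` column, the toy data, and the use of `glm()` are assumptions for illustration only; the sketch assumes dplyr, tidyr (1.0 or later), and purrr are installed.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(42)

# Hypothetical labeled data with a grouping column.
labeled <- tibble(
  group = rep(c("a", "b"), each = 20),
  x = rnorm(40)
)
labeled$y <- rbinom(40, 1, plogis(labeled$x))

# Hypothetical unlabeled observations to score.
unlabeled <- tibble(id = 1:5, x = c(-2, -0.5, 0, 0.5, 2))

models <- labeled %>%
  # One row per group, with that group's (x, y) rows in a list-column.
  nest(data = c(x, y)) %>%
  mutate(
    # Fit one model per nested data frame.
    fit = map(data, function(d) glm(y ~ x, data = d, family = binomial())),
    # Score the unlabeled data with each fitted model.
    pred = map(fit, function(m) {
      mutate(
        unlabeled,
        prob = unname(predict(m, newdata = unlabeled, type = "response"))
      )
    })
  )

# Flatten the per-model predictions back into an ordinary data frame.
predictions <- models %>%
  select(group, pred) %>%
  unnest(pred)
```

Keeping the fitted models in a list-column means the object that predict is later called on can carry several models at once, which is the part of this design that seems worth exploring.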
I got this idea from Hadley's talk at "An Afternoon with Hadley Wickham and Friends". Are the slides available?