Chapter 1 Welcome

Apache Spark is a popular open-source analytics engine for big data processing, and thanks to the sparklyr and SparkR packages, the power of Spark is also available to R users.

1.1 What will you find in this book

This short publication attempts to provide practical insights into using the sparklyr interface to gain the benefits of Apache Spark while still retaining the ability to use R code organized in custom-built functions and packages.

This publication focuses on exploring the different interfaces available for communication between R and Spark using the sparklyr package, namely:

If you are interested in the sparklyr package and working with Spark from R in general, we strongly recommend the very comprehensive Mastering Spark with R book available online for free.

1.2 Book sources

This book is rendered and published automatically from publicly accessible Git repositories; you can find the

All contributions to the above are most welcome.

1.3 Acknowledgments

The creation of this book would not have been possible without many openly available resources, such as the R packages around the rmarkdown ecosystem created by Yihui Xie, in particular the bookdown package with which this publication is rendered. This project also relies heavily on the Rocker Project, which provides Docker images for the R environment thanks to Carl Boettiger, Dirk Eddelbuettel, and Noam Ross. Last but not least, there would be nothing to write about in this short book without the sparklyr package written by Javier Luraschi et al., the R programming language itself, maintained by the R Core Team, and Apache Spark with its creators and maintainers. My thanks go to the creators and maintainers of all these amazing open-source tools.

Differences of habit and language are nothing at all if our aims are identical and our hearts are open

  • Albus Dumbledore