Chapter 3 Using a ready-made Docker Image
For the purposes of this book, a Docker image was built that lets you run all the code chunks in the book without issues.
3.1 Installing Docker
The installation instructions for Docker are very accessible and should get you going fairly quickly; please follow the official instructions for your platform.
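Once Docker is installed, running something like the following in a terminal should confirm that it works; hello-world is Docker's own small test image, unrelated to the image used in this book:

# Verify that the Docker installation works by running Docker's test image
docker run hello-world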
3.2 Using the Docker image with R
The Docker image we prepared contains all the prerequisites needed to run every code chunk in this book; in fact, the book itself is rendered using this very image. If you are interested in the details of the image, feel free to visit the GitHub repository, where it is openly accessible.
Let us now look at a few ways to use the image in practice.
3.2.1 Interactively with RStudio
Running the following line of code in a terminal should create a container and expose RStudio for use. If you are using RStudio 1.1 or newer, the Terminal functionality is built into RStudio itself.
# You can replace pass below with a password of your choice
# -d runs the container in the background, -p 8787:8787 publishes RStudio's port
docker run -d -p 8787:8787 -e PASSWORD=pass --name rstudio jozefhajnala/sparkfromr:latest
After running the above line, open your favorite web browser such as Google Chrome or Firefox and navigate to http://localhost:8787. You should be greeted by the RStudio login screen where you can use the following to log in:
- Username: rstudio
- Password: pass (or the one you chose above)
Now you can freely start to use the code content of the book, starting by connecting to a local Spark instance.
3.2.2 Interactively with the R console
Running the following should yield an interactive R session with all prerequisites to start working with the sparklyr package using a local Spark instance.
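A minimal sketch of such a command, assuming that R is available on the PATH inside the image:

# Start an interactive R session in a throwaway container
# (--rm removes the container on exit, -it keeps the session interactive)
docker run --rm -it jozefhajnala/sparkfromr:latest R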
Now you can freely start to use the code content of the book from the R console, starting by connecting to a local Spark instance.
3.2.3 Running an example R script
Running the following should execute an example R script using sparklyr with output appearing in the terminal.
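One possible sketch, assuming sparklyr and a local Spark installation are available inside the image; the inline script below is illustrative rather than a file shipped with the image:

# Run a short sparklyr script non-interactively; its output appears in the terminal
docker run --rm jozefhajnala/sparkfromr:latest \
  Rscript -e 'library(sparklyr); sc <- spark_connect(master = "local"); print(sdf_len(sc, 5)); spark_disconnect(sc)'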
3.3 Interactively with the Spark shell
Running the following should yield an interactive Scala REPL instance. A Spark context should be available as sc and a Spark session as spark.
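A minimal sketch, assuming that spark-shell is available on the PATH inside the image (if not, it can be invoked from the bin directory of the Spark installation):

# Start the interactive Spark (Scala) shell inside the container
docker run --rm -it jozefhajnala/sparkfromr:latest spark-shell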