Machine Learning: Recommender System for Spark

Video:

Recommender system (KSU Big Data course - video 3, 1:56:45 total time)

Exercise:

If you want to run demos and exercises interactively, request 1 core for interactive use, then load the modules and activate your Python virtual environment.

> srun -J srun -N 1 -n 1 -t 24:00:00 --mem=10G --pty bash

> module purge
> module load Spark

> source ~/virtualenvs/python-3.7.4/bin/activate

Demos and exercises can be run on the node you’re on using spark-submit:

> spark-submit recommender.py

You can also start pyspark and use it interactively:

> pyspark
>>>

Alternatively, the recommender.py script can be run as a batch job using the job script sb.recommender:

> sbatch sb.recommender

Homework:
If you are doing this workshop as part of a class, follow the instructions below on how to turn in the exercise and homework answers. If you are taking this on your own, answers are available on request.

Start by creating a unique input file for your username.

/homes/dan/625/BigData/bigdata --inputfile recommender

Note: if working on Beoshock, replace /homes/dan/625/BigData/bigdata with /home/c297w489/BigData/bigdata

This will give you the input file recommender_username.input (to use instead of ratings.csv) as well as a starting Python script, recommender_original.py, that you will modify and turn in. The movies.csv file is also copied over.

Note: do not reset the seed. Changing it will change your answers, and they will be considered incorrect!

  • Exercise 1) Alter the code to ensure that all recommendations are between 1 and 5 by setting all values above 5 to 5 and all values below 1 to 1. Copy the title of the top movie to line 1 of the recommender_username.results file. Do not use any special characters in this title.
  • Exercise 2) Add a loop to the end of your code to calculate the error as a function of the number of iterations, using best_k, for iterations 2, 4, …, 20. Put the 10 resulting numbers on a single line in your results file, separated by spaces.
  • Exercise 3) (optional) Is it more effective to spend compute cycles on iterations or on rank?
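
The two mechanical pieces of Exercises 1 and 2, clamping a value into the 1 to 5 range and collecting one error per iteration count, can be sketched in plain Python without Spark. This is only an illustration, not the contents of recommender_original.py: the names clamp_rating, errors_by_iteration, format_results_line, and placeholder_error are hypothetical, and the four-decimal formatting is an assumption, since the exercise does not specify a precision. In your own script, placeholder_error would be replaced by whatever code trains the model with best_k for the given number of iterations and returns its error.

```python
def clamp_rating(value, low=1.0, high=5.0):
    """Force a predicted rating into the closed range [low, high]."""
    return max(low, min(high, value))


def errors_by_iteration(error_for, iteration_counts=range(2, 21, 2)):
    """Call error_for(n) once for n = 2, 4, ..., 20 and collect the results."""
    return [error_for(n) for n in iteration_counts]


def format_results_line(values):
    """Join the numbers into one space-separated line (4 decimals is an assumption)."""
    return " ".join(f"{v:.4f}" for v in values)


def placeholder_error(n_iterations):
    """Hypothetical stand-in for 'train with best_k and n iterations, return the error'."""
    return 1.0 / n_iterations


# Exercise 1 idea: values above 5 become 5, values below 1 become 1.
clamped = [clamp_rating(p) for p in [5.7, 3.2, 0.4]]

# Exercise 2 idea: ten errors, one per iteration count, written on a single line.
errors = errors_by_iteration(placeholder_error)
line = format_results_line(errors)
print(clamped)
print(line)
```

How the clamp is applied inside the Spark script depends on how the starting code stores its predictions. One option is to call it from a map over the predicted ratings.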

When done, check your result:

/homes/dan/625/BigData/bigdata --check recommender_username.py recommender_username.results

Note: if working on Beoshock, replace /homes/dan/625/BigData/bigdata with /home/c297w489/BigData/bigdata

Again, you can resubmit as often as you’d like if your answers are wrong.
