Recommender system (KSU Big Data course - video 3, 1:56:45 total time)
To run the demos and exercises interactively, request one core for interactive use, then load the modules and activate your Python virtual environment:
> srun -J srun -N 1 -n 1 -t 24:00:00 --mem=10G --pty bash
> module purge
> module load Spark
> source ~/virtualenvs/python-3.7.4/bin/activate
You can run the demos and exercises on your current node using spark-submit:
> spark-submit recommender.py
You can also start pyspark and use it interactively:
> pyspark
>>>
Alternatively, submit recommender.py as a batch job using the job script sb.recommender:
> sbatch sb.recommender
Start by creating a unique input file for your username:
> /homes/dan/625/BigData/bigdata --inputfile recommender
Note: if working on Beoshock, replace /homes/dan/625/BigData/bigdata
with /home/c297w489/BigData/bigdata
This creates the input file recommender_username.input (use it instead of ratings.csv) and a starter Python script, recommender_original.py, that you will modify and turn in. The movies.csv file is also copied over.
Note: do not reset the seed. Doing so will change your answers, and they will be marked incorrect!
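To see why the seed matters, here is a minimal sketch that uses Python's standard random module as a stand-in for the seeded steps in a Spark job. It is an analogy only, not the course script: the function split_ratings, the toy ratings list, and the seed values 42 and 7 are all hypothetical.

```python
import random

def split_ratings(ratings, train_fraction, seed):
    """Shuffle a copy of the ratings with a fixed seed, then split into train/test."""
    rng = random.Random(seed)          # private generator, so the seed fully determines the shuffle
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy (user, movie) pairs standing in for a ratings file (hypothetical data).
ratings = [(user, movie) for user in range(10) for movie in range(10)]

train_a, test_a = split_ratings(ratings, 0.8, seed=42)
train_b, test_b = split_ratings(ratings, 0.8, seed=42)  # same seed as train_a
train_c, test_c = split_ratings(ratings, 0.8, seed=7)   # different seed

print(train_a == train_b)  # same seed reproduces the same split
print(train_a == train_c)  # a different seed gives a different split
```

A model trained on a different split will generally produce different predictions, which is why the checker would see changed answers if the seed in the starter script is altered.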
When done, check your result:
> /homes/dan/625/BigData/bigdata --check recommender_username.py recommender_username.results
Note: if working on Beoshock, replace /homes/dan/625/BigData/bigdata
with /home/c297w489/BigData/bigdata
As before, you can resubmit as often as you'd like if your answers are wrong.
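Before running the check, it can help to confirm that the per-user files exist in the current directory. The sketch below assumes that "username" in the file names above stands for your login name. The helper names recommender_files and missing_files are hypothetical and not part of the course tools.

```python
import getpass
from pathlib import Path

def recommender_files(username=None):
    """Return the per-user file names, assuming 'username' means the login name."""
    if username is None:
        username = getpass.getuser()
    return {
        "input": f"recommender_{username}.input",
        "script": f"recommender_{username}.py",
        "results": f"recommender_{username}.results",
    }

def missing_files(files, directory="."):
    """List the expected files that are not present in the given directory."""
    return [name for name in files.values() if not (Path(directory) / name).exists()]

# Example with a hypothetical username.
files = recommender_files("dan")
print(files["input"])
print(missing_files(files))
```

Calling recommender_files() with no argument uses the name of the logged-in user, and an empty list from missing_files means all three files are in place.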