# Project Description

TODO

# Installation

## Prerequisites:

- Python 3
- Apache Spark 3.2 (https://spark.apache.org/downloads.html)
- Cassandra DB (https://cassandra.apache.org/_/index.html; for local use the Docker image is recommended: https://hub.docker.com/_/cassandra)

For the graph implementation specifically, you need to install `graphframes` manually, since the official release is incompatible with `spark 3.x` (pull request pending). A prebuilt copy is supplied in the `spark-packages` directory.

- graphframes (https://github.com/eejbyfeldt/graphframes/tree/spark-3.3)

## Setting up

- Modify `settings.json` to reflect your setup. If you are running everything locally, you can use `start_services.sh` to start all services at once.
- Load the development database by running `python3 setup.py` from the project root.
- Start the Spark workload by running either `submit.sh` (slow) or `submit_graph.sh` (faster).
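
Before working through the setup steps, it can help to confirm that the prerequisite tools are reachable from your shell. The following is a minimal sketch, not part of this repository: the helper `missing_tools` is hypothetical, and the executable names `spark-submit` and `docker` are assumptions about a typical local install (Docker is only needed if you run Cassandra from the Docker image).

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust to match your installation.
    required = ["python3", "spark-submit", "docker"]
    missing = missing_tools(required)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All prerequisite tools found.")
```

This only checks that the executables exist on `PATH`; it does not verify versions or that Cassandra is actually running.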