# Project Description

TODO

# Installation

## Prerequisites

- Python 3
- Apache Spark 3.2 (https://spark.apache.org/downloads.html)
- Cassandra (https://cassandra.apache.org/_/index.html; for local development the official Docker image is recommended: https://hub.docker.com/_/cassandra)
- graphframes (https://github.com/eejbyfeldt/graphframes/tree/spark-3.3)

For the graph implementation specifically, `graphframes` must be installed manually from a third-party build, since the official release is incompatible with Spark 3.x ([pull request pending](https://github.com/graphframes/graphframes/pull/415)). A prebuilt copy is supplied in the `spark-packages` directory.

## Setting up

- Modify `settings.json` to reflect your setup. If you are running everything locally, you can use `start_services.sh` to start all services at once. It may take a few minutes for Cassandra to become available.
- Load the development database by running `python3 setup.py` from the project root. By default this loads `small_test_data.csv` into the transactions table.

# Deploying

- Start the Spark workload by running either `submit.sh` (slow) or `submit_graph.sh` (faster).
- If you need to clean out the database, run `python3 clean.py`. Be aware that this wipes all table definitions and data.
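Since Cassandra can take a few minutes to come up after `start_services.sh`, a small stdlib-only poll of its native-protocol port can tell you when it is safe to run `setup.py`. This is an optional convenience sketch, not part of the project's scripts; the host and port are assumptions (9042 is Cassandra's default native-protocol port).

```python
import socket
import time


def wait_for_port(host: str, port: int, deadline: float = 300.0, interval: float = 5.0) -> bool:
    """Poll until a TCP port accepts connections, or return False after `deadline` seconds."""
    end = time.monotonic() + deadline
    while time.monotonic() < end:
        try:
            # A successful connect means the server is accepting clients.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            # Not up yet (refused or timed out); back off before retrying.
            time.sleep(min(interval, max(0.0, end - time.monotonic())))
    return False


# Example (hypothetical local setup): block until Cassandra answers on 9042.
# if wait_for_port("127.0.0.1", 9042):
#     print("Cassandra is accepting connections")
```

Note that an open port only means the server is listening; Cassandra may still need a moment before CQL queries succeed.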
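Before `setup.py` loads `small_test_data.csv` into the transactions table, it can be useful to sanity-check the file. A minimal stdlib sketch (the function name is an illustration, not part of the project):

```python
import csv


def csv_header_and_count(path: str) -> tuple[list[str], int]:
    """Return the header row and the number of data rows in a CSV file."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader)          # first row is assumed to be the header
        count = sum(1 for _ in reader)  # remaining rows are data
    return header, count


# Example (run from the project root):
# header, rows = csv_header_and_count("small_test_data.csv")
# print(header, rows)
```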
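If you want to see how the prebuilt jar in `spark-packages` gets onto the Spark classpath, the sketch below assembles a `spark-submit` invocation that ships a local jar with the job. The jar filename and job script are assumptions (substitute the actual names from `spark-packages/` and your submit script); `--jars` distributes the jar to executors, and passing the same jar via `--py-files` is a common way to expose GraphFrames' bundled Python bindings to PySpark.

```python
import shlex


def build_submit_cmd(jar: str, job: str) -> list[str]:
    """Assemble a spark-submit argv that makes a local jar visible to the job."""
    return [
        "spark-submit",
        "--jars", jar,       # ship the jar to the driver and executors
        "--py-files", jar,   # expose the Python bindings packaged inside the jar
        job,
    ]


# Hypothetical paths, for illustration only:
cmd = build_submit_cmd("spark-packages/graphframes.jar", "job.py")
print(shlex.join(cmd))
```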