# Project Description

TODO

# Installation

## Prerequisites

- Python 3
- Apache Spark 3.2 (https://spark.apache.org/downloads.html)
- Cassandra (https://cassandra.apache.org/_/index.html; for local development the official Docker image is recommended: https://hub.docker.com/_/cassandra)
- graphframes (https://github.com/eejbyfeldt/graphframes/tree/spark-3.3)

For the graph implementation specifically, `graphframes` must be installed manually from a third-party build, since the official release is incompatible with Spark 3.x ([pull request pending](https://github.com/graphframes/graphframes/pull/415)). A prebuilt copy is supplied in the `spark-packages` directory.

## Setting up

- Modify `settings.json` to reflect your setup. If you are running everything locally, you can use `start_services.sh` to start all services at once. It may take a few minutes for Cassandra to become available.
- Load the development database by running `python3 setup.py` from the project root. By default this loads `small_test_data.csv` into the transactions table.

# Deploying

- Start the Spark workload by running either `submit.sh` (slow) or `submit_graph.sh` (faster).
- If you need to clean out the database, run `python3 clean.py`. Be aware that this wipes all table definitions and data.
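Since Cassandra can take a few minutes to come up after `start_services.sh`, a small stdlib-only poll of its native-protocol port can tell you when it is safe to run `setup.py`. This is an optional convenience sketch, not part of the project's scripts; the host and port are assumptions (9042 is Cassandra's default native-protocol port).

```python
import socket
import time


def wait_for_port(host: str, port: int, deadline: float = 300.0, interval: float = 5.0) -> bool:
    """Poll until a TCP port accepts connections, or return False after `deadline` seconds."""
    end = time.monotonic() + deadline
    while time.monotonic() < end:
        try:
            # A successful connect means the server is accepting clients.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            # Not up yet (refused or timed out); back off before retrying.
            time.sleep(min(interval, max(0.0, end - time.monotonic())))
    return False


# Example (hypothetical local setup): block until Cassandra answers on 9042.
# if wait_for_port("127.0.0.1", 9042):
#     print("Cassandra is accepting connections")
```

Note that an open port only means the server is listening; Cassandra may still need a moment before CQL queries succeed.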
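Before `setup.py` loads `small_test_data.csv` into the transactions table, it can be useful to sanity-check the file. A minimal stdlib sketch (the function name is an illustration, not part of the project):

```python
import csv


def csv_header_and_count(path: str) -> tuple[list[str], int]:
    """Return the header row and the number of data rows in a CSV file."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader)          # first row is assumed to be the header
        count = sum(1 for _ in reader)  # remaining rows are data
    return header, count


# Example (run from the project root):
# header, rows = csv_header_and_count("small_test_data.csv")
# print(header, rows)
```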
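If you want to see how the prebuilt jar in `spark-packages` gets onto the Spark classpath, the sketch below assembles a `spark-submit` invocation that ships a local jar with the job. The jar filename and job script are assumptions (substitute the actual names from `spark-packages/` and your submit script); `--jars` distributes the jar to executors, and passing the same jar via `--py-files` is a common way to expose GraphFrames' bundled Python bindings to PySpark.

```python
import shlex


def build_submit_cmd(jar: str, job: str) -> list[str]:
    """Assemble a spark-submit argv that makes a local jar visible to the job."""
    return [
        "spark-submit",
        "--jars", jar,       # ship the jar to the driver and executors
        "--py-files", jar,   # expose the Python bindings packaged inside the jar
        job,
    ]


# Hypothetical paths, for illustration only:
cmd = build_submit_cmd("spark-packages/graphframes.jar", "job.py")
print(shlex.join(cmd))
```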