nitowa 562a281ce4 rename variables in graph		3 years ago
config/db	add db write to graph impl	3 years ago
spark-packages	working graph implementation and improved shell scripts	3 years ago
src/spark	rename variables in graph	3 years ago
.gitignore	add db write to graph impl	3 years ago
README.md	add clarification to README	3 years ago
clean.py	progress on mapping data, finding clusters, probably inefficient	3 years ago
settings.json	working graph implementation and improved shell scripts	3 years ago
setup.py	progress on mapping data, finding clusters, probably inefficient	3 years ago
small_test_data.csv	progress on mapping data, finding clusters, probably inefficient	3 years ago
start_services.sh	working graph implementation and improved shell scripts	3 years ago
submit.sh	working graph implementation and improved shell scripts	3 years ago
submit_graph.sh	working graph implementation and improved shell scripts	3 years ago

Project Description

TODO

Installation

Prerequisites:

Python3
Apache Spark 3.2 (https://spark.apache.org/downloads.html)
Cassandra DB (https://cassandra.apache.org/_/index.html; for local use the Docker image is recommended: https://hub.docker.com/_/cassandra)

For the graph implementation specifically, you need to install graphframes manually from a third-party fork, because the official release is incompatible with Spark 3.x (a pull request is pending). A prebuilt copy is supplied in the spark-packages directory.

graphframes (https://github.com/eejbyfeldt/graphframes/tree/spark-3.3)

Setting up

Modify settings.json to reflect your setup. If you are running everything locally, you can use start_services.sh to start all services at once. It might take a few minutes for Cassandra to become available.
Load the development database by running python3 setup.py from the project root. By default this loads small_test_data.csv into the transactions table.
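
Since Cassandra can take a few minutes to accept connections after start_services.sh has run, it may help to wait for it before running setup.py. The following is a minimal sketch and not part of this repository: the function name wait_for_port is hypothetical, and the host and port are assumptions (9042 is Cassandra's default CQL port) that should match your settings.json.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=9042, timeout=300.0, interval=2.0):
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper; host and port are assumptions, adjust to your setup.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that the port is open; Cassandra
            # may still need a moment before it answers queries.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port() and running python3 setup.py only when it returns True avoids connection errors during the first minutes after startup.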

Deploying:

Start the Spark workload by running either submit.sh (slow) or submit_graph.sh (faster).
If you need to clean out the database, you can run python3 clean.py. Be aware that this wipes all table definitions and data.
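
Because the clean step is destructive, a confirmation prompt in front of it can prevent accidents. The sketch below is an illustration only and does not describe how clean.py works: guarded_clean, confirm_wipe, the wipe callback and the keyspace name are all hypothetical.

```python
def confirm_wipe(keyspace, answer):
    """Return True only if the operator typed the keyspace name exactly."""
    return answer.strip() == keyspace


def guarded_clean(keyspace, wipe, ask=input):
    """Call wipe() only after the operator confirms by typing the keyspace name.

    `wipe` stands in for whatever actually drops the tables (hypothetical).
    """
    answer = ask(f"Type '{keyspace}' to drop all tables and data: ")
    if not confirm_wipe(keyspace, answer):
        print("Aborted, nothing was deleted.")
        return False
    wipe()
    return True
```

Passing ask as a parameter keeps the prompt testable without a terminal; in interactive use the default input is sufficient.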
