Project Description

TODO

Installation

Prerequisites:

Python 3
Apache Spark 3.2 (https://spark.apache.org/downloads.html)
Cassandra (https://cassandra.apache.org/_/index.html); for local use, the Docker image is recommended: https://hub.docker.com/_/cassandra

For the graph implementation you need to install graphframes manually from a third-party fork, because the official release is incompatible with Spark 3.x (a pull request is pending). A prebuilt copy is supplied in the spark-packages directory.

graphframes (https://github.com/eejbyfeldt/graphframes/tree/spark-3.3)
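One way to make the prebuilt copy available to a job is the --jars option of spark-submit. The sketch below only builds the command line; it assumes the prebuilt copy is shipped as one or more .jar files directly inside spark-packages and takes the entry script as an argument (job.py in the usage note is a hypothetical name). Neither assumption is confirmed by this README, and submit_graph.sh may do this differently.

```python
from pathlib import Path


def build_submit_command(app, packages_dir="spark-packages"):
    """Build a spark-submit argument list that ships every jar in packages_dir."""
    # Assumption: the prebuilt graphframes copy is a .jar file in packages_dir.
    jars = sorted(str(p) for p in Path(packages_dir).glob("*.jar"))
    cmd = ["spark-submit"]
    if jars:
        # --jars takes a comma-separated list of jar paths.
        cmd += ["--jars", ",".join(jars)]
    cmd.append(app)  # entry script of the Spark job
    return cmd
```

For example, " ".join(build_submit_command("job.py")) gives a command string you could paste into a shell. PySpark jobs may additionally need the same jar passed via --py-files so that the Python wrapper can be imported.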

Setting up

Modify settings.json to reflect your setup. If you are running everything locally, you can use start_services.sh to start all services in one go. It might take a few minutes for Cassandra to become available.
Load the development database by running python3 setup.py from the project root. By default this loads small_test_data.csv into the transactions table.
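Because Cassandra can take a few minutes to accept connections, it can help to wait for it before running setup.py. The following is a minimal sketch, not part of the project: it assumes settings.json contains the hypothetical keys cassandra_host and cassandra_port (the real schema is not shown in this README) and falls back to 127.0.0.1 and 9042, Cassandra's default CQL port.

```python
import json
import socket
import time


def load_cassandra_address(path="settings.json"):
    """Read host and port from the settings file (key names are hypothetical)."""
    with open(path) as f:
        settings = json.load(f)
    host = settings.get("cassandra_host", "127.0.0.1")
    port = int(settings.get("cassandra_port", 9042))  # 9042: default CQL port
    return host, port


def wait_for_port(host, port, timeout=300.0, interval=5.0):
    """Return True once a TCP connection succeeds, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that the port is open.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port(*load_cassandra_address()) before python3 setup.py blocks until the port opens or five minutes pass. An open port does not guarantee that Cassandra has finished initializing, so a short extra delay or a retry in the loading step may still be needed.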

Deploying:

Start the Spark workload by running either submit.sh (slow) or submit_graph.sh (faster).
If you need to clean out the database, you can run python3 clean.py. Be aware that this wipes all table definitions and data.
