README.md

Project Description

TODO

Installation

Prerequisites:

For the graph implementation specifically, you need to install GraphFrames manually from a third party, since the official release is incompatible with Spark 3.x (a pull request is pending). A prebuilt copy is supplied in the spark-packages directory.
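For reference, here is a minimal sketch of pointing a PySpark session at the bundled jar. The jar filename and checkpoint path are placeholders, not the project's actual values; check the spark-packages directory for the real filename.

```python
from pyspark.sql import SparkSession

# The jar filename below is a placeholder; use the actual file
# shipped in spark-packages/.
spark = (
    SparkSession.builder
    .appName("graph-clustering")
    # Use the locally supplied GraphFrames build instead of resolving
    # the (incompatible) official package from a repository.
    .config("spark.jars", "spark-packages/graphframes.jar")
    .getOrCreate()
)

# GraphFrames' connectedComponents requires a checkpoint directory;
# the path here is an example, not the project's configured one.
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")
```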

Setting up

  • Modify settings.json to reflect your setup. If you are running everything locally, you can use start_services.sh to start everything in one go. It may take a few minutes for Cassandra to become available.
  • Load the development database by running python3 setup.py from the project root. By default this loads small_test_data.csv into the transactions table (a rough sketch of what that involves follows this list).
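The sketch below shows the general shape of such a loading step. The keyspace, table schema, and column names are assumptions for illustration only; the real schema lives in setup.py and the host comes from settings.json.

```python
import csv
from cassandra.cluster import Cluster

# Host is hard-coded here for illustration; the real script reads it
# from settings.json.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Hypothetical keyspace and schema -- not the project's actual definitions.
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS dev WITH replication = "
    "{'class': 'SimpleStrategy', 'replication_factor': 1}"
)
session.set_keyspace("dev")
session.execute(
    "CREATE TABLE IF NOT EXISTS transactions "
    "(id text PRIMARY KEY, src text, dst text)"
)

# Load the test CSV row by row into the transactions table.
with open("small_test_data.csv") as f:
    for row in csv.reader(f):
        session.execute(
            "INSERT INTO transactions (id, src, dst) VALUES (%s, %s, %s)",
            row,
        )

cluster.shutdown()
```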

Deploying:

  • Start the Spark workload by running either submit.sh (slow) or submit_graph.sh (faster).
  • If you need to clean out the database, run python3 clean.py. Be aware that this wipes all table definitions and data (see the sketch below).
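A minimal sketch of the kind of wipe clean.py performs, assuming the hypothetical keyspace name from the loading sketch above. Dropping the keyspace removes both the table definitions and all data in one statement, which is why the operation is destructive.

```python
from cassandra.cluster import Cluster

# Host and keyspace name are assumptions; the real values come from
# settings.json and the project's setup code.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Dropping the keyspace removes all tables and their data at once.
session.execute("DROP KEYSPACE IF EXISTS dev")
cluster.shutdown()
```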