
Project Description

TODO

Installation

Prerequisites:

For the graph implementation specifically, you need to install GraphFrames manually from a third party, since the official release is incompatible with Spark 3.x (pull request pending). A prebuilt copy is supplied in the spark-packages directory.
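The submit scripts presumably already wire the prebuilt jar into Spark, but if you need to do it by hand, a small helper like the one below can build the comma-separated list that `spark-submit --jars` expects from whatever jars sit in spark-packages. This is only a sketch; `jars_arg` and the `your_job.py` placeholder are hypothetical, and the actual jar filename is whatever ships in the directory:

```shell
# Hypothetical helper: collect every jar in spark-packages/ into the
# comma-separated list that spark-submit's --jars flag expects.
jars_arg() {
  dir="${1:-spark-packages}"
  # ls output is sorted; paste joins the paths with commas.
  ls "$dir"/*.jar 2>/dev/null | paste -sd, -
}

# Usage sketch (cluster options omitted; your_job.py is a placeholder):
#   spark-submit --jars "$(jars_arg)" your_job.py
```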

Setting up

  • Modify settings.json to reflect your setup. If you are running everything locally, you can use start_services.sh to bring everything up in one go. It may take a few minutes for Cassandra to become available.
  • Load the development database by running python3 setup.py from the project root. By default, this loads small_test_data.csv into the transactions table.
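Because Cassandra can take a few minutes to come up, it can help to poll for it before running setup.py. The helper below is a hedged sketch, not part of the repo: it is bash-specific (it uses /dev/tcp), and 9042 is Cassandra's default CQL port, which may differ from what your settings.json configures:

```shell
# Hypothetical helper: poll until Cassandra accepts TCP connections before
# running setup.py. Defaults (localhost:9042) are Cassandra's stock values
# and may not match your settings.json.
wait_for_cassandra() {
  host="${1:-localhost}" port="${2:-9042}" tries="${3:-120}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # bash-only: opening /dev/tcp succeeds once the port is listening.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      echo "Cassandra is up on $host:$port"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Timed out waiting for Cassandra on $host:$port" >&2
  return 1
}

# Usage sketch for a local bootstrap:
#   ./start_services.sh && wait_for_cassandra && python3 setup.py
```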

Deploying:

  • Start the Spark workload by running either submit.sh (slow) or submit_graph.sh (faster).
  • If you need to clean out the database, run python3 clean.py. Be aware that this wipes all table definitions and data.
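Since clean.py is destructive, one option is to gate it behind an explicit confirmation flag. The wrapper and its --yes flag below are hypothetical, not part of the repo:

```shell
# Hypothetical guard around the destructive clean.py: require an explicit
# --yes before wiping all table definitions and data.
confirm_clean() {
  if [ "$1" = "--yes" ]; then
    python3 clean.py
  else
    echo "Refusing to wipe the database; re-run with --yes to confirm." >&2
    return 1
  fi
}
```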