
README.md

Project Description

TODO

Installation

Prerequisites:

For the graph implementation specifically, you need to install GraphFrames manually from a third-party build, since the official release is incompatible with Spark 3.x (a pull request is pending). A prebuilt copy is supplied in the spark-packages directory.
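The submit scripts are presumably already wired up for this, but as a rough, hypothetical illustration of the idea, a locally supplied jar can be attached to a PySpark session through the spark.jars config (the jar filename below is a placeholder, not necessarily the file shipped in spark-packages):

```python
# Hypothetical sketch: pointing Spark at a prebuilt GraphFrames jar kept in
# spark-packages/. The jar filename is a placeholder; use whichever file is
# actually present in that directory.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("graph-clustering")
    .config("spark.jars", "spark-packages/graphframes.jar")  # placeholder name
    .getOrCreate()
)

# With the jar on the classpath (and the GraphFrames Python bindings on
# PYTHONPATH), the usual import becomes available:
# from graphframes import GraphFrame
```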

Setting up

  • Modify settings.json to reflect your setup. If you are running everything locally, you can use start_services.sh to start everything in one go. It may take a few minutes for Cassandra to become available.
  • Load the development database by running python3 setup.py from the project root. By default this loads small_test_data.csv into the transactions table (see the sketch after this list).
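As a rough sketch of what that load looks like with the Spark Cassandra connector (this is not the actual setup.py; the keyspace name "dev" and the CSV options are assumptions, only the transactions table name comes from this README):

```python
# Assumed sketch of the development-data load, not the project's real script.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("load-dev-data").getOrCreate()

# Read the bundled test data; header/schema inference are assumptions about
# the CSV layout.
df = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("small_test_data.csv")
)

# Write into Cassandra via the Spark Cassandra connector. The keyspace name
# "dev" is a placeholder.
(
    df.write
    .format("org.apache.spark.sql.cassandra")
    .options(table="transactions", keyspace="dev")
    .mode("append")
    .save()
)
```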

Deploying:

  • Start the Spark workload by running either submit.sh (slow) or submit_graph.sh (faster).
  • If you need to clean out the database, run python3 clean.py. Be aware that this wipes all table definitions and data (see the sketch after this list).
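For illustration only, a wipe of that kind can be done with the DataStax cassandra-driver; this is not the actual clean.py, and the keyspace name and contact point are assumptions:

```python
# Hypothetical sketch of wiping the schema the way clean.py's warning
# describes. The keyspace name "dev" and the local contact point are
# assumptions, not taken from the project.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])  # assumes a locally running Cassandra node
session = cluster.connect()

# Dropping the keyspace removes every table definition and all data in it.
session.execute("DROP KEYSPACE IF EXISTS dev")

cluster.shutdown()
```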