nitowa 183723e46f non graph solution but extremely memory intensive | 2 years ago | |
---|---|---|
config/db | 2 years ago | |
spark-packages | 2 years ago | |
src/spark | 2 years ago | |
.gitignore | 2 years ago | |
README.md | 2 years ago | |
clean.py | 2 years ago | |
settings.json | 2 years ago | |
setup.py | 2 years ago | |
small_test_data.csv | 2 years ago | |
start_services.sh | 2 years ago | |
submit.sh | 2 years ago | |
submit_graph.sh | 2 years ago | |
TODO
For the graph implementation specifically, you need to install graphframes
manually from a third party, since the official release is incompatible with Spark 3.x
(pull request pending). A prebuilt copy is supplied in the spark-packages
directory.
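
As a rough illustration of how a prebuilt jar from the spark-packages directory could be handed to Spark, the sketch below assembles a `spark-submit` argument list that passes every jar it finds via `--jars`. This is only an assumption about one workable approach; what `submit_graph.sh` actually does is not shown here. The function name `build_submit_command` and the job script name `job.py` are hypothetical.

```python
from pathlib import Path


def build_submit_command(script, packages_dir="spark-packages"):
    """Return a spark-submit argument list that ships every jar in packages_dir.

    `script` is the path of the Spark job to run (hypothetical here).
    """
    # Collect local jar files in a stable order.
    jars = sorted(str(p) for p in Path(packages_dir).glob("*.jar"))
    cmd = ["spark-submit"]
    if jars:
        # --jars expects a single comma-separated list of jar paths.
        cmd += ["--jars", ",".join(jars)]
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is submitted by accident.
    print(" ".join(build_submit_command("job.py")))
```

The returned list can be passed to `subprocess.run` once the job script path has been replaced with a real one.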
1. Edit `settings.json` to reflect your setup. If you are running everything locally, you can use `start_services.sh` to start everything in one go. It might take a few minutes for Cassandra to become available.
2. Run `python3 setup.py` from the project root. By default this loads `small_test_data.csv` into the transactions table.
3. Run `submit.sh` (slow) or `submit_graph.sh` (faster).
4. To reset, run `python3 clean.py`. Be aware that this wipes all table definitions and data.
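
Since Cassandra may take a few minutes to become available after the services are started, it can help to wait for it before running `python3 setup.py`. The following is a minimal sketch, not part of the repository: it polls a TCP port until a connection succeeds. The host `127.0.0.1` and port `9042` (Cassandra's default native transport port) are assumptions; use whatever your `settings.json` points at. The helper name `wait_for_port` is hypothetical.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=9042, timeout=300.0, interval=5.0):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Returns True once a connection is accepted, False if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Assumed defaults: a local Cassandra on its default native port.
    ready = wait_for_port()
    print("Cassandra port reachable" if ready else "Timed out waiting for Cassandra")
```

An open port only shows that the service is accepting connections, so treat a `True` result as a hint that setup can be attempted rather than a guarantee that the database is fully ready.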