SociaLite: A Declarative Query Language for Large-Scale Graph Analysis
Jiwon Seo is an assistant professor of computer science in UNIST. Before joining UNIST in 2016, he worked at Pinterest where he applied his research prototype system to analyze Pinterest data. He earned his MS and Ph.D. from Stanford, where he worked on a high-level query language for large-scale graph analysis with his advisor Professor Monica Lam. Also, he has been a contributor to Python programming language since 2005, implementing various features of Python and Jython.
With the rise of social networks, large-scale graph analysis becomes increasingly important.However, existing systems for large-scale graph analysis provide low-level programming model, which requires non-trivial programming effort for even very simple analysis.
For greater ease of use and efficiency, we propose SociaLite, a high-level graph query language based on Datalog. With SociaLite, users succinctly implement graph analysis algorithms in a few lines of high-level queries, which are compiled to optimized parallel code that runs on a cluster of distributed machines. To store data in distributed machine, SociaLite programmers simply annotate how data is to be distributed; then the necessary communication is automatically inferred to generate parallel code. SociaLite compiler optimizes the evaluation of recursive aggregate functions using a delta stepping technique, a generalization of parallel shortest-paths algorithm. In addition, approximate computation is supported in SociaLite, allowing programmers to trade off accuracy for less time and space.
We evaluated SociaLite with six core graph algorithms used in many network analyses. Our experiment with 64 Amazon EC2 8-core instances shows that SociaLite programs perform within a factor of two with respect to ideal weak scaling.
Compared to optimized Giraph, an open-source implementation of Pregel, SociaLite programs are 4 to 12 times faster across benchmark algorithms, and 22 times more succinct on average. We also collaborate with Intel to compare SociaLite with four other graph processing frameworks; we found out that SociaLiteprovides simplest programming model with reasonably good performance.