GSWITCH
A pattern-based algorithmic autotuner for graph processing on GPUs.
View the Project on GitHub: PAA-NCIC/GSWITCH

GSWITCH is a pattern-based algorithmic autotuning system that dynamically switches to the suitable optimization variants with negligible overhead. Specifically, it is a CUDA library targeting GPU-based graph applications, and it supports both vertex-centric and edge-centric abstractions.

So far, GSWITCH can automatically determine the suitable optimization variants in Direction (push, pull), Data-Structure (Bitmap, Sorted Queue, Unsorted Queue), Load-Balance (TWC, WM, CM, STRICT, 2D-partition), Stepping (Increase, Decrease, Remain), and Kernel Fusion (Standalone, Fused).

The fast optimization transition of GSWITCH is based on a machine-learning model trained on 600 real graphs from the Network Repository. The model can be reused by new applications, or retrained to adapt to new architectures.

In addition, GSWITCH provides a succinct programming interface that hides all the low-level tuning details. Developers can implement high-performance graph applications in just ~100 lines of code.

Why GSWITCH

As GPUs provide higher parallelism and memory bandwidth than traditional CPUs, they have become promising hardware for accelerating graph algorithms. Many recent works have explored the potential of GPUs for data-intensive graph processing. Although the primary optimizations of these works are diverse, we notice that most of them try to find a "one size fits all" solution. This leads to two issues, mismatch and complication:

Mismatch: Previous GPU-based graph frameworks may incur performance hits due to suboptimal strategies. They accelerate graph primitives to run truly fast on some particular graphs or algorithms; however, their performance can fall dramatically when facing an unmatched situation. For example, Figure 2 shows that different graphs require different load-balance strategies, and Figure 3 shows the performance loss if we use only the push model in frontier expansion.

(Figure 1: Best load-balance strategies for different graphs.)

Complication: Prior knowledge is required for users to make favorable decisions, especially from such a mass of choices. A bulk synchronous parallel (BSP)-style graph application achieves its best performance only if the correct strategies are chosen in every super-step. Unfortunately, the number of these performance-crucial strategies is very large; worse yet, the combinations of these strategies form a huge tuning space. Data analysts should not spend their labor wrestling with tedious and complex performance tuning; offloading the decision-making to a fully auto-tuning runtime is a better choice.

Dependency

Clone the GSWITCH code to your local server and build GSWITCH with CMake. Each sample application accepts several command-line options, including:

- whether the input graph is directed (Default: undirected)
- whether the input file has a header (e.g. …)
- whether to run the CPU reference validation (Default: no validation)
- verbose output (Default: quiet mode)

*Note: By using -configs, you can force the applications to run with static strategies.

To write a new application, you should provide at most six small functions. Three of them are required:

- Required: stream all the vertices (or edges), filter out the active ones, then update their values.
- Required: describe the message sent from one vertex to another.
- Required: describe how the message is processed in the target vertex.

Here is a sample code of BFS for the graph soc-orkut:

    using G = device_graph_t;
    struct BFS : Functor { … };

The BC (betweenness centrality) algorithm is composed of two phases: a forward phase and a backward phase. The forward phase computes the number of shortest paths from the root to each vertex in BFS order. The backward phase computes the BC value, using the equation above, in reversed BFS order. In GSWITCH, we implement both the forward and backward phases.