Runtime Parallelization of Static and Dynamic Irregular Array of Array References


  • Parwat Singh Anjanaa
  • N. Naga Maruthia
  • Sagar Gujjunooria
  • Madhu Orugantib





Irregular References, Runtime-Check Parallelization, Inspector-Executor Parallelization, Speculative Parallelization, Computation, Operation, Footprint


The advancement of computer systems such as multi-core and multiprocessor systems resulted in much faster computing than earlier. However, the efficient utilization of these rich computing resources is still an emerging area. For efficient utilization of computing resources, many optimization techniques have been developed, some techniques at compile time and some at runtime. When all the information required for parallel execution is known at compile time, then optimization compilers can reasonably parallelize a sequential program. However, optimization compiler fails when it encounters compile time unknowns in the program. A conventional solution for such problem can be performing parallelization at runtime. In this article, we propose three different solutions to parallelize a loop having an irregularity in the array of array references, with and without dependencies. More specifically, we introduce runtime check based parallelization technique for the static irregular references without dependency, Inspector-Executor based parallelization technique for static irregular references with dependencies, and finally Speculative parallelization technique (BitTLS) for dynamic irregular references with dependencies. For pro ling the runtime information, shared and private data structures are used. To detect the dependencies between footprints and for synchronization of threads at runtime, we use bit level operations. A window based scheduling policy is employed to schedule the iterations to the threads.





Aho AV, Sethi R, Ullman JD (1986) Compilers, Principles, Techniques. Addison wesley Boston

[2] Anderson TE (1990) The performance of spin lock alternatives for shared money multiprocessors. IEEE Transactions on Parallel and Distributed Systems 1(1):6-16

[3] ARM-Cortex (2011) a9 processor. URL: http://wwwarmcom/products/processors/cortex-a/cortex-a9php[Accessed 15 May 2017]

[4] Contributors W (2017) Memory footprint., online; accessed 19-August-2017

[5]Dagum L, Menon R (1998) Openmp: an industry standard apifor shared-memory programming. IEEE computational science and engineering 5(1):46-55

[6] Dang F, Yu H, Rauchwerger L (2001) Ther-lrpd test: Speculative parallelization of partially parallel loops. In: Parallel and Distributed Processing Symposium, Proceedings International, IPDPS 2002, Abstracts and CD-ROM, IEEE, pp 10-pp

[7] Felicie AL (2012) Reference (Computer Science). Salve Regina University

[8] Harris TL, Fraser K, Pratt IA (2002) A practical multi-word compare-and-swap operation. In: International Symposium on Distributed Computing, Springer, pp 265-279

[9] Huang D, Steffan JG (2011) Programmer-assisted automatic parallelization. In: Proceedings of the 2011 Conference of the Center for Advanced Studies on Collaborative Research, IBM Corp., pp 84-98

[10] M D Allen SS, Sohi GS (2010) Serialization sets: a dynamic dependence-based parallel execution model. In: PPoPP '09, New York, NY, USA, ACM,pp 85-96

[11] M Kulkarni BWGRKB K Pingali, Chew LP (2007) Optimistic parallelism requires abstractions. In: PLDI '07, pp 211-222

[12] M Mendez-Lojo DPXSMAHMKMB D Nguyen, Pingali K (2010) Structure-driven optimization for amorphous data-parallel programs. In: PPoPP '10, New York, NY, USA, ACM

[13] Minh CC, Chung J, Kozyrakis C, Olukotun K (2008) Stamp: Stanford transactional applications for multi-processing. In: IISWC, IEEE Computer Society, pp 35-46

[14] Mock M (2004) Why programmer-specified aliasing is a bad idea. In: Latin American Conference on Compiler Science

[15] Nicolau A (1989) Run-time disambiguation: coping with statically unpredictable dependencies. IEEE Transactions on Computers 38(5):663-678

[16] R L Bocchino VSADDSVASHRKJOPSHS Jr, Vakilian M (2009) A type and effect system for deterministic parallel java. In: OOPSLA'09, NewYork, NY, USA, 2009. ACM, pp 97-116

[17] Rauchwerger L (1998) Run-time parallelization: Its time has come. Parallel Computing 24(3):527-556

[18] Reinders J (2007) Intel Threading Building Blocks

[19] Rinard MC, Lam MS (2010) The design, implementation, and evaluation of jade. In: ACM Trans. Program. Lang. Syst., 20(3), pp 483-545

[20] Rus S, Pennings M, Rauchwerger L (2007) Sensitivity analysis for automatic parallelization on multi-cores. In: Proceedings of the 21st annual international conference on Supercomputing, ACM, pp 263{273

[21] S S Mukherjee MDHJRLAR S D Sharma, Saltz J (1995) Efficient support for irregular applications on distributed-memory machines. In: PPOPP '95, ACM, New York, NY, USA, pp 68-79

[22] W Reid WK, Craik A (2008) Reasoning about inherent parallelism in modern object-oriented languages. In: ACSC '08, Darlinghurst, Australia, Australian Computer Society, Inc., pp 27-36

[23] Yiapanis P, Rosas-Ham D,Brown G, Lujan M (2013) Optimizing software runtime systems for speculative parallelization. ACM Transactions on Architecture and Code Optimization (TACO) 9(4):39

View Full Article: