DVFS Algorithms of GPU and Memory for Mobile GPGPU Applications: A case study

  • Authors

    • SeongKi Kim, Keimyung University
    • Seok-Kyoo Kim, Sangmyung University
    2018-08-24
    https://doi.org/10.14419/ijet.v7i3.15296
  • Keywords: DVFS, Embedded system, GPU, GPGPU, Mobile system, OpenCL
  • Abstract

    Although OpenCL and RenderScript have made general-purpose computing on graphics processing units (GPGPU) available on mobile GPUs, it is still difficult for mobile applications to use GPGPU, for several reasons. One reason is that mobile devices restrict GPU performance through power-saving technologies such as Dynamic Voltage and Frequency Scaling (DVFS). DVFS tries to balance performance against energy consumption according to the application’s requirements. The technology has been successful in many cases and is widely used; however, it significantly decreases the performance of GPGPU applications. In this paper, we propose novel DVFS algorithms for the GPU and memory that apply while GPGPU applications run. The suggested algorithms decreased the energy consumption by more than 0.7 times without any algorithm changes, and improved the energy efficiency (performance per watt) by more than 3.42 times in comparison with the conventional interval-based algorithm.
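
    To make the two terms used in the abstract concrete, the following is a minimal sketch of a generic interval-based DVFS policy together with the performance-per-watt metric. It is not the algorithm proposed in the paper: the frequency steps, the thresholds, and the names `next_level` and `energy_efficiency` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a generic interval-based DVFS policy, NOT the
# algorithm proposed in the paper. All values below are assumptions.

FREQ_LEVELS_MHZ = [100, 200, 300, 400, 500, 600]  # hypothetical GPU clock steps
UP_THRESHOLD = 0.90    # hypothetical: step up if utilization exceeded this
DOWN_THRESHOLD = 0.50  # hypothetical: step down if utilization fell below this


def next_level(level: int, utilization: float) -> int:
    """Choose the frequency level for the next interval from the GPU
    utilization (0.0 to 1.0) measured over the interval that just ended."""
    if utilization > UP_THRESHOLD:
        return min(level + 1, len(FREQ_LEVELS_MHZ) - 1)
    if utilization < DOWN_THRESHOLD:
        return max(level - 1, 0)
    return level


def energy_efficiency(work_done: float, energy_joules: float, seconds: float) -> float:
    """Performance per watt: (work / time) divided by (energy / time)."""
    performance = work_done / seconds
    average_power_watts = energy_joules / seconds
    return performance / average_power_watts


# Replay a short utilization trace, one sample per interval.
level = 0
trace = []
for utilization in [0.95, 0.97, 0.60, 0.30]:
    level = next_level(level, utilization)
    trace.append(FREQ_LEVELS_MHZ[level])
```

    In this sketch the clock moves by at most one step per interval, and only after that interval's utilization has been observed, so the policy always lags the workload by at least one interval.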

  • How to Cite

    Kim, S., & Kim, S.-K. (2018). DVFS Algorithms of GPU and Memory for Mobile GPGPU Applications: A case study. International Journal of Engineering & Technology, 7(3), 1918-1925. https://doi.org/10.14419/ijet.v7i3.15296

    Received date: 2018-07-09

    Accepted date: 2018-08-05

    Published date: 2018-08-24