
Concurrent-AMAT: a mathematical model for Big Data access

May 12, 2014

In today's high-end computing systems, data access delay is the preeminent performance bottleneck, preventing further improvement of application performance. The newly proposed C-AMAT (Concurrent-AMAT) model [1] offers a novel approach to this long-standing problem, one that promises to significantly improve data access speed.

Dr Xian-He Sun,
Chairman and Distinguished Professor,
Department of Computer Science,
Illinois Institute of Technology

High-speed computing is the design goal of computers, and data is the object of computing. As technology advances, both computing speed and data access speed have increased over the years. However, computing speed has increased much faster than memory access speed, so the performance gap between CPU and memory keeps widening. Data access delay has become the main bottleneck preventing further performance improvement of computing systems. This is the (in)famous memory-wall problem.

We were the first group of researchers to recognize the importance of memory system performance. In 1990, we proposed the memory-bounded model [2], also known as Sun-Ni's law. The memory-bounded model showed that data access is the key factor in performance and that scalable computing is bounded by memory capacity. Sun-Ni's law, along with Amdahl's law and Gustafson's law, is now recognized as one of the three basic laws of scalable computing and is included in many parallel computing textbooks. In 1994, the term "memory wall" was formally introduced based on the Average Memory Access Time (AMAT) model [3]. Viewed from the perspective of a single memory access, the memory wall once again emphasized that computer performance is limited by data access latency, and it called for improving memory performance. Twenty years of intensive research followed in response to the memory-bound and memory-wall problems, all aimed at improving memory system performance. Microprocessors such as the Pentium Pro, Alpha 21164, StrongARM SA-110, and Loongson-3A devote 80% or more of their transistors to on-chip cache rather than to computing components. Many advanced memory technologies have been developed, including many concurrent memory technologies.

Conventionally, memory performance is evaluated with AMAT. However, AMAT was proposed 50 years ago and considers only the locality of data accesses, not memory concurrency. This is a fatal deficiency when evaluating modern memory systems, where concurrency is common and is increasingly the driving force behind memory performance improvements. Despite its inability to measure concurrency, AMAT remains, for lack of an alternative, the community standard for evaluating memory systems in both industry and academia. Hennessy and Patterson's classic computer architecture textbook also presents AMAT as the standard metric [4].
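For readers less familiar with the metric, here is a minimal sketch of the textbook AMAT calculation, AMAT = H + MR × AMP (hit time plus miss rate times average miss penalty). The function name and the cycle counts are illustrative assumptions, not measurements from any real processor. Note that the formula has no term for overlapping accesses: every miss is charged its full penalty, which is exactly the concurrency blind spot described above.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average Memory Access Time: AMAT = H + MR * AMP (times in cycles).

    Each access is treated as if it were serviced alone; there is no
    term for several misses being outstanding at the same time.
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be in [0, 1]")
    return hit_time + miss_rate * miss_penalty


# Hypothetical two-level hierarchy: the L1 miss penalty is the AMAT of L2.
l2 = amat(hit_time=10, miss_rate=0.2, miss_penalty=200)  # 10 + 0.2 * 200 = 50.0
l1 = amat(hit_time=1, miss_rate=0.05, miss_penalty=l2)   # 1 + 0.05 * 50 = 3.5
print(l1)
```

Because each parameter keeps a clear physical meaning, the result can be decomposed: a designer can see whether lowering the hit time, the miss rate, or the miss penalty would help most.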

The shortcoming of AMAT in concurrency awareness is well recognized by researchers and practitioners. A new metric, MLP (Memory-Level Parallelism) [5], was proposed to capture the characteristics of concurrent memory access. MLP can measure concurrency but cannot measure locality, so in practice MLP has to be used together with AMAT. How to combine them appropriately and accurately is application dependent and far from trivial. A more serious problem is that MLP is a measurement, whereas AMAT is an analysis tool. When MLP is applied to AMAT, the three parameters of AMAT (hit time H, miss rate MR, and average miss penalty AMP) lose their physical meanings and can no longer be used for performance analysis and optimization. In other words, while MLP can measure concurrent data access to some extent, it cannot be used for performance analysis and cannot be combined with AMAT to extend AMAT to concurrent data accesses. For this reason, Hennessy and Patterson still use AMAT as the standard memory evaluation tool in their textbook, while also pointing out its weaknesses [4].
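The problem of bolting MLP onto AMAT can be illustrated with a small sketch. It computes MLP from a made-up per-cycle trace of outstanding misses, following a simplified reading of the definition in [5] (average outstanding misses over cycles with at least one miss outstanding). It then applies a hypothetical ad-hoc blend, dividing the AMAT miss term by MLP. This blend is an assumption for illustration only; it is not the method of [5] or of C-AMAT.

```python
from statistics import mean


def mlp(outstanding_per_cycle: list[int]) -> float:
    """Simplified MLP in the spirit of Chou et al. [5]: the average number of
    outstanding misses, counted only over cycles with at least one miss
    outstanding. (All misses are assumed to be useful long-latency accesses.)"""
    busy = [n for n in outstanding_per_cycle if n > 0]
    return float(mean(busy)) if busy else 0.0


def naive_mlp_amat(hit_time: float, miss_rate: float,
                   miss_penalty: float, mlp_value: float) -> float:
    """Hypothetical ad-hoc blend: divide the AMAT miss term by MLP.
    MLP is clamped to at least 1 so a miss-free trace leaves AMAT unchanged."""
    return hit_time + miss_rate * miss_penalty / max(mlp_value, 1.0)


# Made-up per-cycle counts of outstanding misses (illustrative only).
trace = [0, 1, 2, 2, 0, 3, 1, 0]
m = mlp(trace)  # (1 + 2 + 2 + 3 + 1) / 5 = 1.8

# Two very different memory systems collapse to the same number:
a = naive_mlp_amat(hit_time=1, miss_rate=0.10, miss_penalty=200, mlp_value=4.0)
b = naive_mlp_amat(hit_time=1, miss_rate=0.05, miss_penalty=100, mlp_value=1.0)
print(m, a, b)
```

Both systems evaluate to the same value, yet one has twice the miss rate and twice the miss penalty of the other. Once the miss term is scaled by MLP, MR and AMP no longer describe what the hardware actually does, so the combined number cannot tell a designer which parameter to optimize.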

[References]

[1] X. H. Sun and D. Wang, "Concurrent Average Memory Access Time," IEEE Computer, May 2014 (DOI: 10.1109/MC.2013.227).

[2] X. H. Sun and L. M. Ni, "Another view on parallel speedup," in Proceedings of the IEEE Supercomputing Conference, Nov. 1990, pp. 324-333.

[3] W. A. Wulf and S. A. McKee, "Hitting the memory wall: implications of the obvious," ACM SIGARCH Computer Architecture News, vol. 23, pp. 20-24, 1995.

[4] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 5th ed. Elsevier, 2012.

[5] Y. Chou, B. Fahs, and S. Abraham, "Microarchitecture Optimizations for Memory-Level Parallelism," in Proceedings of the 31st International Symposium on Computer Architecture, June 2004.


Published in: State of the Art
