The 2nd Workshop on
Computer Architecture Modeling and Simulation
(CAMS 2024)

Date: Saturday, November 2, 2024
Time: 8:00 AM CDT - 12:00 PM CDT
Location: AT&T Hotel and Conference Center, Austin, Texas
Room: 101

The goal of the workshop is to provide a forum for researchers and practitioners to exchange ideas and discuss the latest advances in computer architecture modeling and simulation. Modeling and simulation techniques are of vital importance to ongoing advances in microarchitecture, as they are essential tools for improving system performance, efficiency, and reliability.

The workshop will cover various aspects of computer architecture modeling and simulation, including but not limited to:

  • Simulator Development: Advances in design, theory, implementation, and integration of simulators.
  • Performance Modeling: Strategies for prediction, validation, and the impact of architectural features.
  • Power Modeling and Simulation: Methods for power-efficient design and power-performance trade-offs.
  • Tools and Studies Survey: Reviews and comparisons of existing simulation tools and applications.
  • Scalable Simulation Techniques: Approaches for improving simulation scalability and efficiency.
  • Modeling and Simulation for Unconventional Architectures: Exploration of unique challenges and approaches for emerging and unconventional architectures.
  • Hardware-in-the-loop Simulation: Performance modeling and simulator validation with hardware.
  • Modeling for Machine Learning (Sim4AI): Architectural considerations and models for hardware accelerators.
  • Validation Techniques: Approaches for validating the accuracy of simulation models.
  • Human-Centered Simulation Methods: Analysis, visualization, and monitoring methods.

Workshop Program

All times are in Central Daylight Time (UTC-5).

8:00 - 8:10 Opening Remarks
8:10 - 9:00 Keynote
9:00 - 10:00 Paper Talks
9:00 - 9:15 [Paper] Demystifying Platform Requirements for Diverse LLM Inference Use Cases

Abhimanyu Bambhaniya, Ritik Raj, Geonhwa Jeong (Georgia Institute of Technology), Souvik Kundu (Intel Labs), Sudarshan Srinivasan, Midhilesh Elavazhagan, Madhu Kumar (Intel) and Tushar Krishna (Georgia Institute of Technology)

Large language models (LLMs) have shown remarkable performance across a wide range of applications, often outperforming human experts. However, deploying these parameter-heavy models efficiently for diverse inference use cases requires carefully designed hardware platforms with ample computing, memory, and network resources. With LLM deployment scenarios and models evolving at breakneck speed, the hardware requirements to meet Service Level Objectives (SLOs) remain an open research question.

In this work, we present an analytical tool, GenZ, to study the relationship between LLM inference performance and various platform design parameters. We validate our tool against real hardware data from a variety of LLMs, achieving a geomean error of 2.73%. We present case studies that provide insights into configuring platforms for different LLM workloads and use cases. We quantify the platform requirements to support SOTA LLMs under diverse serving settings. Furthermore, we project the hardware capabilities needed to enable future LLMs potentially exceeding hundreds of trillions of parameters. The trends and insights derived from GenZ can guide AI engineers deploying LLMs as well as computer architects designing next-generation hardware accelerators and platforms. Ultimately, this work sheds light on the platform design considerations for unlocking the full potential of LLMs across a spectrum of applications.
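
For readers unfamiliar with the metric, the geometric mean (geomean) of relative errors averages per-workload errors multiplicatively, so no single workload dominates the summary. Below is a minimal sketch of the computation in C++; the (simulated, measured) pairs are illustrative placeholders, not data from the paper:

    // Geomean relative error over a set of validation runs.
    // The (simulated, measured) pairs are made-up placeholder values.
    #include <cmath>
    #include <cstdio>
    #include <utility>
    #include <vector>

    int main() {
        std::vector<std::pair<double, double>> runs = {
            {102.0, 100.0}, {47.5, 50.0}, {210.0, 200.0}, {9.8, 10.0}};
        double logSum = 0.0;
        for (const auto &[sim, real] : runs)
            logSum += std::log(std::fabs(sim - real) / real); // per-run relative error
        double geomean = std::exp(logSum / runs.size());      // geometric mean
        std::printf("geomean relative error: %.2f%%\n", geomean * 100.0);
        return 0;
    }

With these placeholder numbers the program prints roughly 3.2%; a geomean error of 2.73% over real validation runs indicates comparably tight agreement.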

9:15 - 9:30 [Paper] BottleneckAI: Harnessing Machine Learning and Knowledge Transfer for Detecting Architectural Bottlenecks

Jihyun Ryoo, Gulsum Gudukbay Akbulut, Huaipan Jiang, Xulong Tang, Suat Akbulut, Jack Sampson, Vijaykrishnan Narayanan and Mahmut Taylan Kandemir (The Pennsylvania State University)

Existing architectural analysis tools that output bottleneck information do not allow knowledge transfer to other applications or architectures. We therefore propose a novel tool that can predict a known application's bottlenecks on previously unseen architectures, or an unknown application's bottlenecks on known architectures. We (i) identify the bottleneck characteristics of 44 applications and use them as the dataset for our ML/DL model; (ii) identify the correlations between metrics and bottlenecks to create our tool's initial feature list; (iii) propose an architectural bottleneck analysis model, BottleneckAI, that employs random forest regression (RFR) and multi-layer perceptron (MLP) regression; (iv) present results indicating that BottleneckAI can achieve 0.70 (RFR) and 0.72 (MLP) R^2 inference accuracy in predicting bottlenecks; and (v) present five versions of BottleneckAI, four trained with single-architecture data and one trained with multi-architecture data, to predict bottlenecks for new architectures.
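
For reference, the R^2 values quoted above are coefficients of determination: the fraction of variance in the observed bottleneck values that a model's predictions explain. The standard definition (general, not specific to this paper) is

    R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

where y_i are the observed values, \hat{y}_i the model's predictions, and \bar{y} the mean of the observations; R^2 = 1 indicates perfect prediction, while R^2 = 0 indicates doing no better than always predicting the mean.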

9:30 - 9:45 [Paper] How Accurate is Accurate Enough for Simulators? A Review of Simulation Validation

Shiyuan Li (Oregon State University) and Yifan Sun (The College of William and Mary)

Simulators are vital tools for evaluating the performance of innovative architectural designs. To ensure accurate simulation results, researchers must validate these simulators. However, even validated simulators can exhibit unreliability when facing new workloads or modified architectural designs. This paper seeks to enhance simulator trustworthiness by refining the validation process. Through a comprehensive review of the existing literature, we examine the nuances of simulator accuracy and reliability from a broader perspective on simulation error that goes beyond simple accuracy validation. Our proposals for improving simulator trustworthiness include selecting a representative benchmark set and expanding the configuration set during validation. Additionally, we aim to predict errors associated with new workloads by leveraging the error profiles obtained from the validation process. To further enhance overall simulator trustworthiness, we suggest incorporating error tolerance in the simulator calibration process. Ultimately, we propose additional validation with new benchmarks and minimal calibration, as this approach closely mimics real-world usage environments.

9:45 - 10:00 [Paper] Parallelizing a Modern GPU Simulator

Rodrigo Huerta and Antonio Gonzalez (Universitat Politècnica de Catalunya)

Simulators are a primary tool in computer architecture research but are extremely computationally intensive. Simulating modern architectures with increased core counts and recent workloads can be challenging, even on modern hardware. This paper demonstrates that simulating some GPGPU workloads in a single-threaded state-of-the-art simulator such as Accel-sim can take more than five days. In this paper, we present a simple approach to parallelize this simulator with minimal code changes by using OpenMP. Moreover, our parallelization technique is deterministic, so the simulator provides the same results for single-threaded and multi-threaded simulations. Compared to previous works, we achieve a higher speed-up, and, more importantly, the parallel simulation does not incur any inaccuracies. When we run the simulator with 16 threads, we achieve an average speed-up of 5.8x and reach 14x in some workloads. This allows researchers to simulate applications that take five days in less than 12 hours. By speeding up simulations, researchers can model larger systems, simulate bigger workloads, add more detail to the model, increase the efficiency of the hardware platform where the simulator is run, and obtain results sooner.
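
The paper's implementation is not reproduced here, but a common pattern for keeping a parallelized cycle-driven simulator deterministic, in the spirit described above, is a two-phase loop: each unit computes its next state in parallel from the previous cycle's state only, and updates are then committed serially in a fixed order. The following OpenMP sketch is a hypothetical illustration of this pattern, not Accel-sim's actual code:

    // Deterministic two-phase (tick/commit) parallel simulation sketch.
    // Hypothetical example; not code from Accel-sim.
    #include <cstdio>
    #include <vector>
    #include <omp.h>

    struct Unit {
        long state = 0;
        long next = 0;
        // Reads only state produced before this cycle began, so the result
        // cannot depend on thread scheduling.
        void tick(long cycle) { next = state + cycle; }
        // Called serially, in unit order, after all ticks have finished.
        void commit() { state = next; }
    };

    int main() {
        std::vector<Unit> units(1024);
        for (long cycle = 0; cycle < 10000; ++cycle) {
            // Phase 1: independent per-unit work in parallel.
            #pragma omp parallel for schedule(static)
            for (size_t i = 0; i < units.size(); ++i)
                units[i].tick(cycle);
            // Implicit barrier above; phase 2: deterministic sequential commit.
            for (auto &u : units)
                u.commit();
        }
        std::printf("unit 0 final state: %ld\n", units[0].state);
        return 0;
    }

Because every tick reads only pre-cycle state, the output is identical for any thread count, which is the property that lets parallel runs match single-threaded runs exactly.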

10:00 - 10:30 Coffee break
10:30 - 12:00 Simulator Release Talks
10:30 - 10:50 What's new in gem5 24.0 (Jason Lowe-Power)

In this talk, we will explore the significant advancements and new features introduced in gem5 v24.0 over the past five years. We will discuss the development of a robust and inclusive community. Key updates include the introduction of a standard library for simplified simulation setup, the implementation of the CHI coherence protocol for enhanced cache hierarchy configurability, and support for full system machine learning stacks using unmodified ML frameworks like PyTorch and TensorFlow.

10:50 - 11:10 Release of Sniper v8.1 and Guide on Common Simulation Practices (Alen Sabu, Trevor E. Carlson)

In this talk, we will introduce the latest release of Sniper, version 8.1. This release includes support for Pac-Sim, a sampled simulation technique suitable for dynamically scheduled multi-threaded workloads. Pac-Sim eliminates the need for upfront profiling, allowing users to simulate large multi-threaded workloads more efficiently. Further, we are releasing a document that assists computer architects and practitioners in selecting the right tools for their performance evaluation studies. We hope this document will serve as a starting point for any simulation-based research in computer architecture.

11:10 - 11:30 User-Friendly Tools in Akita (Yifan Sun)

In this talk, we will present AkitaRTM, the real-time monitoring tool for Akita, and Daisen, Akita's default trace visualization tool.

11:30 - 11:50 SST 14.1 Highlights (Patrick Lavin)

In this talk, we will cover the improvements made to the Structural Simulation Toolkit over the past several years. We will look at improvements made to the parallel core, most notably checkpoint/restart, as well as additions to the included simulation components such as Merlin, a network simulator, and Mercury, a large-scale application model. We will also share work done to help new users, including a new documentation website and an interactive utility for learning about simulation components.

11:50 - 12:00 Closing Remarks

Keynotes

Speaker: Matt Sinclair, University of Wisconsin-Madison
Title: Reducing the GAP: Improving the Fidelity and Scalability of gem5’s GPU Models

Abstract: The breakdown of Moore’s Law and Dennard Scaling is leading to drastic changes in the makeup and constitution of computing systems. For example, a single chip now integrates tens to hundreds of cores and a heterogeneous mix of general-purpose compute engines and highly specialized accelerators. Traditionally, computer architects have relied on tools like architectural simulators to perform accurate early-stage prototyping and optimization of proposed research. However, as systems become increasingly complex and heterogeneous, architectural tools are straining to keep up. In particular, publicly available architectural simulators are often not very faithful to the industry parts they intend to represent. This leads to a mismatch in expectations: researchers may draw the wrong conclusions about the efficacy of proposed optimizations if the tool’s models do not provide high fidelity. Moreover, modeling and simulation tools are also struggling to keep pace with increasingly large, complex workloads from domains such as machine learning (ML).

In this talk, I will discuss our work on improving the open-source, publicly available GPU models in the widely used gem5 simulator. gem5 can run entire systems, including CPUs, GPUs, and accelerators, as well as the operating system, runtime, network, and other related components. Thus, gem5 has the potential to allow users to study the behavior of entire heterogeneous systems. Unfortunately, some of gem5’s publicly available models do not always provide high accuracy relative to their "real" counterparts, especially for the memory subsystem. I will discuss my group's efforts to overcome these challenges and improve the fidelity of gem5's GPU models, as well as our ongoing efforts to scalably run modern ML and HPC workloads in frameworks such as PyTorch and TensorFlow in gem5. Collectively, this work significantly advances the state of the art and enables more widespread adoption of gem5 as an accurate platform for heterogeneous architecture research.

Bio: I am an Assistant Professor in the Computer Sciences Department at the University of Wisconsin-Madison. I am also an Affiliate Faculty in the ECE Department and Teaching Academy at UW-Madison. My research primarily focuses on how to design, program, and optimize future heterogeneous systems. I also design the tools for future heterogeneous systems, including serving on the gem5 Project Management Committee and the MLCommons Power and HPC Working Groups. I am a recipient of the NSF CAREER award, and my work has been funded by the DOE, Google, NSF, and SRC. My research has also been recognized several times, including an ACM Doctoral Dissertation Award nomination, a Qualcomm Innovation Fellowship, the David J. Kuck Outstanding PhD Thesis Award, and an ACM SIGARCH - IEEE Computer Society TCCA Outstanding Dissertation Award Honorable Mention. I am also the current steward for the ISCA Hall of Fame.



Call for Papers

The workshop invites submissions of original work in the form of full papers (up to 6 pages, references not included) covering all aspects of computer architecture modeling and simulation. Submissions will be peer-reviewed, and accepted papers will be included in the workshop proceedings.

Important Dates

  • Papers Due: August 30, 2024 (Anywhere on Earth; extended from August 16, 2024)
  • Author Notification: September 23, 2024 (extended from September 15, 2024)

Submission Guidelines

Full paper submissions must be in PDF format for US letter-size or A4 paper. They must not exceed 6 pages (excluding references, which are unlimited) in the standard ACM two-column conference format (review mode, with page numbers; either 9pt or 10pt font may be used). Shorter papers that express their ideas clearly are also welcome. Authors may choose whether to reveal their identity in the submission. Templates for the ACM format are available for Microsoft Word and LaTeX at https://www.acm.org/publications/proceedings-template

We do not publish the papers in the ACM or IEEE digital libraries. Therefore, papers submitted to this event can be submitted to other venues without restriction.

At least one author of each accepted paper is expected to present in person at the event. We understand that travel can be difficult in the post-pandemic era; in exceptional cases, we will allow remote or pre-recorded presentations.

Submission Site: https://easychair.org/conferences/?conf=cams2024

Workshop Organizers

  • Yifan Sun (Chair), William & Mary
  • Trevor E. Carlson (Chair), National University of Singapore
  • Sabila Al Jannat (Web Chair), William & Mary

Please contact the organizers if you have any questions.

Program Committee

In this workshop, we are experimenting with a PhD-student- and practitioner-led PC. We believe that PhD students and practitioners are the end users of simulation and performance modeling tools and hence know the tools best. We will report on our experience during the workshop event.

  • Yuhui Bao (Northeastern University)
  • Ying Li (William & Mary)
  • Changxi Liu (National University of Singapore)
  • Patrick Lavin (Sandia National Laboratories)
  • Mohammadreza Rezvani (UC Riverside)
  • Mahyar Samani (UC Davis)
  • William Won (Georgia Institute of Technology)