| AdaServe: Accelerating Multi-SLO LLM Serving with SLO-Customized Speculative Decoding |
Zikun Li (Carnegie Mellon University), Zhuofu Chen (Princeton University), Remi Delacourt (EPFL),
Gabriele Oliaro (Carnegie Mellon University), Zeyu Wang (Carnegie Mellon University),
Qinghan Chen (Carnegie Mellon University), Shuhuai Lin (Carnegie Mellon University),
April Yang (Carnegie Mellon University), Zhihao Zhang (Carnegie Mellon University),
Zhuoming Chen (Carnegie Mellon University), Yi-Hsiang Lai (Amazon Web Services),
Xinhao Cheng (Carnegie Mellon University), Xupeng Miao (Purdue University),
Zhihao Jia (Carnegie Mellon University and Amazon Web Services)
|
| FlexPipe: Adapting Dynamic LLM Serving Through Inflight Pipeline Refactoring in Fragmented Serverless Clusters |
Yanying Lin (UCAS, UCSD), Shijie Peng (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences),
Chengzhi Lu (University of Macau), ChengZhong Xu (University of Macau),
Kejiang Ye (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
|
| Learn-to-Probe: Achieving Signal Distinguishability in Learning-based Congestion Control |
Han Tian (University of Science and Technology of China), Wenbo Li (University of Science and Technology of China),
Junxue Zhang (University of Science and Technology of China), Xudong Liao (Hong Kong University of Science and Technology),
Decang Sun (Hong Kong University of Science and Technology), Donghui Chen (Huawei Technologies Co., Ltd.),
Bin Huang (Huawei Technologies Co., Ltd.), Wenxue Li (Hong Kong University of Science and Technology),
Yong Wang (Hong Kong University of Science and Technology), Kai Chen (Hong Kong University of Science and Technology)
|
| iRoute: Local Routing Table-based Workflow Management in Serverless Computing |
Yiming Li (Tianjin University), Laiping Zhao (Tianjin University), Zhiyuan Su (IEIT SYSTEMS CO., LTD.),
Guowei Liu (Tianjin University), Wenhao Huang (Tianjin University), Kang Chen (Tsinghua University),
Zhaolin Duan (Tianjin University), Jingjie Zong (Tianjin University), Wenxin Li (Tianjin University),
Deze Zeng (China University of Geosciences), Dong Zhang (Inspur (Jinan) Data Technology Co., Ltd),
Wenyu Qu (Tianjin University)
|
| Taming Latency-Memory Trade-Off in MoE-Based LLM Serving via Fine-Grained Expert Offloading |
Hanfei Yu (Stevens Institute of Technology), Xingqi Cui (Rice University), Hong Zhang (University of Waterloo),
Hao Wang (Rutgers University), Hao Wang (Stevens Institute of Technology)
|
| REPS: Recycled Entropy Packet Spraying for Adaptive Load Balancing and Failure Mitigation |
Tommaso Bonato (ETH Zürich / Microsoft), Abdul Kabbani (Microsoft), Ahmad Ghalayini (Microsoft),
Michael Papamichael (Microsoft), Mohammad Dohadwala (Microsoft), Lukas Gianinazzi (ETH Zürich),
Mikhail Khalilov (ETH Zürich), Elias Achermann (ETH Zürich), Daniele De Sensi (Sapienza University of Rome),
Torsten Hoefler (ETH Zürich / Microsoft)
|
| TierScape: Harnessing Multiple Compressed Tiers to Tame Server Memory TCO |
Sandeep Kumar (Intel Labs), Aravinda Prasad (Intel Labs), Sreenivas Subramoney (Intel Labs)
|
| Handling Network Faults in Distributed AI Training: Failover is Now an Option |
Xin Zhe Khooi (National University of Singapore), Zhuo Jiang (ByteDance), Pan Xie (ByteDance),
Zhigang Cui (ByteDance), Meng Wang (ByteDance), Yuze Jin (National University of Singapore),
Pengfei Huo (ByteDance), Dongyang Wang (ByteDance), Lulu Chen (ByteDance), Lei Wang (ByteDance),
Liaoyuan Feng (ByteDance), Xiaodong Liu (ByteDance), Peng Li (ByteDance), Qinlong Wang (ByteDance),
Yang Bai (ByteDance), Yongcan Wang (ByteDance), Hao Jin (ByteDance), Jinshuai Sun (ByteDance),
Shan Lu (ByteDance), Xiang Shi (ByteDance), Yingkai Zhao (ByteDance), Haiquan Chen (ByteDance),
Yi Li (ByteDance), Jianxi Ye (ByteDance), Mun Choon Chan (National University of Singapore)
|
| Chimera: Transparent and High-Performance ISAX Heterogeneous Computing via Binary Rewriting |
Jiatai He (Institute of Software Chinese Academy of Sciences; University of Chinese Academy of Sciences),
Qinglin Pan (Institute of Software Chinese Academy of Sciences; University of Chinese Academy of Sciences),
Ruilin Zhao (Institute of Software Chinese Academy of Sciences; University of Chinese Academy of Sciences),
Ji Qi (Institute of Software Chinese Academy of Sciences; Key Laboratory of System Software (Chinese Academy of Sciences)),
Kaiwen Liang (Hohai University; Institute of Software Chinese Academy of Sciences; University of Chinese Academy of Sciences, Nanjing; Nanjing Institute of Software Technology),
Jiahao Xu (Institute of Software Chinese Academy of Sciences; University of Chinese Academy of Sciences),
Zhiyuan Li (University of Chinese Academy of Sciences; Institute of Software Chinese Academy of Sciences; University of Chinese Academy of Sciences, Nanjing; Nanjing Institute of Software Technology),
Yuexiang Wang (Institute of Software Chinese Academy of Sciences),
Jiageng Yu (Institute of Software Chinese Academy of Sciences),
Yanjun Wu (Institute of Software Chinese Academy of Sciences)
|
| Pyramid: A Secure, Resource-Efficient, and Pluggable Kubernetes for Multi-Tenancy |
Xiang Li (Tsinghua University, China Telecom eSurfing Cloud (State Cloud)),
Weijie Liu (Nankai University), Fabing Li (Ant Group), Hongliang Tian (Ant Group),
Zheli Liu (Nankai University), Shoumeng Yan (Ant Group),
Mingyu Gao (Tsinghua University, Shanghai Qi Zhi Institute)
|
| MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production |
Chao Jin (School of Computer Science, Peking University), Ziheng Jiang (ByteDance Seed),
Zhihao Bai (ByteDance Seed), Zheng Zhong (ByteDance Seed), Juncai Liu (ByteDance Seed),
Xiang Li (ByteDance Seed), Ningxin Zheng (ByteDance Seed), Xi Wang (ByteDance Seed),
Cong Xie (ByteDance Seed), Qi Huang (ByteDance Seed), Wen Heng (ByteDance Seed),
Yiyuan Ma (ByteDance Seed), Wenlei Bao (ByteDance Seed), Size Zheng (ByteDance Seed),
Xuegui Zheng (ByteDance Seed), Yanghua Peng (ByteDance Seed), Haibin Lin (ByteDance Seed),
Xuanzhe Liu (School of Computer Science, Peking University), Xin Jin (School of Computer Science, Peking University),
Xin Liu (ByteDance Seed)
|
| Effective On-Hardware Fuzzing of Embedded Operating Systems |
Yuheng Shen (Tsinghua University), Jianzhong Liu (Shandong University), Qiming Guo (Beihang University),
Yifei Chu (Tsinghua University), Qiang Zhang (Hunan University), Heyuan Shi (Central South University),
Yu Jiang (Tsinghua University)
|
| GeDES: GPU-Driven Discrete Event Network Simulator |
Qinyong Li (University of Electronic Science and Technology of China), Zhiwei Zhao (University of Electronic Science and Technology of China),
Geyong Min (Department of Computer Science, University of Exeter), Zi Wang (University of Electronic Science and Technology of China),
Luwei Fu (University of Electronic Science and Technology of China)
|
| TokenFlow: Responsive LLM Text Streaming Serving under Request Burst via Preemptive Scheduling |
Junyi Chen (Shanghai Jiao Tong University), Chuheng Du (Shanghai Jiao Tong University),
Renyuan Liu (George Mason University), Shuochao Yao (George Mason University),
Dingtian Yan (China Telecom Corporation Limited Shanghai Branch), Jiang Liao (China Telecom Corporation Limited Shanghai Branch),
Shengzhong Liu (Shanghai Jiao Tong University), Fan Wu (Shanghai Jiao Tong University),
Guihai Chen (Shanghai Jiao Tong University)
|
| Federated Fine-Tuning of Sparsely-Activated Large Language Models on Resource-Constrained Devices |
Fahao Chen (Shandong University, China, Xi'an Jiaotong University, China), Jie Wan (Xi'an Jiaotong University, China),
Peng Li (Xi'an Jiaotong University, China), Zhou Su (Xi'an Jiaotong University, China),
Dongxiao Yu (Shandong University, China)
|
| Multipath Collective Communication Beyond Scale-up Networks in GPU Clouds |
Yuchen Xu (Peking University), Jianglong Nie (Peking University), Baojia Li (Tencent),
Mingzhuo Chen (Tencent), Hao Lu (Tencent), Guanyu Qu (Tencent), Zhenchuan Liu (Tencent),
Shuangshaung Yin (Tencent), Xiaojie Huang (Tencent), Chunzhi He (Tencent), Yinben Xia (Tencent),
Quan Wen (Tencent), Xiang Li (Tencent), Zekun He (Tencent), Yachen Wang (Tencent),
Xianneng Zou (Tencent), Congcong Miao (Tencent), Wenfei Wu (Peking University)
|
| LoRAFusion: Efficient LoRA Fine-Tuning for LLMs |
Zhanda Zhu (University of Toronto, Vector Institute, NVIDIA), Qidong Su (University of Toronto, Vector Institute, NVIDIA),
Yaoyao Ding (University of Toronto, Vector Institute, NVIDIA), Kevin Song (University of Toronto, Vector Institute, NVIDIA),
Shang Wang (University of Toronto, Vector Institute, NVIDIA), Gennady Pekhimenko (University of Toronto, Vector Institute, NVIDIA)
|
| SKernel: An Elastic and Efficient Secure Container System at Scale with a Split-Kernel Architecture |
Xiaohu Chai (Tsinghua University; Ant Group), Keyang Hu (Tsinghua University), Jianfeng Tan (Ant Group),
Tiwei Bie (Ant Group), Guotao Tan (Ant Group), Tianyu Zhou (Ant Group), Anqi Shen (Ant Group),
Dawei Shen (Ant Group), Xinyao Yang (Ant Group), Xin Chen (Ant Group), Xu Wang (Ant Group),
Feng Yu (Ant Group), Zhengyu He (Ant Group), Dong Du (Shanghai Jiao Tong University),
Yubin Xia (Shanghai Jiao Tong University), Kang Chen (Tsinghua University),
Yu Chen (Quan Cheng Laboratory, Jinan, China; Tsinghua University)
|
| Untangling GPU Power Consumption: Job-Level Inference in Cloud Shared Settings |
Pierre Jacquet (École de technologie supérieure), Maxime Agusti (Univ Lyon1, Inria, OVHcloud), Eddy Caron (Univ Lyon1, Inria, ENS de Lyon, CNRS),
Camille Coti (École de technologie supérieure), Marcos Dias De Assunção (École de technologie supérieure),
Laurent Lefèvre (Univ Lyon1, Inria, ENS de Lyon, CNRS), Anne-Cécile Orgerie (CNRS, IRISA, Rennes - France)
|
| TZ-LLM: Protecting On-Device Large Language Models with Arm TrustZone |
Xunjie Wang (Shanghai Jiao Tong University), Jiacheng Shi (Shanghai Jiao Tong University),
Zihan Zhao (Shanghai Jiao Tong University), Yang Yu (Shanghai Jiao Tong University),
Zhichao Hua (Shanghai Jiao Tong University), Jinyu Gu (Shanghai Jiao Tong University)
|
| STAlloc: Enhancing Memory Efficiency in Large-Scale Model Training with Spatio-Temporal Planning |
Zixiao Huang (Tsinghua University), Junhao Hu (Infinigence-AI), Hao Lin (Tsinghua University),
Chunyang Zhu (Infinigence-AI), Yueran Tang (Infinigence-AI), Quanlu Zhang (Infinigence-AI),
Zhen Guo (Infinigence-AI), Zhenhua Li (Tsinghua University), Shengen Yan (Tsinghua University),
Zhenhua Zhu (Tsinghua University), Guohao Dai (Shanghai Jiao Tong University),
Yu Wang (Tsinghua University)
|
| Efficient Data Passing for Serverless Inference Workflows: A GPU-Centric Approach |
Hao Wu (HUST), Yaochen Liu (HUST), Minchen Yu (CUHK-Shenzhen), Qizhen Weng (TeleAI),
Junxiao Deng (HUST), Yue Yu (HUST), Hao Fan (HUST), Song Wu (HUST),
Wei Wang (HKUST), Hai Jin (HUST)
|
| PatternSketch: General and Runtime Reconfigurable Time-series Network Traffic Pattern Detection |
Yang Du (Soochow University, Suzhou, China), Dan Wang (Soochow University, Suzhou, China),
He Huang (Soochow University, Suzhou, China), Hanwen Zhang (Soochow University, Suzhou, China),
Jianzhi Tang (Soochow University, Suzhou, China), Fu Xiao (Nanjing University of Posts and Telecommunications, Nanjing, China),
Yu-E Sun (Soochow University, Suzhou, China)
|
| Concord: Learning Network Configuration Contracts |
Ryan Beckett (Microsoft Research), Francis Y. Yan (University of Illinois Urbana-Champaign),
Raghunadha Reddy Pocha (Microsoft), Vineesh V. Raj (Microsoft), Ayyub Shaik (Microsoft),
Siva Kesava Reddy Kakarla (Microsoft Research)
|
| LLMFolder: Revisiting Constant Folding in Large Language Models |
Gansen Hu (Shanghai Jiao Tong University), Zhaoguo Wang (Shanghai Jiao Tong University),
Wei Huang (Shanghai Jiao Tong University), Jinglin Wei (Shanghai Jiao Tong University),
Haibo Chen (Shanghai Jiao Tong University)
|
| Ethane: Debloating State Data using Compact Trie for Account-based Blockchain |
Junmo Lee (Seoul National University), Jaehun Kim (Seoul National University),
Jiyong Youn (Seoul National University), Soo-Mook Moon (Seoul National University)
|
| Scheduling Cloud Block Storage Proactively and Reactively with Omar |
Xinqi Chen (Shanghai Jiao Tong University), Weidong Zhang (Alibaba Group), Zhongyu Wang (Alibaba Group),
Erci Xu (Shanghai Jiao Tong University), Xiaolu Zhang (Alibaba Group), Dong Wu (Alibaba Group),
Junping Wu (Alibaba Group), Haonan Wu (Shanghai Jiao Tong University), Ruiming Lu (Shanghai Jiao Tong University),
Yaheng Song (Alibaba Group), Chaolei Hu (Alibaba Group), Lijun Ding (Alibaba Group),
Guangtao Xue (Shanghai Jiao Tong University), Patrick P. C. Lee (The Chinese University of Hong Kong)
|
| OptiLog: Assigning Roles in Byzantine Consensus |
Hanish Gogada (University of Stavanger), Christian Berger (Friedrich-Alexander-Universität Erlangen-Nürnberg),
Leander Jehl (University of Stavanger), Hans P. Reiser (Reykjavik University),
Hein Meling (University of Stavanger)
|
| FUR: Fast and Unlimited Reads on Persistent Memory Transactions |
João Barreto (INESC-ID, IST, Universidade de Lisboa, Portugal),
Daniel Castro (INESC-ID, IST, Universidade de Lisboa, Portugal),
Paolo Romano (INESC-ID, IST, Universidade de Lisboa, Portugal),
Alexandro Baldassin (São Paulo State University (Unesp), Brazil)
|
| Rearchitecting Programmable Networks For In-Network Computing: From Hardware To Language |
Haifeng Sun (National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University),
Bing Liu (Huawei Cloud Computing Technologies Co., Ltd.), Taixu Tian (Huawei Technologies Co., Ltd),
Jinbo Sun (Institute of Computing Technology, CAS),
Jintao He (National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University),
Qun Huang (National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University),
Luyou He (Huawei Cloud Computing Technologies Co., Ltd.),
Xuan Wang (Huawei Technologies Co., Ltd),
Feng Gao (Huawei Cloud Computing Technologies Co., Ltd.),
Liguo Wang (Huawei Technologies Co., Ltd),
Xiangcan Xu (National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University),
Junyi Guo (National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University),
Xiaoping Zhu (Huawei Cloud Computing Technologies Co., Ltd.),
Yongqiang Yang (Huawei Cloud Computing Technologies Co., Ltd.)
|
| AdaGen: Workload-Adaptive Cluster Scheduler for Latency-Optimal LLM Inference Serving |
Sudipta Saha Shubha (University of Virginia), Ayush Goel (HPE Labs),
Diman Zad Tootaghaj (HPE Labs), Khaled Diab (HPE Labs), Hardik Soni (HPE Labs),
K. K. Ramakrishnan (University of California, Riverside), Puneet Sharma (HPE Labs),
Haiying Shen (University of Virginia)
|
| TailorLLM: Collaborative End-Cloud Inference of Large and Small Language Models Based on Low-Rank Adaptation |
Zian Wang (Beijing University of Posts and Telecommunications), Ziyi Wang (Beijing University of Posts and Telecommunications),
Haonan Jin (Beijing University of Posts and Telecommunications), Jie Xing (Beijing University of Posts and Telecommunications),
Lanshan Zhang (Beijing University of Posts and Telecommunications)
|
| NecoFuzz: Effective Fuzzing of Nested Virtualization via Fuzz-Harness Virtual Machines |
Reima Ishii (The University of Tokyo), Takaaki Fukai (National Institute of Advanced Industrial Science and Technology),
Takahiro Shinagawa (The University of Tokyo)
|
| KUNSERVE: Parameter-centric Memory Management for Efficient Memory Overloading Handling in LLM Serving |
Rongxin Cheng (Shanghai Jiao Tong University), Yuxin Lai (Shanghai Jiao Tong University),
Xingda Wei (Shanghai Jiao Tong University), Rong Chen (Shanghai Jiao Tong University),
Haibo Chen (Shanghai Jiao Tong University)
|
| SwitchFS: Asynchronous Metadata Updates for Distributed Filesystems with In-Network Coordination |
Jingwei Xu (Shanghai Jiao Tong University), Mingkai Dong (Shanghai Jiao Tong University),
Qiulin Tian (Shanghai Jiao Tong University), Ziyi Tian (Shanghai Jiao Tong University),
Tong Xin (Shanghai Jiao Tong University), Haibo Chen (Shanghai Jiao Tong University)
|
| DROPS: Managing Serverless Resource Pools in Microsoft Azure Functions |
Ahmed Alquraan (University of Waterloo), Abdelrahman Baba (University of Waterloo),
Rafael Mendes (Microsoft Research), Sameh Elnikety (Microsoft Research),
Paul Batum (Microsoft), Yan Chen (Microsoft), Hamid Henry Safi (Microsoft),
Seth Fine (Microsoft), Samer Al-Kiswany (University of Waterloo)
|
| FlexiQ: Adaptive Mixed-Precision Quantization for Latency/Accuracy Trade-Offs in Deep Neural Networks |
Jaemin Kim (Seoul National University), Hongjun Um (Hanyang University), Sungkyun Kim (Hanyang University),
Yongjun Park (Yonsei University), Jiwon Seo (Seoul National University)
|
| Turnstile: Hybrid Information Flow Control Framework for Managing Privacy in Internet-of-Things Applications |
Kumseok Jung (University of British Columbia), Mohanna Shahrad (Princeton University),
Gargi Mitra (University of British Columbia), Karthik Pattabiraman (University of British Columbia)
|
| SkyWalker: A Locality-Aware Cross-Region Load Balancer for LLM Inference |
Tian Xia (UC Berkeley), Ziming Mao (UC Berkeley), Jamison Kerney (UC Berkeley),
Ethan J. Jackson (UC Berkeley), Zhifei Li (Renmin University of China),
Jiarong Xing (Rice University and UC Berkeley), Scott Shenker (ICSI and UC Berkeley),
Ion Stoica (UC Berkeley)
|
| PASS: A Power Adaptive Storage Server |
Dedong Xie (University of Washington), Theano Stavrinos (Databricks), Jonggyu Park (University of Washington),
Simon Peter (University of Washington), Baris Kasikci (University of Washington), Thomas E. Anderson (University of Washington)
|
| MFS: An Efficient Model Family Serving System for LLMs |
Yunxuan Zhang (Hong Kong University of Science and Technology), Hao Wang (Hong Kong University of Science and Technology),
Han Tian (University of Science and Technology of China), Liu Yang (Hong Kong University of Science and Technology),
Xudong Liao (Hong Kong University of Science and Technology), Wenxue Li (Hong Kong University of Science and Technology),
Ping Yin (Inspur), Bowen Liu (Hong Kong University of Science and Technology), Kai Chen (Hong Kong University of Science and Technology)
|
| LightDSA: Enabling Efficient DSA Through Hardware-Aware Transparent Optimization |
Yuansen Wang (Renmin University of China), Teng Ma (Alibaba Group), Yuanhui Luo (Renmin University of China),
Dongbiao He (CNIC, CAS), Zheng Liu (Alibaba Group), Yunpeng Chai (Renmin University of China)
|
| Squeezy: Rapid VM Memory Reclamation for Serverless Functions |
Orestis Lagkas Nikolos (National Technical University of Athens, Athens, Greece),
Chloe Alverti (University of Illinois Urbana-Champaign, Illinois, USA),
Stratos Psomadakis (National Technical University of Athens, Athens, Greece),
Georgios Goumas (National Technical University of Athens, Athens, Greece),
Nectarios Koziris (National Technical University of Athens, Athens, Greece)
|
| FlexiWalker: Extensible GPU Framework for Efficient Dynamic Random Walks with Runtime Adaptation |
Seongyeon Park (Seoul National University), Jaeyong Song (Seoul National University),
Changmin Shin (Seoul National University), Sukjin Kim (Seoul National University),
Junguk Hong (Seoul National University), Jinho Lee (Seoul National University)
|
| PaCaR: Improved Buffered I/O Locality on NUMA Systems with Page Cache Replication |
Jérôme Coquisart (RWTH Aachen University), Julien Sopena (LIP6 (UPMC/CNRS) - Inria),
Redha Gouicem (RWTH Aachen University)
|
| RLive: Robust Delivery System for Scaling Live Streaming Services |
Yu Tian (Institute of Computing Technology, University of Chinese Academy of Sciences),
Gerui Lv (ICT, CAS), Qinghua Wu (ICT, CAS), Ruili Fang (ByteDance), Yajie Peng (ByteDance),
Zhichen Xue (ByteDance), Rui Han (ByteDance Ltd. Beijing),
Chuanqing Lin (ICT, CAS; UCAS, China), Xiaofei Pang (ByteDance), Ri Lu (ByteDance),
Zhenyu Li (ICT, CAS)
|
| Carbon-Aware Continuous Learning for Sustainable Real-Time Machine Learning Analytics |
Gwanjong Park (Sungkyunkwan University), Osama Khan (Sungkyunkwan University),
Dongho Ha (Unaffiliated), Myeongjae Jeon (POSTECH), Euiseong Seo (Sungkyunkwan University)
|
| Canopy: Property-Driven Learning for Congestion Control |
Chenxi Yang (UT Austin), Divyanshu Saxena (UT Austin), Rohit Dwivedula (UT Austin),
Kshiteej Mahajan (Google DeepMind), Swarat Chaudhuri (UT Austin), Aditya Akella (UT Austin)
|
| Towards Improving Throughput and Scalability of DAG-based BFT SMR |
Nibesh Shrestha (Supra Research), Aniket Kate (Supra Research / Purdue University)
|
| SAS: Sparse Attention Synthesizer for Efficient Language Model Inference |
Yuan Zhou (Amazon), Shaojie Xiang (Amazon), Lingfan Yu (Amazon), Zhenyu Song (Amazon),
Charith Mendis (Amazon), Yida Wang (Amazon)
|
| Gopher: Efficient Dynamic Graph Pattern Mining via DAG-Driven Execution |
Yi Zhang (Huazhong University of Science and Technology, China), Yu Huang (Huazhong University of Science and Technology, China),
Chaoqiang Liu (Huazhong University of Science and Technology, China), Haifeng Liu (Huazhong University of Science and Technology, China),
Juntao Chen (Huazhong University of Science and Technology, China), Jingrui Yuan (Huazhong University of Science and Technology, China),
Jianhui Yue (Michigan Technological University, USA), Xiaofei Liao (Huazhong University of Science and Technology, China),
Hai Jin (Huazhong University of Science and Technology, China), Jingling Xue (University of New South Wales, Australia)
|
| Maya: Optimizing Deep Learning Training Workloads using GPU Runtime Emulation |
Srihas Yarlagadda (Georgia Institute of Technology), Amey Agrawal (Georgia Institute of Technology),
Elton Pinto (Georgia Institute of Technology), Hakesh Darapaneni (Georgia Institute of Technology),
Mitali Meratwal (Georgia Institute of Technology), Shivam Mittal (Georgia Institute of Technology),
Pranavi Bajjuri (Georgia Institute of Technology), Srinivas Sridharan (Nvidia Inc.),
Alexey Tumanov (Georgia Institute of Technology)
|
| Serverless Replication of Object Storage across Multi-Vendor Clouds and Regions |
Junyi Shu (School of Computer Science, Peking University and UCLA), Xiaolong Huang (School of Computer Science, Peking University),
Gang Huang (School of Computer Science, Peking University), Hong Mei (School of Computer Science, Peking University),
Xuanzhe Liu (School of Computer Science, Peking University), Xin Jin (School of Computer Science, Peking University)
|
| Fast and Parallelized Crash Consistency with Opportunistic Order Elimination |
Jiahao Chen (Harbin Institute of Technology, Shenzhen), Yanqi Pan (Harbin Institute of Technology, Shenzhen),
Wen Xia (Harbin Institute of Technology, Shenzhen), Hao Huang (Harbin Institute of Technology, Shenzhen),
Peixin Zeng (Harbin Institute of Technology, Shenzhen), Yuchen Shan (Harbin Institute of Technology, Shenzhen)
|
| Zeppelin: Balancing Variable-length Workloads in Data Parallel Large Model Training |
Chang Chen (Peking University), Tiancheng Chen (ETH Zurich), Jiangfei Duan (CUHK),
Qianchao Zhu (Peking University), Zerui Wang (Shanghai AI Laboratory), Qinghao Hu (MIT),
Peng Sun (Shanghai AI Laboratory), Xiuhong Li (Peking University), Chao Yang (Peking University),
Torsten Hoefler (ETH Zurich)
|
| Efficient and Adaptable Overlapping for Computation and Communication via Signaling and Reordering |
Ke Hong (Tsinghua University), Xiuhong Li (Peking University), Minxu Liu (Infinigence-AI),
Qiuli Mao (Infinigence-AI), Tianqi Wu (Tsinghua University), Zixiao Huang (Tsinghua University),
Lufang Chen (Infinigence-AI), Zhong Wang (Tsinghua University), Yichong Zhang (Tsinghua University),
Zhenhua Zhu (Tsinghua University), Guohao Dai (Shanghai Jiao Tong University), Yu Wang (Tsinghua University)
|
| MTTM: Dynamic Fast Memory Partitioning with Bandwidth Optimization for Multi-tenant Cloud |
Changjun Lee (KAIST), Sangjin Choi (KAIST), Youngjin Kwon (KAIST)
|
| BASK: Batch And SmartNIC-offloaded KSM |
Chanshin Kwak (KAIST), Jaehyeon Lee (KAIST), Minkyu Jung (KAIST),
Changjun Lee (KAIST), Youngjin Kwon (KAIST)
|
| CSnake: Detecting Self-Sustaining Cascading Failure via Causal Stitching of Fault Propagations |
Shangshu Qian (Purdue University), Lin Tan (Purdue University), Yongle Zhang (Purdue University)
|
| Demystifying Serverless Costs on Public Platforms: Bridging Billing, Architecture, and OS Scheduling |
Changyuan Lin (University of British Columbia), Yuanzhi Ma (Johns Hopkins University),
Mohammad Shahrad (University of British Columbia)
|
| E-Cube: Event Enhanced Efficient Video Streaming for Drones |
Jingao Xu (The University of Hong Kong), Longfei Shangguan (University of Pittsburgh),
Danyang Li (Tsinghua University), Yunhao Liu (Tsinghua University), Zheng Yang (Tsinghua University)
|
| MinatoLoader: Accelerating Machine Learning Training Through Efficient Data Preprocessing |
Rahma Nouaji (McGill University, Canada),
Stella Bitchebe (McGill University, Canada),
Ricardo Macedo (INESC TEC & University of Minho, Portugal),
Oana Balmau (McGill University, Canada)
|
| In-Production Characterization of an Open Source Serverless Platform and New Scaling Strategies |
Nima Nasiri (University of British Columbia),
Nalin Munshi (University of British Columbia),
Simon D Moser (IBM),
Marius Pirvu (IBM),
Vijay Sundaresan (IBM),
Daryl Maier (IBM),
Thatta Premnath (IBM),
Norman Böwing (IBM),
Sathish Gopalakrishnan (University of British Columbia),
Mohammad Shahrad (University of British Columbia)
|
| Matrix-IC: Harnessing Matrix Outer-product for High-Performance Particle-in-Cell Simulations |
Yizhuo Rao (Sun Yat-sen University),
Xingjian Cui (Sun Yat-sen University),
Jiabin Xie (Sun Yat-sen University),
Shangzhi Pang (Sun Yat-sen University),
Guangnan Feng (Sun Yat-sen University),
Jinhui Wei (Sun Yat-sen University),
Zhiguang Chen (Sun Yat-sen University),
Yutong Lu (Sun Yat-sen University)
|
| FlashPS: Efficient Generative Image Editing with Mask-aware Caching and Scheduling |
Xiaoxiao Jiang (Hong Kong University of Science and Technology),
Suyi Li (Hong Kong University of Science and Technology),
Lingyun Yang (Hong Kong University of Science and Technology),
Tianyu Feng (Hong Kong University of Science and Technology),
Zhipeng Di (Alibaba Group),
Weiyi Lu (Alibaba Group),
Guoxuan Zhu (Alibaba Group),
Xiu Lin (Alibaba Group),
Kan Liu (Alibaba Group),
Yinghao Yu (Alibaba Group),
Tao Lan (Alibaba Group),
Guodong Yang (Alibaba Group),
Lin Qu (Alibaba Group),
Liping Zhang (Alibaba Group),
Wei Wang (Hong Kong University of Science and Technology)
|
| Neuro-C: Neural Inference Shaped by Hardware Limits |
Diletta Romano (Uppsala University, Sweden; RISE, Sweden),
Luca Mottola (Politecnico di Milano, Italy; RISE, Sweden; Uppsala University, Sweden),
Thiemo Voigt (Uppsala University, Sweden; RISE, Sweden)
|
| Practical and Scalable RDMA Connection Sharing for HPC Workload |
Yuejie Wang (Peking University),
Tuo Fang (Huawei Technologies Co., Ltd.),
Biyu Peng (Huawei Technologies Co., Ltd.),
Yang Cheng (Huawei Technologies Co., Ltd.),
Xin Sun (Huawei Technologies Co., Ltd.),
Chengchao Xu (Huawei Technologies Co., Ltd.),
Yuchen Tang (Huawei Technologies Co., Ltd.),
Yuxin Ren (Huawei Technologies Co., Ltd.),
Ning Jia (Huawei Technologies Co., Ltd.),
Xinwei Hu (Huawei Technologies Co., Ltd.),
Yunfei Du (Huawei Technologies Co., Ltd.),
Guyue (Grace) Liu (Peking University)
|
| Scaling LLM Test-Time Compute with Mobile NPU on Smartphones |
Zixu Hao (Tsinghua University), Jianyu Wei (University of Science and Technology of China),
Tuowei Wang (Tsinghua University), Minxing Huang (Tsinghua University),
Huiqiang Jiang (Microsoft Research), Shiqi Jiang (Microsoft Research),
Ting Cao (Institute for AI Industry Research (AIR), Tsinghua University),
Ju Ren (Tsinghua University)
|
| Garen: Reliable Cluster Management with Atomic State Reconciliation |
Mingi Kim (FriendliAI), Ahnjae Shin (FriendliAI), Jaewoo Maeng (Samsung Advanced Institute of Technology),
Myeongjae Jeon (POSTECH), Byung-Gon Chun (FriendliAI, Seoul National University)
|
| ASIC-based Compression Accelerators for Storage Systems: Design, Placement, and Profiling Insights |
Tao Lu (DapuStor Corporation), Jiapin Wang (DapuStor Corporation), Yelin Shan (DapuStor Corporation),
Xiangping Zhang (DapuStor Corporation), Xiang Chen (DapuStor Corporation)
|
| ECCB: Boosting Block Propagation of Blockchain with Erasure-Coded Compact Block |
Bingyi Cai (Huazhong University of Science and Technology), Shenggang Wan (Huazhong University of Science and Technology),
Hong Jiang (Department of Computer Science and Engineering, The University of Texas at Arlington)
|
| Not A DPU in Name Only! Unleashing RDMA-capable DPUs in Multi-Tenant Serverless Clouds with NADINO |
Shixiong Qi (University of Kentucky), Songyu Zhang (University of California, Riverside),
K. K. Ramakrishnan (University of California, Riverside), Diman Zad Tootaghaj (Hewlett Packard Enterprise Labs),
Hardik Soni (Hewlett Packard Labs), Puneet Sharma (Hewlett Packard Labs)
|
| Fix: externalizing network I/O in serverless computing |
Yuhan Deng (Stanford University), Akshay Srivatsan (Stanford University), Sebastian Ingino (Stanford University),
Francis Chua (Stanford University), Yasmine Mitchell (Stanford University), Matthew Vilaysack (Stanford University),
Keith Winstein (Stanford University)
|
| ColdCode: Cold Data Encoding for Enhanced Reliability and Lifetime in 3D NAND Flash |
Qiao Li (Mohamed bin Zayed University of Artificial Intelligence), Shangyu Wu (Mohamed bin Zayed University of Artificial Intelligence),
Zheng Wan (Xiamen University), Yufei Cui (McGill University), Jie Zhang (Peking University),
Chun Jason Xue (Mohamed bin Zayed University of Artificial Intelligence)
|
| Efficient Multimodal Serving via Module Multiplexing |
Zicong Hong (Hong Kong University of Science and Technology), Yuyan Chen (Sun Yat-sen University),
Haoyue Zhang (Hong Kong University of Science and Technology), Peng Li (Xi'an Jiaotong University),
Wuhui Chen (Sun Yat-sen University), Song Guo (Hong Kong University of Science and Technology),
Xiaowei Shen (MetaX)
|
| CHARM: Chiplet Heterogeneity-Aware Runtime Mapping System |
Alessandro Fogli (Imperial College London), Bo Zhao (Aalto University), Peter Pietzuch (Imperial College London),
Jana Giceva (TU Munich)
|
| 2DIO: Configurable and Cache-Accurate Trace Generation for Storage Benchmarking |
Yirong Wang (Northeastern University), Isaac Khor (Northeastern University), Peter Desnoyers (Northeastern University)
|
| Automated End-to-End Model Serving with Cooperative Compilation and Scheduling |
Yikang Zhang (Nanjing University), Junlong Chen (Nanjing University), Wei Wang (Nanjing University),
Jia Liu (Nanjing University), Nan Hu (Hunan University), Haipeng Dai (Nanjing University)
|
| PiLLM: Resource-Efficient LLM Inference Using Workload Prediction |
Yunqian Fan (ShanghaiTech University), Shihao Bai (SenseTime), Ruihao Gong (Beihang University),
Zaijun Wang (SenseTime), Rui Fan (ShanghaiTech University)
|
| MegaScale-Data: Scaling DataLoader for Multi-Source Large Foundation Model Training |
Juntao Zhao (The University of Hong Kong), Qi Lu (ByteDance Inc.),
Wei Jia (ByteDance Inc.), Borui Wan (The University of Hong Kong),
Lei Zuo (ByteDance Inc.), Junda Feng (ByteDance Inc.),
Jianyu Jiang (ByteDance Inc.), Yangrui Chen (ByteDance Inc.),
Shuaishuai Cao (ByteDance Inc.), Jialing He (ByteDance Inc.),
Kaihua Jiang (ByteDance Inc.), Yuanzhe Hu (ByteDance Inc.),
Shibiao Nong (ByteDance Inc.), Yanghua Peng (ByteDance),
Haibin Lin (ByteDance Inc.), Chuan Wu (The University of Hong Kong)
|
| CofferOS: Hardening OS-level Virtualization with Rust |
Minkyu Jung (KAIST), Chanshin Kwak (School of Computing, KAIST),
Junho Ahn (KAIST), Sunho Park (KAIST),
Changjun Lee (KAIST), Jongyul Kim (University of Illinois Urbana-Champaign),
Jeehoon Kang (FuriosaAI), Youngjin Kwon (KAIST)
|
| High Throughput and Low Latency LLM Serving via Adaptive KV Caching |
Wenyan Chen (University of Macau; Shenzhen Institutes of Advanced Technology, CAS),
Chengzhi Lu (Nanyang Technological University),
Huanle Xu (University of Macau),
Kejiang Ye (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences),
ChengZhong Xu (University of Macau)
|
| Arena: Efficiently Training Large Models via Dynamic Scheduling and Adaptive Parallelism Co-Design |
Chunyu Xue (Shanghai Jiao Tong University), Weihao Cui (Shanghai Jiao Tong University),
Quan Chen (Shanghai Jiao Tong University), Chen Chen (Shanghai Jiao Tong University),
Han Zhao (Shanghai Jiao Tong University), Shulai Zhang (Shanghai Jiao Tong University),
Linmei Wang (Lenovo Research), Yan Li (Microsoft),
Limin Xiao (Lenovo Research), WeiFeng Zhang (Lenovo Research),
Jing Yang (Guizhou University), Bingsheng He (National University of Singapore),
Minyi Guo (Shanghai Jiao Tong University)
|
| On-device Semantic Selection Made Low Latency and Memory Efficient with Monolithic Forwarding |
Jiahao Zhou (Shanghai Jiao Tong University),
Chengliang Lin (Shanghai Jiao Tong University),
Dingji Li (Huawei),
Mingkai Dong (Shanghai Jiao Tong University),
Haibo Chen (Shanghai Jiao Tong University)
|
| Accelerating Metadata Updates in Distributed File Systems with Efficient Orderings |
Hao Guo (Tsinghua University),
Jiwu Shu (Tsinghua University),
Youyou Lu (Tsinghua University)
|
| Practical and Efficient x86-64 Emulation on RISC-V |
Xiongchuan Tan (Tsinghua Shenzhen International Graduate School, Tsinghua University),
Yang Liu (Institute of Software, Chinese Academy of Sciences),
Sebastien Chevalier,
Yangyu Chen (College of Computer Science, Chongqing University),
Xiaoyi Liu (Department of Computer Science, Tsinghua University),
Haohuan Fu (Tsinghua University)
|
| RaidenSwap: A Multi-Swap Remote System for Multi-core Applications |
Kefan Liu (SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences),
Ke Liu (SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences),
Xu Zhang (SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences),
Hui Yuan (Huawei), Xiaolong Zheng (Huawei), Ning Liu (Huawei),
Sa Wang (SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences),
Guanghui Zhang (Shandong University),
Yungang Bao (SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences),
Mingyu Chen (SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences),
Chenxi Wang (SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences)
|
| From Imperative to Declarative: Towards LLM-friendly OS Interfaces for Boosted Computer-Use Agents |
Yuan Wang (Key Laboratory of System Software (Chinese Academy of Sciences); Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences),
Mingyu Li (Key Laboratory of System Software (Chinese Academy of Sciences); Institute of Software, Chinese Academy of Sciences),
Haibo Chen (Key Laboratory of System Software (Chinese Academy of Sciences); Institute of Software, Chinese Academy of Sciences; Shanghai Jiao Tong University)
|
| Proteus: Heterogeneous FPGA Virtualization |
Felix Gust (Technical University of Munich),
Shu Anzai (University of California, Los Angeles),
Charalampos Mainas (Technical University of Munich),
Atsushi Koshiba (Tokyo University of Science),
Pramod Bhatotia (Technical University of Munich)
|
| Mitigating CDN Cache Misses with Scheduling: An Origin Shield for Billion-QPS Social Platforms |
Zixuan Yang (State Key Laboratory for Novel Software Technology, Nanjing University),
Yimeng Xu (State Key Laboratory for Novel Software Technology, Nanjing University),
Jiaqi Zheng (State Key Laboratory for Novel Software Technology, Nanjing University),
Boxi Liu (School of Computer Science, Central China Normal University),
Guihai Chen (State Key Laboratory for Novel Software Technology, Nanjing University),
Quan Xia (Tencent), He Lin (Tencent), Zhihai Huang (Tencent), Shangce Yuan (Tencent)
|
| Bridging the GPU Utilization Gap: Predictive Multi-Dimensional Resource Scheduling for AI Workloads |
Yilei Lu (Tsinghua University), Dongbiao He (Southeast University),
Teng Ma (Alibaba Group), Zhe Liu (Baihai),
Letian Ruan (Shanghai Jiao Tong University), Jinlei Jiang (Tsinghua University),
Yongwei Wu (Tsinghua University)
|
| Laminar: A Scalable Asynchronous RL Post-Training Framework |
Guangming Sheng (The University of Hong Kong), Yuxuan Tong (ByteDance),
Borui Wan (The University of Hong Kong), Wang Zhang (ByteDance),
Chaobo Jia (The Chinese University of Hong Kong), Xibin Wu (ByteDance),
Yuqi Wu (Shanghai Jiao Tong University), Xiang Li (ByteDance),
Chi Zhang (ByteDance), Yanghua Peng (ByteDance),
Haibin Lin (ByteDance Inc.), Xin Liu (ByteDance Inc.),
Chuan Wu (The University of Hong Kong)
|
| PARD: Enhancing Goodput for Inference Pipeline via Proactive Request Dropping |
Zhixin Zhao (Tianjin University), Yitao Hu (Tianjin University),
Simin Chen (University of Texas at Dallas), Mingfang Ji (Tianjin University),
Wei Yang (University of Texas at Dallas), Yuhao Zhang (Tianjin University),
Laiping Zhao (Tianjin University), Wenxin Li (Tianjin University),
Xiulong Liu (Tianjin University), Wenyu Qu (Tianjin University),
Hao Wang (Stevens Institute of Technology)
|
| Fuzzing Enterprise-Grade Blockchain Systems: Industrial Practice and Solutions |
Fuchen Ma (Tsinghua University), Yuanliang Chen (Tsinghua University),
Zhen Yan (Tsinghua University), Yuanhang Zhou (Tsinghua University),
Yu Jiang (Tsinghua University), Mingchao Wan (Beijing Academy of Blockchain and Edge Computing)
|
| Digital Hole: Bypassing Commercial Audio DRM Solutions with DReaMcatcher |
Björn Ruytenberg (Vrije Universiteit Amsterdam),
Mohammad Sina Karvandi (Vrije Universiteit Amsterdam),
Herbert Bos (Vrije Universiteit Amsterdam),
Erik van der Kouwe (Vrije Universiteit Amsterdam),
Asia Slowinska (Vrije Universiteit Amsterdam)
|
| A Case for Elastic Quantum Error Correction Decoders |
Satvik Maurya (University of Wisconsin-Madison),
Abtin Molavi (University of Wisconsin-Madison),
Aws Albarghouthi (University of Wisconsin-Madison),
Swamit Tannu (University of Wisconsin-Madison)
|
| EMVOD: Elastic Multi-Path QUIC Scheduling for CDN Video-on-Demand Service |
ZiQi Wei (Tsinghua Shenzhen International Graduate School),
Qing Li (Peng Cheng Laboratory),
TianYun Zhao (Tsinghua Shenzhen International Graduate School),
Cheng Luo (Tencent), ChangKui OuYang (Tencent), XiaoFei Yu (Tencent),
DaYi Zhao (Peng Cheng Laboratory),
Yong Jiang (Tsinghua Shenzhen International Graduate School)
|
| TrustWeave: Integrity Measurement and Attestation For Multi-Cloud LLMs |
Jianchang Su (University of Connecticut),
Wenhui Zhang (Independent Researcher),
Yifan Zhang (University of Connecticut),
Kexin Chu (University of Connecticut),
Hao Guo (Tsinghua University),
Youyou Lu (Tsinghua University),
Wei Zhang (University of Connecticut)
|
| MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production |
Chunyu Xue (Shanghai Jiao Tong University),
Yangrui Chen (ByteDance Seed), Jianyu Jiang (ByteDance Seed),
Ningxin Zheng (ByteDance Seed), Junda Feng (ByteDance Seed),
Jingji Chen (ByteDance Seed), Shixiong Zhao (ByteDance Seed),
Shen Yan (ByteDance Seed), Yi Lin (ByteDance Seed),
Lei Shi (ByteDance Seed), Zanbo Wang (ByteDance Seed),
Lishu Luo (ByteDance Seed), Faming Wu (ByteDance Seed),
Haibin Lin (ByteDance Seed), Yanghua Peng (ByteDance Seed),
Xin Liu (ByteDance Seed), Quan Chen (Shanghai Jiao Tong University)
|
| Enabling Packet Spraying over Commodity RNICs with In-Network Support |
Xiangzhou Liu (Hong Kong University of Science and Technology),
Wenxue Li (Hong Kong University of Science and Technology),
Zihao Wang (Hong Kong University of Science and Technology),
Kai Chen (Hong Kong University of Science and Technology)
|
| Wayfinder: Automated Operating System Specialization |
Alexander Jung (Lancaster University),
Cezar Crăciunoiu (Politehnica University of Bucharest),
Nikolaos Karaolidis (The University of Manchester),
Hugo Lefeuvre (The University of British Columbia),
Daniel Oñoro Rubio (NEC Laboratories Europe),
Felipe Huici (Unikraft GmbH),
Charalampos Rotsos (Lancaster University),
Pierre Olivier (The University of Manchester)
|
| HetAuto: Cross-Cluster Auto-Parallelism for Heterogeneous Distributed Training |
Guicheng Qi (The University of Hong Kong), Junwei Su (The University of Hong Kong),
Liqi Yang (Meituan Corporation), Tao Li (Meituan Corporation),
Tingwen Xie (Meituan Corporation), Yerui Sun (Meituan Corporation),
Yuchen Xie (Meituan Corporation), Chuan Wu (The University of Hong Kong)
|
| Low-Compilation-Cost Register Allocation in LLVM-Based Binary Translation |
Xiangwei Meng (Lanzhou University), Chen Gao (Lanzhou University),
Wei Li (Tsinghua University),
Fengyuan Ren (Lanzhou University / Tsinghua University)
|
| ECOTE: Priority-Aware Optical Restoration for WAN Traffic Engineering |
Yiren Zhao (University of Toronto), Kunling He (Tsinghua University),
Zhiquan Wang (Tsinghua University), Ran Shu (Microsoft Research Asia),
Jilong Wang (Tsinghua University), Congcong Miao (Tencent)
|
| LCMP: Distributed Long-Haul Cost-Aware Multi-Path Routing for Inter-Datacenter RDMA Networks |
Dong-Yang Yu (Beijing University of Posts and Telecommunications),
Yuchao Zhang (Beijing University of Posts and Telecommunications),
Xiaodi Wang (Beijing University of Posts and Telecommunications),
Jun Wang (Beijing University of Posts and Telecommunications),
Wenfei Wu (Peking University),
Haipeng Yao (Beijing University of Posts and Telecommunications),
Wendong Wang (Beijing University of Posts and Telecommunications),
Ke Xu (Tsinghua University)
|
| SmartNS: Enabling Line-rate and Flexible Network Stack with SmartNIC |
Xuzheng Chen (Zhejiang University), Jie Zhang (Zhejiang University),
Baolin Zhu (Zhejiang University), Xueying Zhu (Zhejiang University),
Zhongqing Chen (Alibaba Cloud), Ting Fu (Alibaba Group),
Shu Ma (Alibaba Group), Lingjun Zhu (Alibaba Group),
Chao Shi (Alibaba Group), Yin Zhang (Zhejiang University),
Yuanchao Shu (Zhejiang University), Peng Cheng (Zhejiang University),
Zeke Wang (Zhejiang University)
|
| Reducing the GPU Memory Bottleneck with Lossless Compression for ML |
Aditya Kamath (University of Washington),
Arvind Krishnamurthy (University of Washington/Google),
Marco Canini (KAUST),
Simon Peter (University of Washington)
|
| Once Rolling Hashing is Enough: Exploiting Rolling Hash Reuse in Delta Compression |
Haoliang Tan (Harbin Institute of Technology, Shenzhen),
Wenhao Ou (Harbin Institute of Technology, Shenzhen),
Xiangyu Zou (Harbin Institute of Technology, Shenzhen),
Cai Deng (Harbin Institute of Technology, Shenzhen),
Yanqi Pan (Harbin Institute of Technology, Shenzhen),
Hao Huang (Harbin Institute of Technology, Shenzhen),
Zhaoquan Gu (Harbin Institute of Technology, Shenzhen),
Wen Xia (Harbin Institute of Technology, Shenzhen)
|
| Accurate and Ultra-Fast Launch-Time Validation of Idempotency for GPU Kernels |
Mingcong Han (Shanghai Jiao Tong University),
Weihang Shen (Shanghai Jiao Tong University),
Rong Chen (Shanghai Jiao Tong University),
Haibo Chen (Shanghai Jiao Tong University)
|
| Scalable RDMA-accelerated Distributed Locks with Shared Stream Abstraction |
Miao Cai (Nanjing University of Aeronautics and Astronautics),
Junru Shen (Hohai University),
Xiaojian Liao (Beihang University),
Rong Gu (State Key Laboratory for Novel Software Technology, Nanjing University),
Yanchao Zhao (Nanjing University of Aeronautics and Astronautics),
Hao Han (Nanjing University of Aeronautics and Astronautics),
Bing Chen (Nanjing University of Aeronautics and Astronautics),
Baoliu Ye (State Key Laboratory for Novel Software Technology, Nanjing University)
|
| TCO-driven Storage Provisioning for Exascale Data Centers |
Timothy Kim (Carnegie Mellon University),
Saurabh Kadekodi (Google),
Arif Merchant (Google),
Prashant Nema (Microsoft),
Jai Menon (Microsoft),
Rashmi Vinayak (Carnegie Mellon University),
Gregory R. Ganger (Carnegie Mellon University)
|
| No More Translation at Runtime: LLM-Empowered Static Binary Translation |
Zhibo Liu (The Hong Kong University of Science and Technology),
Huaijin Wang (The Hong Kong University of Science and Technology),
Wai Kin Wong (The Hong Kong University of Science and Technology),
Daoyuan Wu (Lingnan University),
Shuai Wang (The Hong Kong University of Science and Technology)
|
| FicusDB: Scalable Multi-Versioned Authenticated Storage |
Hongbo Zhang (Cornell University),
Maofan "Ted" Yin (UC Santa Barbara),
Robbert van Renesse (Cornell University)
|
| AEP: Achieving Hierarchical Fault Tolerance in DSM Through Atomic Execution Protection |
Zixuan Wang (Huazhong University of Science and Technology),
Qi Wu (Huazhong University of Science and Technology),
Hang Huang (Huazhong University of Science and Technology),
Jia Rao (The University of Texas at Arlington),
Hui Lu (The University of Texas at Arlington),
Hao Fan (Huazhong University of Science and Technology),
Zhuo Huang (Huazhong University of Science and Technology),
Song Wu (Huazhong University of Science and Technology),
Hai Jin (Huazhong University of Science and Technology)
|
| HARP: Orchestrating Automated Parallel Training on Heterogeneous GPU Clusters |
Antian Liang (Fudan University), Zhigang Zhao (Fudan University),
Kai Zhang (Fudan University), Xuri Shi (Fudan University),
Chuantao Li (Shandong Computer Science Center (National Supercomputer Center in Jinan)),
Chunxiao Wang (Shandong Computer Science Center (National Supercomputer Center in Jinan)),
Zhenying He (Fudan University), Yinan Jing (Fudan University),
X. Sean Wang (Fudan University)
|
| swKokkos: An Athread Backend for Enhanced Kokkos with the Sunway Heterogeneous Architecture |
Junlin Wei (Computer Network Information Center, Chinese Academy of Sciences; Pengcheng Laboratory; University of Chinese Academy of Sciences),
Jinrong Jiang (Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences),
Wu Wang (Computer Network Information Center, Chinese Academy of Sciences),
Chen Li (Computer Network Information Center, Chinese Academy of Sciences),
Yehong Zhang (Pengcheng Laboratory), Yue Yu (Pengcheng Laboratory),
Lian Zhao (Computer Network Information Center, Chinese Academy of Sciences),
Xiang Han (Computer Network Information Center, Chinese Academy of Sciences),
Zhenjia Li (Computer Network Information Center, Chinese Academy of Sciences),
Feng Zhang (Computer Network Information Center, Chinese Academy of Sciences),
Haoyuan Zhang (Computer Network Information Center, Chinese Academy of Sciences),
Yidi Bai (Computer Network Information Center, Chinese Academy of Sciences),
Maoxue Yu (Laoshan Laboratory),
Kaixu (Laoshan Laboratory),
Hailong Liu (Laoshan Laboratory),
Xuebin Chi (Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences)
|
| SwiftFL: Enabling Speculative Training for On-Device Federated Deep Learning |
Yuhui Zhang (State Key Laboratory of Information Security, Institute of Information Engineering, CAS and University of Chinese Academy of Sciences),
Guang Yan,
Xin Zhang (Peking University),
Zimu Guo (Institute of Information Engineering, CAS),
Lutan Zhao (Institute of Information Engineering, CAS),
Jiangfeng Cao (State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS),
Dan Meng (Institute of Information Engineering, CAS),
Rui Hou (Institute of Information Engineering, CAS)
|
| Crimson: Collaborative Parameter Updates for Efficient Pipeline Training of Large Language Models |
Yapeng Jiang (Sun Yat-sen University), Wuhui Chen (Sun Yat-sen University),
Ganhong Huang (Sun Yat-sen University), Yuzhou Huang (Sun Yat-sen University),
Zicong Hong (Hong Kong University of Science and Technology),
Song Guo (Hong Kong University of Science and Technology),
Yue Yu (Pengcheng Laboratory)
|
| Five Minutes of DDoS Brings down Tor: DDoS Attacks on the Tor Directory Protocol and Mitigations |
Zhongtang Luo (Purdue University),
Jianting Zhang (Purdue University),
Akshat Neerati (Purdue University),
Aniket Kate (Supra Research / Purdue University)
|
| A Logically Disaggregated Cache for Replicated Storage Systems |
Kiran Hombal (University of Illinois Urbana-Champaign),
Henry Zhu (University of Illinois Urbana-Champaign),
Shreesha G. Bhat (University of Illinois Urbana-Champaign),
Neil Kaushikkar (Jump Trading Group),
Ramnatthan Alagappan (University of Illinois Urbana-Champaign),
Aishwarya Ganesan (University of Illinois Urbana-Champaign)
|
| Everything You Need to Know About Virtual Machine Live Migration Between Heterogeneous Processors |
Kenta Ishiguro (Université Grenoble Alpes),
Fonyuy-Asheri Caleb (INRIA),
Elouan Barraud (Université Grenoble Alpes),
Renaud Lachaize (Université Grenoble Alpes),
Yérom-David Bromberg (University of Rennes, INRIA),
Alain Tchana (Université Grenoble Alpes)
|
| Million-Scale Text-to-Video Retrieval with Hyperdimensional Computing |
Hyunsei Lee (DGIST), Jaewoo Gwak (DGIST),
Shinhyoung Jang (DGIST), Junyoung Lee (DGIST),
Yeseong Kim (DGIST)
|
| PointShuffler: Accelerating Point Cloud Neural Networks on General-Purpose GPUs |
Yangfan Li (School of Computer Science and Engineering, Central South University),
Zhengjie Jin (School of Computer Science and Engineering, Central South University),
Yue Tian (School of Computer Science and Engineering, Central South University),
Mengquan Li (College of Information Science and Engineering, Hunan University),
Fengxiao Tang (School of Computer Science and Engineering, Central South University),
Ming Zhao (School of Computer Science and Engineering, Central South University),
Cen Chen (School of Future Technology, South China University of Technology)
|
| TAO: Tolerance-Aware Optimistic Verification for Floating-Point Neural Networks |
Jianzhu Yao (Princeton University),
Hongxu Su (The Hong Kong University of Science and Technology (Guangzhou)),
Taobo Liao (University of Illinois Urbana-Champaign),
Zerui Cheng (Princeton University),
Huan Zhang (University of Illinois Urbana-Champaign),
Xuechao Wang (The Hong Kong University of Science and Technology (Guangzhou)),
Pramod Viswanath (Princeton University)
|
| Efficient ML Model Updates for Deeply Embedded Microcontrollers |
Shishir G. Patil (University of California, Berkeley),
Sam Kumar (University of California, Los Angeles),
Prabal Dutta (University of California, Berkeley),
Joseph Gonzalez (University of California, Berkeley)
|
| Proactive Change Risk Detection in Production Cloud Systems: ByteDance's Experience |
Jinyang Liu (ByteDance), Yichen Li (ByteDance), Tieying Zhang (ByteDance),
Binbin Chen (ByteDance), Xiao He (ByteDance), Zhihan Jiang (ByteDance),
Haipeng Zhang (ByteDance), Gang Wu (ByteDance), Yi Li (ByteDance)
|
| Avicenna: Masking Slowdowns in Replicated State Machines with Counterfactual Evaluation |
Christopher Hodsdon (Databricks),
Zijian Qin (Princeton University),
Khiem Ngo (Datadog),
Siddhartha Sen (Microsoft Research),
Ethan Katz-Bassett (Columbia University),
Wyatt Lloyd (Princeton University)
|
| Rook: Yield Not Thy Core |
Achilles Benetopoulos (University of California, Santa Cruz),
Peter Alvaro (U. C. Santa Cruz),
Andi Quinn (UC Santa Cruz),
Robert Soule (Yale University)
|
| Prediction-Informed Power Management for General-Purpose Compute Servers |
Jonggyu Park (University of Washington),
Simon Peter (University of Washington),
Thomas Anderson (University of Washington)
|
| NutCracker: A Compilation Framework for Hybrid DPU Architectures |
Yihan Yang (National University of Singapore),
Haifeng Sun (National University of Singapore),
Antoine Kaufmann (Max Planck Institute for Software Systems (MPI-SWS)),
Jialin Li (National University of Singapore)
|
| viNPU: Optimizing Vision Transformer Inference on Mobile NPUs |
Jeho Lee (Yonsei University), Gunjoong Kim (Yonsei University),
Chanyoung Jung (Yonsei University), Jaehee Kim (Yonsei University),
Seonghoon Park (Yonsei University), Hojung Cha (Yonsei University)
|
| RoPeerTo: A Datacenter-Scale Architecture for Peer-To-Peer DMA between GPUs and FPGAs |
Marco Venere (Politecnico di Milano),
Giuseppe Sorrentino (Politecnico di Milano),
Benjamin Ramhorst (ETH Zurich),
Maximilian Jakob Heer (ETH Zurich),
Lucian Petrica (AMD Research and Advanced Development),
Dario Korolija (AMD Research and Advanced Development),
Marco D. Santambrogio (Politecnico di Milano),
Davide Conficconi (Politecnico di Milano),
Gustavo Alonso (ETH Zurich),
Ken O'Brien (AMD Research and Advanced Development)
|
| Accelerating Transactional Execution via Processing-In-Memory |
André Lopes (INESC-ID, IST, Universidade de Lisboa),
Daniel Castro (INESC-ID, IST, Universidade de Lisboa),
Paolo Romano (INESC-ID, IST, Universidade de Lisboa)
|
| AIMS: A Cost-Efficient Framework for LLM-based Agent Deployment in Cloud-Edge Hybrid Environments |
Shiyi Liu (University of Virginia),
Haiying Shen (University of Virginia),
Shuai Che (Microsoft),
Mahdi Ghandi (Microsoft),
Mingqin Li (Microsoft)
|
| Suika: Efficient and High-quality Re-scheduling of 3D-parallelized LLM Training Jobs in Shared Clusters |
Yuxuan Wang (Shanghai Jiao Tong University & Institute of Artificial Intelligence (TeleAI), China Telecom),
Yanbo Wang (Shanghai Jiao Tong University & Institute of Artificial Intelligence (TeleAI), China Telecom),
Chen Chen (Shanghai Jiao Tong University),
Chunyu Xue (Shanghai Jiao Tong University),
Qizhen Weng (Institute of Artificial Intelligence (TeleAI), China Telecom),
Yin Chen (Institute of Artificial Intelligence (TeleAI), China Telecom),
Zeren Li (Huawei Technologies Co., Ltd.),
Xuqi Zhu (Huawei Technologies Co., Ltd.),
Yongqiang Yang (Huawei Technologies Co., Ltd.),
Quan Chen (Shanghai Jiao Tong University),
Minyi Guo (Shanghai Jiao Tong University)
|
| LifeFuzz: Lifecycle-Guided Fuzzing for Windows Driver Cross-Handler Vulnerabilities |
Chendong Yu (Institute of Information Engineering, Chinese Academy of Sciences and School of Cyber Security, University of Chinese Academy of Sciences),
Yuekang Li (University of New South Wales),
Yang Xiao (Chinese Academy of Sciences),
Jie Lu (Institute of Computing Technology, Chinese Academy of Sciences),
Yeting Li (Institute of Information Engineering, Chinese Academy of Sciences; University of Chinese Academy of Sciences),
Defang Bo (Institute of Information Engineering, Chinese Academy of Sciences),
Wei Huo (Key Laboratory of Network Assessment Technology, Institute of Information Engineering, Chinese Academy of Sciences, China; School of CyberSpace Security at University of Chinese Academy of Sciences, China)
|
| Rose: Reproducing External-Fault-Induced Failures in Distributed Systems with Lightweight Instrumentation |
Sebastião Amaro (INESC-ID),
Miguel Matos (IST Lisbon & INESC-ID),
Pedro Fonseca (Purdue University)
|
| Lessons Learned from Incorporating Formal Methods in Huawei Cloud Reliability |
Claudia Cauli (Huawei Technologies Co., Ltd, Ireland),
Timo Lang (Huawei Technologies Co., Ltd, Ireland),
Shuo Chen (Huawei Technologies Co., Ltd., Shenzhen, Guangdong, China),
Sebti Mouelhi (Huawei Technologies Co., Ltd, Ireland),
Xin Jin (Huawei Technologies Co., Ltd., Shenzhen, Guangdong, China),
Subhajit Bandopadhyay (Huawei Technologies Co., Ltd, Ireland),
Xusheng Chen (Huawei Technologies Co., Ltd., Shenzhen, Guangdong, China),
Yazhi Feng (Huawei Technologies Co., Ltd., Shenzhen, Guangdong, China),
Haoze Song (Huawei Technologies Co., Ltd., Shenzhen, Guangdong, China),
Linhua Tang (Huawei Technologies Co., Ltd, Ireland),
Zhenli Sheng (Huawei Technologies Co., Ltd., Shenzhen, Guangdong, China),
Ananth Shrinivas Srinath (Huawei Technologies Co., Ltd, Ireland)
|