Octopus: Enhancing CXL Memory Pods via Sparse Topology (arXiv)

Yuhong Zhong; Fiodar Kazhamiaka; Pantea Zardoshti; Shuwei Teng; Rodrigo Fonseca; Mark D. Hill; Daniel S. Berger

January 2025

arXiv

The Compute Express Link (CXL) interconnect enables compute "pods" that pool memory across servers to reduce cost and improve efficiency. These pods also facilitate pairwise communication whose needs conflict with pooling. Importantly, existing pod designs are small or require indirection through expensive switches. These conventional designs implicitly assume that pods must fully connect all servers to all CXL pooling devices. This paper breaks with this conventional wisdom by introducing Octopus pods. Octopus directly connects servers to low-port-count CXL pooling devices (e.g., 4 ports) yet scales to large pods without switches by constructing a sparse CXL topology in which each pooling device connects to a carefully chosen subset of servers. Octopus explicitly balances "overlap", where two servers connect to the same pooling device: overlap reduces pooling efficiency but enables low-latency communication. Octopus resolves this tension by grouping servers into "islands" with low-latency intra-island communication and interconnecting islands to favor pooling. We build a three-server CXL pod prototype and simulate scaled pods with 96 servers under measured device characteristics and physical constraints (1.5 m copper cables). On hardware, Octopus RPCs are 3.2x faster than in-rack RDMA and 2.4x faster than CXL switches. In simulation, Octopus achieves net server cost savings of 3-5.4% whereas CXL switches result in a net cost increase.
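The notions of a sparse topology, "overlap", and "islands" can be made concrete with a small toy model. The sketch below is not the paper's construction algorithm; it is a hypothetical layout assumed purely for illustration. The function names (`build_toy_pod`, `overlapping_pairs`), the device naming scheme, and the rule used to wire islands together are all assumptions. Only the 4-port device limit comes from the abstract.

```python
from itertools import combinations

PORTS_PER_DEVICE = 4  # the abstract's example of a low-port-count pooling device


def build_toy_pod(num_islands, island_size):
    """Return {device name: set of server ids} for a hypothetical sparse pod.

    Assumed layout (not taken from the paper):
    - one intra-island device shared by every server in an island;
    - inter-island devices that each link the k-th server of up to
      PORTS_PER_DEVICE different islands.
    """
    if island_size > PORTS_PER_DEVICE:
        raise ValueError("an island must fit on one pooling device")
    devices = {}
    # Intra-island devices: all servers in an island overlap on one device.
    for i in range(num_islands):
        devices[f"island{i}"] = {i * island_size + k for k in range(island_size)}
    # Inter-island devices: connect servers from different islands.
    for k in range(island_size):
        for start in range(0, num_islands, PORTS_PER_DEVICE):
            group = range(start, min(start + PORTS_PER_DEVICE, num_islands))
            devices[f"pool{k}_{start}"] = {i * island_size + k for i in group}
    return devices


def overlapping_pairs(devices):
    """Return the set of server pairs that share at least one pooling device."""
    pairs = set()
    for servers in devices.values():
        pairs.update(combinations(sorted(servers), 2))
    return pairs


if __name__ == "__main__":
    pod = build_toy_pod(num_islands=4, island_size=4)
    pairs = overlapping_pairs(pod)
    num_servers = len(set().union(*pod.values()))
    print(f"servers={num_servers} devices={len(pod)} overlapping pairs={len(pairs)}")
```

In this toy layout, 16 servers use 8 four-port devices, and 48 of the 120 possible server pairs overlap on some device. A fully connected pod of the same size would instead need every device to expose 16 ports, which is the assumption the abstract says Octopus drops.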