blog
August 25, 2025
Wilmington

At ADI, AI means developing world-class models that overcome latency, power, and cost constraints to deliver insights. It’s almost the end of August, and while we’re all trying to make the most of the final days of summer vacation, you can be sure we’re helping you make the most of your reading time, too. Here are the papers members of our team are recommending this August on all things physical intelligence:
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention 🔗 https://bit.ly/bc-aug-1 👍🏼 Recommended by: Debanjan Ghosh
Why it’s worth reading: In this paper from DeepSeek AI, the authors introduce NSA (Native Sparse Attention), a novel sparse-attention mechanism that's both natively trainable and hardware-aligned. This architecture enables efficient long-context modeling without compromising performance. It uses a dynamic hierarchical strategy—merging coarse-grained token compression with fine-grained selection—to preserve global context and local precision, and backs it with optimized kernels for modern accelerators. NSA matches or exceeds full-attention models across benchmarks and reasoning tasks, while delivering substantial speedups on 64K-token sequences during training and inference.
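To make the coarse-to-fine idea concrete, here is a minimal sketch in plain NumPy of block-sparse attention: compress the keys into block summaries, rank the blocks against the query, then attend only over the tokens in the selected blocks. The block size, top-k, and mean-pooling compression here are illustrative assumptions, not the paper's learned scoring rule or custom kernels.

```python
# Toy sketch of coarse-to-fine sparse attention (illustrative, not NSA's kernels).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, k, v, block=16, top_k=4):
    """q: (d,), k/v: (T, d). Attention restricted to the top-k scoring blocks."""
    T, d = k.shape
    n_blocks = T // block
    kb = k[: n_blocks * block].reshape(n_blocks, block, d)
    vb = v[: n_blocks * block].reshape(n_blocks, block, d)

    # Coarse stage: score mean-pooled block summaries against the query.
    summaries = kb.mean(axis=1)                  # (n_blocks, d)
    block_scores = summaries @ q / np.sqrt(d)
    keep = np.argsort(block_scores)[-top_k:]     # indices of selected blocks

    # Fine stage: full attention over tokens in the selected blocks only.
    k_sel = kb[keep].reshape(-1, d)
    v_sel = vb[keep].reshape(-1, d)
    w = softmax(k_sel @ q / np.sqrt(d))
    return w @ v_sel

rng = np.random.default_rng(0)
q = rng.standard_normal(64)
k = rng.standard_normal((256, 64))
v = rng.standard_normal((256, 64))
print(sparse_attention(q, k, v).shape)  # (64,)
```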
How Attention Sinks Keep Language Models Stable 🔗 https://bit.ly/bc-aug-2 👍🏼 Recommended by: Yakov Shkolnikov
Why it’s worth reading: This paper from MIT’s HAN Lab tackles the inefficiency of handling long, continuous inputs in large language models by introducing an attention-sink mechanism that stabilizes attention without recomputing or discarding past states. Unlike traditional KV-cache methods that scale poorly with streaming inputs, the authors’ solution—StreamingLLM—keeps the cache bounded and inference efficient, enabling real-time, low-latency performance. The approach achieves up to 22× speedups and supports inputs exceeding 4 million tokens, making it a compelling direction for applications in speech, dialogue, and other streaming AI systems.
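The cache policy behind attention sinks is easy to picture. Here is a toy sketch assuming a fixed set of "sink" entries plus a rolling window of recent key/value pairs; the sizes are illustrative and this is not the HAN Lab implementation.

```python
# Toy KV-cache policy: keep the first few "sink" tokens forever plus a rolling
# window of recent tokens, so cache size stays bounded as the stream grows.
from collections import deque

class SinkKVCache:
    def __init__(self, n_sink=4, window=512):
        self.n_sink = n_sink
        self.sinks = []                      # first n_sink (key, value) pairs, never evicted
        self.recent = deque(maxlen=window)   # rolling window of later pairs

    def append(self, key, value):
        if len(self.sinks) < self.n_sink:
            self.sinks.append((key, value))
        else:
            self.recent.append((key, value))  # oldest non-sink entry is evicted automatically

    def items(self):
        # What attention actually sees: sinks + recent tokens, never the full history.
        return self.sinks + list(self.recent)

cache = SinkKVCache(n_sink=4, window=8)
for t in range(100):
    cache.append(f"k{t}", f"v{t}")
print(len(cache.items()))                          # 12: 4 sinks + 8 most recent
print(cache.items()[0][0], cache.items()[-1][0])   # k0 k99
```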
Mamba: Linear-Time Sequence Modeling with Selective State Spaces 🔗 https://bit.ly/bc-aug-3 👍🏼 Recommended by: Solomon Garber
Why it’s worth reading: This paper from CMU and Princeton researchers introduces Mamba, a novel selective state-space model (SSM) architecture that replaces the compute-heavy Transformer attention with an input-adaptive, attention-free backbone. By parameterizing the SSM based on the current tokens and employing a hardware-aware, parallel recurrent algorithm, Mamba achieves true linear scaling and 5× faster inference than Transformers. Impressively, the Mamba-3B variant outperforms same-size Transformers and rivals models twice its size across language, audio, and genomics benchmarks—even at million-token sequence lengths.
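For intuition, here is a deliberately simplified, sequential version of a selective state-space update, in which the step size and the input/output maps depend on the current token. The real Mamba uses learned projections and a hardware-aware parallel scan; the random projections below are placeholders for illustration.

```python
# Simplified selective SSM: input-dependent parameters, linear-time recurrent scan.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_state, T = 8, 16, 100

A = -np.abs(rng.standard_normal((d_model, d_state)))   # stable (negative) state dynamics
W_delta = rng.standard_normal(d_model) * 0.1            # placeholder projections
W_B = rng.standard_normal((d_model, d_state)) * 0.1
W_C = rng.standard_normal((d_model, d_state)) * 0.1

def selective_ssm(x):
    """x: (T, d_model). Returns y: (T, d_model) via a linear-time scan."""
    h = np.zeros((d_model, d_state))
    ys = []
    for x_t in x:
        # Selection: step size and input/output maps are functions of the current token.
        delta = np.log1p(np.exp(x_t * W_delta))[:, None]          # softplus, (d_model, 1)
        B_t = x_t[:, None] * W_B
        C_t = x_t[:, None] * W_C
        h = np.exp(delta * A) * h + delta * B_t * x_t[:, None]    # discretized state update
        ys.append((h * C_t).sum(axis=1))                          # readout
    return np.stack(ys)

y = selective_ssm(rng.standard_normal((T, d_model)))
print(y.shape)  # (100, 8)
```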
aTENNuate: Optimized Real-time Speech Enhancement with Deep SSMs on Raw Audio 🔗 https://bit.ly/bc-aug-4 👍🏼 Recommended by: Shanza Iftikar
Why it’s worth reading: aTENNuate offers a compelling advance in real-time speech enhancement by introducing a streamlined deep state-space autoencoder that processes raw audio waveforms end-to-end—no spectrograms, no complex preprocessing. It achieves state-of-the-art results on both VoiceBank + DEMAND and Microsoft DNS1, surpassing prior real-time denoising models in PESQ score, parameter efficiency, MACs, and low latency, while retaining fidelity even under extreme compression (4 kHz, 4 bits)—making it especially promising for low-resource, edge-device applications.
Unsupervised Domain Adaptation of Deep Networks for ToF Depth Refinement 🔗 https://bit.ly/bc-aug-6 👍🏼 Recommended by: Tarun Krishna
Why it’s worth reading: This University of Padua paper tackles a critical real-world problem: enhancing Time-of-Flight (ToF) depth maps, which suffer from noise and multipath interference (MPI), without relying on scarce labeled real-world data. The authors introduce three innovative unsupervised domain adaptation strategies—operating at the input, feature, and output levels—that elegantly bridge the gap between synthetic training data and unlabeled real data using adversarial learning and domain translation techniques. The results not only outperform state-of-the-art methods but also offer a robust, label-efficient path to real-world ToF depth refinement.
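As a rough illustration of the output-level adversarial idea (a sketch under assumptions, not the authors' networks or losses): a small refinement network is supervised on synthetic depth, while a discriminator pushes its outputs on unlabeled real depth toward the distribution of its synthetic-domain outputs. All shapes, architectures, and loss weights below are placeholders.

```python
# Toy output-space adversarial adaptation for depth refinement (illustrative only).
import torch
import torch.nn as nn

# Refinement net: noisy ToF depth map in, refined depth map out.
refiner = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
# Discriminator on the output space: refined-synthetic vs. refined-real.
disc = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(16 * 32 * 32, 1),
)
opt_r = torch.optim.Adam(refiner.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

# Toy data: labeled synthetic depth (with ground truth) and unlabeled real depth.
syn_depth, syn_gt = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)
real_depth = torch.rand(4, 1, 64, 64)

for step in range(2):  # a couple of toy training steps
    # 1) Discriminator: refined synthetic = 1, refined real = 0.
    with torch.no_grad():
        out_syn, out_real = refiner(syn_depth), refiner(real_depth)
    d_loss = bce(disc(out_syn), torch.ones(4, 1)) + bce(disc(out_real), torch.zeros(4, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Refiner: supervised L1 on synthetic plus an adversarial term that makes
    #    its real-domain outputs indistinguishable from its synthetic-domain outputs.
    sup = nn.functional.l1_loss(refiner(syn_depth), syn_gt)
    adv = bce(disc(refiner(real_depth)), torch.ones(4, 1))
    r_loss = sup + 0.01 * adv
    opt_r.zero_grad()
    r_loss.backward()
    opt_r.step()
```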
The Hidden Drivers of HRM's Performance on ARC-AGI 🔗 https://bit.ly/bc-aug-5 👍🏼 Recommended by: Yakov Shkolnikov
Why it’s worth reading: The ARC Prize team’s analysis of the Hierarchical Reasoning Model (HRM) shows that its surprising success on ARC-AGI benchmarks comes not from its hierarchical planner-worker architecture, but from iterative refinement and training dynamics. Their ablations reveal that the hierarchy adds little beyond what a transformer of similar size achieves, while the outer refinement loop, adaptive halting, and task augmentation drive most of the gains. This makes HRM a valuable case study in how training strategies, rather than architectural novelty, can unlock reasoning performance in compact models.
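The refinement-loop pattern the analysis credits is simple to write down. Below is a schematic toy version; refine_step and halt_score are hypothetical stand-ins for learned modules, not HRM's actual components.

```python
# Toy outer refinement loop with adaptive halting (schematic, not the HRM code).
import numpy as np

rng = np.random.default_rng(0)

def refine_step(grid, state):
    """One refinement pass: nudge the current answer using a latent state."""
    state = 0.9 * state + 0.1 * rng.standard_normal(state.shape)
    return np.clip(grid + 0.1 * state[: grid.size].reshape(grid.shape), 0, 9), state

def halt_score(state):
    """Scalar 'confidence' used to decide whether to stop refining."""
    return 1 / (1 + np.exp(-state.mean() * 10))

def solve(grid, max_steps=16, halt_threshold=0.5):
    state = np.zeros(64)
    for step in range(max_steps):
        grid, state = refine_step(grid, state)   # same weights reused every step
        if halt_score(state) > halt_threshold:   # adaptive compute: stop when confident
            break
    return grid, step + 1

answer, steps_used = solve(np.zeros((5, 5)))
print(answer.shape, steps_used)
```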