blog
September 23, 2025
Wilmington

At ADI, AI means developing world-class models that overcome latency, power, and cost constraints to deliver insights. It’s September, and with it we return from vacation to the office, to school, and to the end-of-year sprint. To get you off the blocks in top form, here are the papers members of our team are reading across all things physical intelligence:
Trainable Frontend For Robust and Far-Field Keyword Spotting 🔗 https://bit.ly/bc-sep-1 👍🏼 Recommended by: Oguzhan Buyuksolak
Why it’s worth reading: This Google-authored paper introduces per-channel energy normalization (PCEN), a trainable, causal frontend that replaces the static log compression of mel filterbank energies with an AGC-style normalization followed by root compression, built for robust, far-field keyword spotting on tiny devices. PCEN improves tolerance to loudness variation and background noise without a significant increase in model complexity. Such tolerance is critical for always-on, low-power wake-word systems. Because PCEN’s parameters are learned, you can co-optimize the frontend and acoustic model end-to-end for your target hardware, enabling real-time, streaming inference with small memory/compute budgets and no future context.
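For a feel of what the frontend computes, here is a minimal NumPy sketch of the PCEN transform described in the paper; the function name is ours and the parameter values are typical defaults rather than the learned settings.

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization over filterbank energies.

    E has shape (time, channels) and holds non-negative mel energies.
    The smoother M is a causal first-order IIR filter, so the transform can
    run in streaming fashion; in the paper s, alpha, delta, and r are learned
    jointly with the keyword-spotting model, here they are fixed defaults.
    """
    M = np.empty_like(E)
    M[0] = E[0]
    for t in range(1, E.shape[0]):
        # causal exponential moving average of the energy, per channel
        M[t] = (1.0 - s) * M[t - 1] + s * E[t]
    # AGC-style division by the smoothed energy, then root compression
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r

# toy usage: 100 frames of a 40-band mel spectrogram
mel = np.abs(np.random.randn(100, 40))
features = pcen(mel)
```

Because the smoother depends only on past frames, the same loop maps directly onto a streaming, always-on implementation.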
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models 🔗 https://bit.ly/bc-sep-5 👍🏼 Recommended by: Shanza Iftikar
Why it’s worth reading: SmoothQuant, a paper from MIT and NVIDIA, delivers a training-free, hardware-friendly post-training quantization (PTQ) method that finally makes full 8-bit weight and activation (W8A8) inference practical for LLMs. It does so by “smoothing” activation outliers via an equivalent per-channel scaling that shifts quantization difficulty from activations to weights. The method preserves accuracy across models, including OPT, BLOOM, GLM, MT-NLG, and LLaMA, while using standard INT8 General Matrix Multiplications (GEMMs). In practice, it yields up to 1.56× speedup, 2× memory reduction, and even serves a 530B model on a single 8-GPU node—tangible wins for production cost and scale.
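To make the core trick concrete, below is a minimal NumPy sketch of the per-channel smoothing step; the helper names are ours, and a real deployment would fold the scales into the preceding LayerNorm and run the scaled matmul with INT8 GEMM kernels.

```python
import numpy as np

def smooth_scales(X, W, alpha=0.5):
    """SmoothQuant per-channel smoothing factors.

    X: activations, shape (tokens, in_features); W: weights, shape
    (in_features, out_features). alpha controls how much quantization
    difficulty migrates from activations to weights (0.5 for most models
    in the paper).
    """
    act_max = np.abs(X).max(axis=0)   # per-input-channel activation range
    w_max = np.abs(W).max(axis=1)     # per-input-channel weight range
    return (act_max ** alpha) / (w_max ** (1.0 - alpha) + 1e-8)

def apply_smoothing(X, W, s):
    # (X / s) @ (s[:, None] * W) == X @ W, so the transform is lossless in float;
    # the payoff is that X / s has far milder outliers and quantizes well to INT8.
    return X / s, W * s[:, None]

# sanity check on random data
X = np.random.randn(16, 64)
W = np.random.randn(64, 32)
Xs, Ws = apply_smoothing(X, W, smooth_scales(X, W))
assert np.allclose(X @ W, Xs @ Ws)
```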
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks 🔗 https://bit.ly/bc-sep-2 👍🏼 Recommended by: David Adley
Why it’s worth reading: The paper, authored by researchers from China’s International Digital Economy Academy (IDEA), presents Grounded SAM. It shows how to assemble existing open-world components (Grounding DINO for text-to-box detection, the Segment Anything Model for promptable segmentation, and diffusion-based inpainting) into a single, promptable system that executes text-to-mask and text-to-edit without retraining. The combination delivers practical open-vocabulary segmentation and controllable image editing via modular, reproducible pipelines. For practitioners, the appeal is the engineering: clean interfaces, composability, and zero-shot utility. This flexibility makes Grounded SAM a strong baseline for rapid dataset bootstrapping, visual agent prototypes, and robotics/AR perception, and it shows how to productize foundation models for multi-modal use cases with minimal bespoke training.
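The engineering appeal is easiest to see as a pipeline. The sketch below mirrors the composition the paper describes; the three model wrappers are hypothetical placeholders standing in for Grounding DINO, SAM, and a diffusion inpainter, not real library calls.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Box:
    x0: float
    y0: float
    x1: float
    y1: float
    score: float
    label: str

def detect_boxes(image, text_prompt: str) -> List[Box]:
    """Placeholder for Grounding DINO: open-vocabulary text-to-box detection."""
    raise NotImplementedError

def segment_boxes(image, boxes: List[Box]) -> list:
    """Placeholder for SAM: box-prompted segmentation, one mask per box."""
    raise NotImplementedError

def inpaint(image, mask, edit_prompt: str):
    """Placeholder for a diffusion inpainting model conditioned on a mask."""
    raise NotImplementedError

def text_to_edit(image, find: str, replace_with: str, threshold: float = 0.3):
    # 1) text -> boxes, 2) boxes -> masks, 3) masks + new prompt -> edited image
    boxes = [b for b in detect_boxes(image, find) if b.score >= threshold]
    masks = segment_boxes(image, boxes)
    edited = image
    for mask in masks:
        edited = inpaint(edited, mask, replace_with)
    return edited
```

Each stage sits behind a small interface and can be swapped independently, which is exactly what makes the assembly useful for dataset bootstrapping and perception prototypes.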
A neural operator-based surrogate solver for free-form electromagnetic inverse design 🔗 https://bit.ly/bc-sep-4 👍🏼 Recommended by: Philip Sharos
Why it’s worth reading: Existing inverse-design pipelines choke on compute: every optimization step demands a full-wave simulation with a finite element (FEM), finite-difference frequency-domain (FDFD), or finite-difference time-domain (FDTD) solver, and hundreds of such iterations render realistic 3D, free-form nanophotonic structures impractical. Semi-analytical shortcuts are faster but hold only for narrow geometries and assumptions. And generic ML surrogates trade accuracy for prohibitive amounts of training data, erasing their speed advantage. This paper, by researchers from the Karlsruhe Institute of Technology in Germany and the University of Tartu in Estonia, tackles all three limitations by introducing a data-efficient, modified Fourier Neural Operator surrogate that enables gradient-based inverse design of fully 3D free-form electromagnetic scatterers, bringing rapid, practical design loops to nanophotonics applications.
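If Fourier Neural Operators are new to you, the sketch below shows the generic 1D spectral-convolution core such surrogates are built around; it is our illustrative simplification, not the paper’s modified 3D architecture, and the complex weights stand in for learned parameters.

```python
import numpy as np

def spectral_conv_1d(x, weights, modes):
    """Core Fourier-layer operation of a Fourier Neural Operator (1D case).

    x: (channels, grid) real-valued field sampled on a uniform grid.
    weights: (modes, channels, channels) complex mixing matrices, one per
    retained Fourier mode. Truncating to the lowest `modes` frequencies is
    what keeps the operator cheap and resolution-independent.
    """
    x_hat = np.fft.rfft(x, axis=-1)            # to the frequency domain
    out_hat = np.zeros_like(x_hat)
    for k in range(min(modes, x_hat.shape[-1])):
        out_hat[:, k] = weights[k] @ x_hat[:, k]   # mix channels per mode
    return np.fft.irfft(out_hat, n=x.shape[-1], axis=-1)  # back to the grid

# toy usage: 8 channels on a 64-point grid, keeping 12 Fourier modes
channels, grid, modes = 8, 64, 12
x = np.random.randn(channels, grid)
W = (np.random.randn(modes, channels, channels)
     + 1j * np.random.randn(modes, channels, channels)) / channels
y = spectral_conv_1d(x, W, modes)   # same shape as x
```

A full operator stacks several such layers with pointwise skips and nonlinearities; because the whole surrogate is differentiable, design gradients flow straight through it, which is what enables the fast gradient-based design loop.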
VGGT: Visual Geometry Grounded Transformer 🔗 https://bit.ly/bc-sep-3 👍🏼 Recommended by: Terry Wu
Why it’s worth reading: VGGT unifies core 3D vision tasks in a single feed-forward transformer: from one to hundreds of views, it directly predicts camera parameters, depth/point maps, and dense 3D tracks, reconstructing scenes in under a second and surpassing optimization-heavy baselines. VGGT’s generalist backbone also transfers well: its pretrained features yield gains in non-rigid point tracking and feed-forward novel view synthesis. The work comes from the Visual Geometry Group (University of Oxford) and Meta AI.
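To illustrate the feed-forward, multi-view idea (and only the idea; this is a toy PyTorch sketch, not VGGT’s actual architecture), here is a single forward pass that attends jointly over tokens from all views and reads out per-view cameras and a coarse depth map, with no test-time optimization.

```python
import torch
import torch.nn as nn

class ToyMultiViewNet(nn.Module):
    """Toy stand-in for the feed-forward idea: joint attention over all views,
    then per-view readouts, all in one forward pass."""

    def __init__(self, patch=16, dim=128, layers=4, heads=4):
        super().__init__()
        self.patch = patch
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        block = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=layers)
        self.camera_head = nn.Linear(dim, 7)   # e.g. quaternion + translation per view
        self.depth_head = nn.Linear(dim, 1)    # one coarse depth value per patch

    def forward(self, views):                  # views: (num_views, 3, H, W)
        n, _, H, W = views.shape
        tokens = self.embed(views).flatten(2).transpose(1, 2)     # (n, T, dim)
        T = tokens.shape[1]
        joint = self.encoder(tokens.reshape(1, n * T, -1))        # all views attend to each other
        joint = joint.reshape(n, T, -1)
        cameras = self.camera_head(joint.mean(dim=1))             # (n, 7)
        depth = self.depth_head(joint).reshape(n, 1, H // self.patch, W // self.patch)
        return cameras, depth

# toy usage: three 128x128 views in, cameras and patch-level depth out in one pass
model = ToyMultiViewNet()
cams, depth = model(torch.randn(3, 3, 128, 128))
```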