
Optimization

Research papers, repositories, and articles about optimization

Showing 13 of 13 items

Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe

Argues that today’s popular FP4 number format systematically shrinks (underestimates) value magnitudes and destabilizes large-model pretraining. Proposes UFP4, a uniform 4‑bit recipe that stays closer to BF16 training while still saving memory and compute.

Qian Zhao, Kunlong Chen
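To make the format concrete, here is a minimal NumPy sketch of generic FP4 (E2M1) fake quantization with one absmax scale per 16-element block, next to a uniform 4-bit grid for comparison. The block size, the exact (non-power-of-two) scale, and the uniform grid are assumptions for illustration; this is not the paper's UFP4 recipe.

```python
import numpy as np

# Non-negative magnitudes representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_fp4(x: np.ndarray, block: int = 16) -> np.ndarray:
    """Quantize-dequantize x to E2M1 with one absmax scale per block (round to nearest).
    Assumes x.size is a multiple of block. Ties resolve to the smaller magnitude."""
    flat = x.reshape(-1, block)
    # Scale each block so its largest magnitude maps to the top code, 6.0.
    scale = np.abs(flat).max(axis=1, keepdims=True) / E2M1_GRID[-1]
    scale = np.where(scale == 0, 1.0, scale)  # all-zero blocks: avoid division by zero
    mag = np.abs(flat) / scale
    idx = np.abs(mag[..., None] - E2M1_GRID).argmin(axis=-1)  # nearest grid point
    q = np.sign(flat) * E2M1_GRID[idx] * scale
    return q.reshape(x.shape)

def fake_quant_uniform4(x: np.ndarray, block: int = 16) -> np.ndarray:
    """Hypothetical comparison: same blocking, uniform symmetric levels -7..7."""
    flat = x.reshape(-1, block)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)
    q = np.clip(np.round(flat / scale), -7, 7) * scale
    return q.reshape(x.shape)

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
for name, fn in [("fp4_e2m1", fake_quant_fp4), ("uniform4", fake_quant_uniform4)]:
    q = fn(x)
    # A ratio below 1 means magnitudes shrink on average after quantization.
    ratio = np.abs(q).sum() / np.abs(x).sum()
    rmse = np.sqrt(np.mean((q - x) ** 2))
    print(f"{name}: mean-abs ratio {ratio:.4f}, rmse {rmse:.4f}")
```

Comparing the mean-absolute ratio of the two grids on your own tensors is a quick way to see whether a format introduces a systematic magnitude bias.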

Stronger Normalization-Free Transformers

Introduces Derf, a simple point-wise activation that replaces normalization layers like LayerNorm and RMSNorm while improving generalization across vision, speech, DNA sequence modeling, and GPT-style language models. The authors systematically study properties of point-wise functions, run a large-scale search, and show Derf outperforms prior normalization-free approaches (e.g., Dynamic Tanh) with similar or better stability. ([arxiv.org](https://arxiv.org/abs/2512.10938))

Mingzhi Chen, Taiming Lu
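Because Derf is described as a point-wise replacement for normalization, the contrast with RMSNorm is easy to show in code. The NumPy sketch below implements Dynamic Tanh (DyT), gamma * tanh(alpha * x) + beta, and an erf-based analogue `derf_like`. The parameterization erf(alpha * x + shift) with a learnable shift is an assumption here, so check the paper for Derf's exact form.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # element-wise erf without a SciPy dependency

def dyt(x, alpha, gamma, beta):
    """Dynamic Tanh (DyT): gamma * tanh(alpha * x) + beta, applied element-wise."""
    return gamma * np.tanh(alpha * x) + beta

def derf_like(x, alpha, shift, gamma, beta):
    """Hypothetical erf-based drop-in: gamma * erf(alpha * x + shift) + beta.
    The exact parameterization of Derf is assumed, not taken from the paper."""
    return gamma * _erf(alpha * x + shift) + beta

def rmsnorm(x, gamma, eps=1e-6):
    """Reference RMSNorm over the last axis: each output depends on the whole row."""
    return gamma * x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

d = 8
rng = np.random.default_rng(0)
x = rng.standard_normal((2, d)) * 10.0  # large activations
gamma, beta = np.ones(d), np.zeros(d)   # per-channel affine parameters

print(rmsnorm(x, gamma))                     # needs per-row statistics
print(derf_like(x, 0.5, 0.0, gamma, beta))   # purely element-wise, bounded by |gamma|
```

The design point is that a point-wise function needs no reduction over the feature axis, which removes the row statistics that normalization layers compute.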

Accelerating Gemini Nano Models on Pixel with Frozen Multi-Token Prediction

Google shows how "frozen" multi-token prediction lets small on-device Gemini Nano models generate several tokens per step while staying accurate, cutting latency and power use on phones. If you care about edge deployment, the design details here are directly reusable. ([research.google](https://research.google/blog/?utm_source=openai))

Google Research Blog
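The post's specific "frozen" design is not reproduced here. The sketch below only illustrates the generic draft-then-verify loop that multi-token prediction is commonly paired with at inference time, assuming greedy decoding: extra heads propose several tokens, and the base model keeps the longest prefix it agrees with. `verify_draft` and `toy_base` are hypothetical names.

```python
from typing import Callable, List

Token = int

def verify_draft(prefix: List[Token], draft: List[Token],
                 base_next: Callable[[List[Token]], Token]) -> List[Token]:
    """Greedy verification of drafted tokens.

    Keeps draft tokens while they match the base model's greedy choice, then
    appends the base model's own token at the first mismatch (or after the
    whole draft). Real systems check all positions in one batched forward
    pass; this loop calls the model once per position for clarity.
    """
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in draft:
        target = base_next(ctx)
        if tok != target:
            accepted.append(target)  # the correction is still a valid new token
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(base_next(ctx))  # bonus token when the whole draft is accepted
    return accepted

def toy_base(ctx: List[Token]) -> Token:
    """Toy 'base model': the next token is the previous one plus 1, modulo 10."""
    return (ctx[-1] + 1) % 10

print(verify_draft([3], [4, 5, 7], toy_base))  # → [4, 5, 6]
```

Because every emitted token is one the base model would have chosen greedily, the output matches plain greedy decoding; speed comes from emitting several tokens per verification step.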

Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR

This paper shows why some reinforcement learning with verifiable rewards (RLVR) methods push models toward bloated answers. The authors modify the loss so that reasoning can improve without implicitly optimizing for longer outputs.

Fanfan Liu, Youyang Yin
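The paper's specific fix is not reproduced here. As background, this sketch shows how common loss-aggregation schemes weight individual tokens as a function of response length, which is one known route by which length preferences leak into the policy gradient. The scheme names are labels chosen for this example.

```python
import numpy as np

def per_token_weights(lengths, scheme, max_len=None):
    """Weight each token's loss term receives, per response, for one group of
    G sampled responses with lengths |o_i|."""
    lengths = np.asarray(lengths, dtype=float)
    G = len(lengths)
    if scheme == "sequence_mean":  # average within each response, then across responses
        return 1.0 / (G * lengths)
    if scheme == "token_mean":     # average over all tokens in the group
        return np.full(G, 1.0 / lengths.sum())
    if scheme == "constant":       # divide by a fixed budget, independent of length
        return np.full(G, 1.0 / (G * max_len))
    raise ValueError(f"unknown scheme: {scheme}")

lengths = [50, 400]  # a short and a long response to the same prompt
for scheme in ["sequence_mean", "token_mean", "constant"]:
    w = per_token_weights(lengths, scheme, max_len=512)
    print(scheme, "per-token:", w, "per-response total:", w * np.asarray(lengths))
```

Under `sequence_mean`, each token of the long response gets 8x less weight than a token of the short one, so a long wrong answer is penalized less per token; under `token_mean`, longer responses get more total weight. Either way, length enters the gradient.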

MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization

MaxCode treats code optimization as a reinforcement learning search over code edits guided by runtime feedback. It uses natural-language critiques and a reward model to steer generation, and it outperforms prior systems at speeding up CUDA and C++ kernels.

Jiefu Ou, Sapana Chaudhary
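A minimal sketch of a max-reward objective for code optimization, assuming each candidate has a correctness flag and a measured runtime, the reward is speedup over a baseline (zero if incorrect), and a search trajectory is scored by its best candidate rather than its last or average one. `Candidate` and the helper functions are hypothetical and do not reproduce MaxCode's critique or reward-model components.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    source: str        # candidate code (a label here)
    correct: bool      # did it pass the test suite?
    runtime_ms: float  # measured runtime, assumed > 0

def speedup_reward(c: Candidate, baseline_ms: float) -> float:
    """Speedup over the baseline; incorrect code earns nothing."""
    return baseline_ms / c.runtime_ms if c.correct else 0.0

def max_reward(trajectory: List[Candidate], baseline_ms: float) -> float:
    """Score a trajectory by its best candidate (0.0 if empty)."""
    return max((speedup_reward(c, baseline_ms) for c in trajectory), default=0.0)

def best_candidate(trajectory: List[Candidate], baseline_ms: float) -> Optional[Candidate]:
    """Return the highest-reward candidate, or None for an empty trajectory."""
    return max(trajectory, key=lambda c: speedup_reward(c, baseline_ms), default=None)

trajectory = [
    Candidate("v1", True, 8.0),
    Candidate("v2", False, 2.0),  # fast but wrong: reward 0
    Candidate("v3", True, 5.0),
    Candidate("v4", True, 6.0),   # a later edit regressed
]
print(max_reward(trajectory, 10.0), best_candidate(trajectory, 10.0).source)  # → 2.0 v3
```

Scoring by the maximum means an exploratory edit that regresses performance does not erase credit for an earlier good one, which suits search where only the best kernel is kept.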

d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models

Targets RL for diffusion LLMs by introducing d-TreeRPO, which uses tree-structured rollouts and bottom-up advantage computation with verifiable outcome rewards for fine-grained credit assignment. The method also adds a time-scheduled self-distillation loss to improve probability estimates, yielding large gains on Sudoku, Countdown, GSM8K, and Math500 over existing RL baselines. ([arxiv.org](https://arxiv.org/abs/2512.09675?utm_source=openai))

Leyi Pan, Shuchang Tao
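A minimal sketch of tree-structured credit assignment, assuming each node is a partial rollout, leaves carry a 0/1 verifiable reward, a node's value is the mean of its children's values, and a step's advantage is its value minus its parent's. This is one plausible reading of "bottom-up advantage computation", not d-TreeRPO's exact estimator, and it omits the self-distillation loss.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    children: List["Node"] = field(default_factory=list)
    reward: Optional[float] = None  # set only on leaves (verifiable outcome)
    value: float = 0.0
    advantage: float = 0.0

def backup(node: Node) -> float:
    """Bottom-up pass: a leaf's value is its reward; an inner node's value is
    the mean of its children's values."""
    if not node.children:
        node.value = float(node.reward)
    else:
        node.value = sum(backup(c) for c in node.children) / len(node.children)
    return node.value

def assign_advantages(node: Node) -> None:
    """Each child's advantage is its value minus its parent's value, i.e. how
    much better it did than the average continuation from the same state."""
    for c in node.children:
        c.advantage = c.value - node.value
        assign_advantages(c)

def leaf(r: float) -> Node:
    return Node(reward=r)

a = Node(children=[leaf(1.0), leaf(1.0)])  # branch where both continuations succeed
b = Node(children=[leaf(1.0), leaf(0.0)])  # branch where one fails
root = Node(children=[a, b])
backup(root)
assign_advantages(root)
print(root.value, a.advantage, b.advantage)  # → 0.75 0.25 -0.25
```

Compared with a single sequence-level advantage, the tree gives intermediate steps their own signal: branch `a` is credited for leading to reliable success, while the two leaves under `b` split credit in opposite directions.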

Diamond Maps: Efficient Reward Alignment via Stochastic Flow Maps

Diamond Maps reframe reward alignment as learning a transport map over model outputs instead of tweaking rewards token by token. This gives smoother, more sample-efficient updates and shows strong results across safety-style alignment tasks.

Peter Holderrieth, Douglas Chen

Deep Delta Learning

The authors replace standard residual skip connections with a learnable "Delta" operator that can flexibly modify the identity path. This lets deep networks control how much old information to erase and how much new information to write, improving how they model complex dynamics while keeping training stable.

Yifan Zhang, Yifeng Liu
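A delta-rule-style residual update makes the erase-versus-write trade-off concrete. The form x + beta * k * (v - k·x) below is an assumption about what such an operator can look like (a rank-1 modification of the identity), not the paper's exact definition; in a real layer, k, v, and beta would come from learned projections of the input rather than being passed in.

```python
import numpy as np

def delta_residual(x: np.ndarray, k: np.ndarray, v: float, beta: float) -> np.ndarray:
    """Assumed delta-rule residual: x_next = x + beta * k * (v - k @ x), k unit-norm.

    beta = 0       -> plain identity shortcut
    beta = 1       -> the component of x along k is erased and replaced by v
    beta = 2, v = 0 -> Householder reflection of x across the plane orthogonal to k
    Components of x orthogonal to k are never changed.
    """
    k = k / np.linalg.norm(k)
    return x + beta * k * (v - k @ x)

rng = np.random.default_rng(0)
x = rng.standard_normal(6)
k = rng.standard_normal(6)
kn = k / np.linalg.norm(k)
for beta in (0.0, 1.0, 2.0):
    y = delta_residual(x, k, v=0.0, beta=beta)
    print(beta, "component along k:", round(float(kn @ y), 4))
```

The update only touches a single direction, so the layer chooses how strongly to overwrite (beta) and what to write there (v) while leaving the rest of the residual stream intact.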

Kernel Foundry: A Diagnosis-Driven Evolutionary Kernel Optimizer with Multi-Experts

Kernel Foundry evolves GPU kernels using feedback from correctness checks and performance diagnostics instead of blind search. It reaches 100% correctness on a benchmark and beats hand-tuned baselines. If you’re fighting GPU bottlenecks, this hints that AI-guided kernel search is starting to work in practice.

Zixuan Huang, Da Chen
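The loop below is a toy illustration of diagnosis-driven evolutionary search. A synthetic `benchmark` stands in for compiling, testing, and profiling a kernel, and its diagnosis ("incorrect", "memory_bound", "compute_bound") selects which edit to apply instead of mutating blindly. The tuning knobs, cost model, and mutation rules are all hypothetical; Kernel Foundry itself evolves real GPU kernel code.

```python
import random
from typing import Dict, List, Tuple

Config = Tuple[int, int]  # (tile, unroll): hypothetical tuning knobs
PROBLEM = 256             # hypothetical size the tiles must divide exactly
DIVISORS = [d for d in range(1, PROBLEM + 1) if PROBLEM % d == 0]

def benchmark(cfg: Config) -> Tuple[bool, float, str]:
    """Synthetic stand-in for compile + correctness test + profile.
    Returns (correct, runtime, diagnosis)."""
    tile, unroll = cfg
    if PROBLEM % tile != 0:
        return False, float("inf"), "incorrect"
    memory = 64.0 / tile              # larger tiles reuse more data
    compute = tile / (4.0 * unroll)   # more unrolling hides compute latency
    runtime = max(memory, compute) + 0.1 * unroll  # unrolling has a fixed cost
    return True, runtime, "memory_bound" if memory >= compute else "compute_bound"

def mutate(cfg: Config, diagnosis: str, rng: random.Random) -> Config:
    """Choose the edit from the diagnosis."""
    tile, unroll = cfg
    if diagnosis == "incorrect":
        tile = min(DIVISORS, key=lambda d: abs(d - tile))  # snap to a valid tile
    elif diagnosis == "memory_bound":
        tile = min(tile * 2, PROBLEM)
    else:  # compute_bound
        unroll = min(unroll * 2, 16)
        if rng.random() < 0.5:
            tile = max(tile // 2, 1)
    return tile, unroll

def evolve(seeds: List[Config], generations: int = 20, keep: int = 4,
           seed: int = 0) -> Tuple[Config, float]:
    """Elitist loop: the fastest `keep` configs each spawn one diagnosed child."""
    rng = random.Random(seed)
    scored: Dict[Config, Tuple[bool, float, str]] = {c: benchmark(c) for c in seeds}
    for _ in range(generations):
        parents = sorted(scored, key=lambda c: scored[c][1])[:keep]
        for p in parents:
            child = mutate(p, scored[p][2], rng)
            if child not in scored:
                scored[child] = benchmark(child)
    best = min(scored, key=lambda c: scored[c][1])
    return best, scored[best][1]

print(evolve([(100, 1), (3, 1)]))  # both seeds start out incorrect
```

Incorrect candidates get infinite runtime, so they can never be selected as the final answer, mirroring a correctness gate in front of performance ranking.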

666ghj/MiroFish

Implements a "swarm intelligence" engine that predicts arbitrary signals. Useful playground if you want to experiment with alternative forecasting and ensemble ideas.

7,171 GitHub stars

Hi-ZFO: Hierarchical Zeroth- and First-Order LLM Fine-Tuning via Importance-Guided Tensor Selection

Hi‑ZFO applies gradient-based (first-order) updates to the most important tensors and gradient-free (zeroth-order) updates, estimated from random perturbations, to the rest, which helps it escape bad minima. It aims to produce better fine-tuned models with less compute and more stable training than pure gradient methods.

Feihu Jin, Ying Tan
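A toy NumPy sketch of the hybrid idea, assuming a fixed boolean mask marks the "important" coordinates: those get exact gradients, while the rest get a two-point zeroth-order estimate (a MeZO/SPSA-style finite difference along a random direction). The quadratic loss and the mask are placeholders; Hi-ZFO's importance scoring and tensor selection are not reproduced.

```python
import numpy as np

def loss(theta: np.ndarray, target: np.ndarray) -> float:
    """Toy quadratic objective standing in for a fine-tuning loss."""
    return 0.5 * float(np.sum((theta - target) ** 2))

def hybrid_step(theta, target, fo_mask, lr, eps, rng):
    """Exact gradient on coordinates in fo_mask; two-point zeroth-order
    estimate on all other coordinates."""
    g_fo = (theta - target) * fo_mask                  # first-order, important coords only
    z = rng.standard_normal(theta.shape) * (~fo_mask)  # perturb only the remaining coords
    f_plus = loss(theta + eps * z, target)
    f_minus = loss(theta - eps * z, target)
    g_zo = (f_plus - f_minus) / (2 * eps) * z          # directional derivative times direction
    return theta - lr * (g_fo + g_zo)

rng = np.random.default_rng(0)
d = 12
target = rng.standard_normal(d)
theta = np.zeros(d)
fo_mask = np.zeros(d, dtype=bool)
fo_mask[:4] = True  # hypothetical: first 4 coordinates deemed important
start = loss(theta, target)
for _ in range(500):
    theta = hybrid_step(theta, target, fo_mask, lr=0.05, eps=1e-3, rng=rng)
print(start, loss(theta, target))
```

The zeroth-order half needs only two forward evaluations and no backward pass, which is where the memory and compute savings for the non-selected parameters come from.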

Optimization is Not Enough: Why Problem Formulation Deserves Equal Attention

The authors argue that many "AI optimization" wins really come from how humans pose the problem, not from the math alone. They show cases where small tweaks in formulation beat heavy algorithmic tuning, especially in engineering-style tasks.

Iván Olarte Rodríguez, Gokhan Serhat

Gradient-Free Training of Spiking Neural Networks via Low-Rank Evolution Strategies

The authors train spiking neural networks without backprop, using a low-rank evolution strategy that scales to larger models. They match or beat surrogate-gradient baselines on benchmarks. If you care about neuromorphic hardware or energy-efficient AI, this is a cleaner path than shoehorning backprop into spikes.

Dhruv Patankar, Sachit Ramesha Gowda
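Evolution strategies estimate a gradient from fitness evaluations alone, so the model being trained can be non-differentiable. The sketch below uses antithetic low-rank perturbations E = A @ B.T / sqrt(r), which are cheap to sample and store for large weight matrices. The fitness is a toy quadratic rather than a spiking network, and the exact low-rank scheme used in the paper may differ.

```python
import numpy as np

def lowrank_es_step(W, fitness, sigma, lr, rank, pop, rng):
    """One antithetic ES step with low-rank perturbations.

    fitness is a black box (higher is better); no gradients of it are used.
    Each E = A @ B.T / sqrt(rank) has unit-variance entries, so the averaged
    estimate approximates the fitness gradient.
    """
    m, n = W.shape
    grad = np.zeros_like(W)
    for _ in range(pop):
        A = rng.standard_normal((m, rank))
        B = rng.standard_normal((n, rank))
        E = A @ B.T / np.sqrt(rank)
        grad += (fitness(W + sigma * E) - fitness(W - sigma * E)) * E
    grad /= 2 * sigma * pop
    return W + lr * grad  # ascend the estimated gradient

rng = np.random.default_rng(0)
W_star = rng.standard_normal((4, 6))

def fitness(W):
    """Toy objective: negative squared distance to a hidden target matrix."""
    return -0.5 * float(np.sum((W - W_star) ** 2))

W = np.zeros((4, 6))
start = -fitness(W)
for _ in range(200):
    W = lowrank_es_step(W, fitness, sigma=0.1, lr=0.1, rank=2, pop=20, rng=rng)
print(start, -fitness(W))
```

For a spiking network, `fitness` would wrap a forward simulation and return, for example, accuracy on a batch; the update rule itself is unchanged.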