Code

Research papers, repositories, and articles about code

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

Code2LoRA turns an entire repository into a lightweight adapter instead of more prompt tokens. It supports static snapshots and an "evolving" mode that tracks commits with a GRU. If you run code models at scale, this is a practical way to cut context while staying up to date.

Liliana Hotsko, Yinxi Li

Partnering with Mozilla to improve Firefox’s security

Anthropic used Claude Opus 4.6 to scan Firefox’s code and surfaced 22 new vulnerabilities, 14 rated high severity. The post lays out a playbook for pairing AI bug hunters with human maintainers safely.

Anthropic Newsroom

openai/codex

A lightweight coding agent that runs directly in your terminal, wiring OpenAI models into a loop that edits files, runs tests, and applies patches. Compared to IDE plugins, it’s closer to a shell-native ‘pair programmer’ that can operate on entire repos and workflows. Given its rapid adoption and tight integration with existing CLIs, it’s poised to become a reference design for terminal-first code agents.

54,000

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

Code2LoRA reads an entire code repository once and spits out a custom LoRA adapter for a frozen code model. That moves repo knowledge into weights instead of ever-longer prompts, so coding agents stay fast even as the codebase changes. Use this if you’re hitting context limits on big, fast-moving repos.

Liliana Hotsko, Yinxi Li

InCoder-32B-Thinking: Industrial Code World Model for Thinking

Trains a 32B-parameter code model on synthetic “thinking traces” and hardware execution logs. Targets chip design, GPU tuning, and embedded code with explicit reasoning steps.

Jian Yang, Wei Zhang

DeusData/codebase-memory-mcp

High-performance code intelligence server that indexes repos into a persistent knowledge graph. Lets AI tools query big codebases with fewer tokens and sub-millisecond latency. If you build code agents, this is production-grade infrastructure to borrow. ([github.com](https://github.com/trending?since=daily))

19,582

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Introduces NL2Repo-Bench, a benchmark where coding agents must generate or modify entire repositories from natural language specifications, rather than solving single-file LeetCode-style tasks. It evaluates long-horizon planning, tool use, and consistency across files and modules. This is a big step toward evaluating code agents in settings that look like real software projects instead of toy problems.

Jingzhe Ding, Shengda Long

Confucius Code Agent: An Open-sourced AI Software Engineer at Industrial Scale

Meta describes Confucius Code Agent (CCA), an open-source AI "software engineer" built on the Confucius SDK with hierarchical working memory, persistent cross-session notes, and robust tool orchestration. On SWE-Bench-Pro it reaches 54.3% Resolve@1, substantially outperforming prior coding agents while emphasizing transparency and extensibility for industrial-scale workflows. ([huggingface.co](https://huggingface.co/papers/2512.10398))

Zhaodong Wang, Zhenting Qi

Confucius Code Agent: An Open-sourced AI Software Engineer at Industrial Scale

Hugging Face pitches Confucius Code Agent as an industrial-strength open coding agent with hierarchical working memory, persistent notes, and a meta-agent that continuously refines configurations. If you care about reproducible, extensible coding agents rather than opaque SaaS tools, this is a substantial systems paper. ([huggingface.co](https://huggingface.co/papers/2512.10398))

Zhaodong Wang, Zhenting Qi

Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

The authors break code problems into atomic pieces, then recombine them to generate harder tasks for reinforcement learning with verifiable rewards. This produces richer training data than simple template expansion and boosts code performance across domains. It’s a strong signal that smarter task generation matters as much as bigger models.

Jiasheng Zheng, Boxi Cao