AI Research Lab

Explore the latest AI research papers, trending repositories, and technical articles from leading AI labs and the open source community.

141

ArXiv Papers

GitHub Repos

Dev Blogs

Research Papers

View all

Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

This paper is a systematic exploration of reinforcement learning for text-to-3D generation, dissecting reward design, RL algorithms, data scaling, and hierarchical optimization. The authors introduce a new benchmark (MME-3DR), propose Hi-GRPO for global-to-local 3D refinement, and build AR3D-R1—the first RL-tuned text-to-3D model that improves both global shape quality and fine-grained texture alignment.

Yiwen Tang, Zoey Guo +

OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification

OPV (Outcome-based Process Verifier) is a verifier model that inspects the rationale steps of long chains-of-thought via summarized outcomes, combining the strengths of outcome-based and process-based verification. Trained with an active learning loop, rejection fine-tuning, and RLVR, OPV reaches strong F1 on OPV-Bench and outperforms much larger models like Qwen3-Max-Preview at detecting reasoning errors.

Zijian Wu, Lingkai Kong +

Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

This work presents a long-horizon reasoning agent for Olympiad-level math that uses an Outcome-based Process Verifier (OPV) to supervise and clean up very long chains-of-thought. By summarizing and checking reasoning segments rather than only final answers, and training OPV via iterative active learning and RLVR, the system achieves new SOTA on a held-out benchmark while reducing annotation cost.

Songyang Gao, Yuzhe Gu +

T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground

T-pro 2.0 is an open-weight Russian large language model focused on hybrid reasoning: it can answer directly or emit explicit reasoning traces, and it’s optimized for low-latency inference via speculative decoding. Alongside the model, the authors release a Russian instruction corpus, a math benchmark, and an EAGLE-based inference stack, making it a practical foundation for Russian-language reasoning applications.

Dmitrii Stoianov, Danil Taranets +

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

This report compares seven frontier language and vision models across many safety tests, from basic benchmarks to adversarial red-teaming. It finds GPT-5.2 clearly safest overall while others trade off safety across languages, modalities, and threat models.

Xingjun Ma, Yixu Wang +

STEP3-VL-10B Technical Report

STEP3-VL-10B is a 10B-parameter vision–language model that rivals much larger systems by combining unified pretraining with heavy post-training and parallel coordinated reasoning at run time. Use it as a strong open baseline for high-end multimodal tasks without giant hardware.

Ailin Huang, Chengyuan Yao +

Trending Repositories

View all

stable-diffusion-webui

stable-diffusion-webui by AUTOMATIC1111 is the de facto standard local web interface for Stable Diffusion, providing a massive feature set—txt2img, img2img, inpainting/outpainting, upscaling, LoRA/embeddings support, training utilities, and a huge extension ecosystem—on top of consumer GPUs. If you’re doing any kind of image generation or fine-tuning with Stable Diffusion in a local or lab environment, this is usually the first tool people reach for and the one most community workflows target. ([github.com](https://github.com/AUTOMATIC1111/stable-diffusion-webui?utm_source=openai))

158,945

huggingface/transformers

The standard library for state-of-the-art models in text, vision, audio, and combined formats. If you build with open models, you almost certainly depend on this already.

156,240

anthropics/claude-code

Claude Code runs as a terminal-native coding agent that understands your repo and executes commands. It blurs the line between shell, IDE, and assistant, and it’s quickly becoming a default tool for power users.

55,236

anomalyco/opencode

OpenCode is an open-source coding agent that edits and writes code for you, wired into modern tooling. Use it as a local, hackable alternative to proprietary AI dev environments.

61,642

obra/superpowers

Superpowers is a skills library and workflow for coding agents like Claude Code and OpenCode. It bakes in design, planning, testing, and review loops so agents behave like disciplined junior engineers.

28,500

exo-explore/exo

Exo turns a pile of Macs or PCs into one AI cluster so you can run huge models at home. It auto-discovers devices, shards models across them, and uses high-speed links like Thunderbolt to get near data-center performance. ([github.com](https://github.com/trending))

35,600

Technical Articles

View all

SynthID Detector: Identify content made with Google's AI tools

Google announces SynthID Detector, a web portal that lets you upload images, audio, video, or text generated with Google AI tools and automatically checks for imperceptible SynthID watermarks, highlighting which parts of the content are likely watermarked. For developers and media teams, it’s a turnkey authenticity check for content produced with models like Gemini, Imagen, Lyria, and Veo, designed to plug into editorial and trust-&-safety workflows. ([blog.google](https://blog.google/technology/ai/google-synthid-ai-content-detector/))

Google AI Blog

Introducing OpenAI for Healthcare

OpenAI packages its models and tools into a healthcare-focused offering with stronger privacy, compliance, and clinical integration. Health systems get a clearer path to use GPT‑class models for charting, triage, and custom workflows without rolling everything themselves.

OpenAI