Back to AI Lab

Tools

Research papers, repositories, and articles about tools

Showing 15 of 15 items

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

The authors build SpatialClaw, a code-driven agent that uses a stateful Python kernel plus vision tools to solve 3D and 4D spatial puzzles. It beats prior spatial agents across 20 benchmarks and six vision-language backbones, showing that the action interface design can unlock much stronger spatial reasoning.

Seokju Cho, Ryo Hachiuma

DeusData/codebase-memory-mcp

High-performance code intelligence server that indexes repos into a persistent knowledge graph. Lets AI tools query big codebases with fewer tokens and sub-millisecond latency. If you build code agents, this is production-grade infrastructure to borrow. ([github.com](https://github.com/trending?since=daily))

19,582

NVIDIA/SkillSpector

SkillSpector scans AI agent skills for security problems, prompt injection vectors, and risky patterns. It aims to become the "lint" tool for agent skill stores as organizations start sharing and reusing skills at scale.

5,163

CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty

CAR-bench builds an in-car assistant world with messy, ambiguous user requests and many tools. It measures not just if agents finish tasks, but whether they know when they’re out of their depth.

Johannes Kirmayr, Lukas Stappen

ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback

ToolSafe builds a guardrail model that watches each tool call an agent plans to make and flags dangerous ones before they run. In tool-using agents under prompt-injection attacks, it slashes harmful calls while slightly improving task success.

Yutao Mou, Zhangchi Xue

usestrix/strix

An "AI hacker" that scans your app for security issues using agents. Shows where security and AI Ops are colliding. Worth studying if you're adding automated testing around LLM apps. ([github.com](https://github.com/trending?since=daily))

26,701

microsoft/markitdown

markitdown converts many document formats into clean Markdown, which LLMs handle more reliably. It’s a practical bridge between messy office files and the text-first world most AI models live in.

153,273

MASFactory: A Graph-centric Framework for Orchestrating LLM-Based Multi-Agent Systems with Vibe Graphing

Presents a framework that treats an agent swarm as a graph you can design, visualize, and debug. Makes multi-agent systems feel more like building workflows than wiring hacks.

Yang Liu, Jinxuan Cai

MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching

MatchTIR stops treating every step in a tool-using trajectory equally. It uses bipartite matching to match predicted tool traces to gold traces, then assigns rewards per step, making small models competitive with larger ones on long tool-use tasks.

Changle Qu, Sunhao Dai

browser-use/video-use

Lets coding agents edit videos programmatically. Bridges dev tooling and media pipelines. If you're eyeing agentic video editing or auto-content workflows, this is a strong starting point. ([github.com](https://github.com/trending?since=daily))

11,017

AdaTooler-V: Adaptive Tool-Use for Images and Videos

AdaTooler-V teaches vision-language models when to call external tools, not just how. That cuts unnecessary tool calls, reducing costs while often boosting accuracy on vision tasks.

Chaoyang Wang, Kaituo Feng

swisskyrepo/PayloadsAllTheThings

A giant, curated list of exploit payloads and bypass tricks for web security and CTFs. It’s becoming the default knowledge base security-focused AI tools plug into. ([github.com](https://github.com/trending))

73,068

nesquena/hermes-webui

A web and mobile-friendly UI for the Hermes agent system. It lets non-engineers spin up task-running agents with a few clicks. If you’re experimenting with agents for teams, this is a fast way to test workflows without writing your own front-end.

9,938

dmtrKovalenko/fff.nvim

Fast file-search and context toolkit designed for AI agents and Neovim. Helps agents find the right code spans instead of grepping everything.

3,687

Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task

The authors augment multimodal LLMs with a "Video Toolkit" and a STAR (Spatiotemporal Reasoning) framework that orchestrates calls to temporal and spatial tools for video question answering. Instead of treating the video as a black-box embedding, the model actively localizes key regions over time using tools, yielding sizable gains on VideoMME and LongVideoBench when wrapped around GPT-4o.

Sunqi Fan, Jiashuo Cui