Datasets

Research papers, repositories, and articles about datasets

VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding

VideoKR gives you 315k tough reasoning questions over 145k expert videos. It’s built to push models beyond captioning toward real multi-step explanations. Use it to pressure-test any video model that claims "understanding" rather than just pattern matching.

Lin Fu, Zheyuan Yang

Action100M: A Large-scale Video Action Dataset

Action100M is a dataset built fully automatically from over a million how-to videos, yielding around 100 million labeled action snippets. It uses V-JEPA features plus a GPT-based pipeline to label segments, and it unlocks clean scaling curves for action recognition models.

Delong Chen, Tejaswi Kasarla

The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset

KITScenes offers a rich European driving dataset with high-res cameras, long-range lidar, 4D radar, and dense HD maps. It addresses gaps in existing driving datasets around sensor quality and map coverage. If you’re training driving or world models, this is a new high-end reference point.

Richard Schwarzkopf, Fabian Immel

tasksource: Structured Dataset Preprocessing Annotations for Frictionless Extreme Multi-Task Learning and Evaluation

tasksource standardizes how hundreds of NLP datasets map inputs and labels into a common schema. That makes it much easier to train and test multi-task models without hand-writing fragile preprocessing code for each dataset.

Damien Sileo
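
To make the "common schema" idea in the tasksource summary concrete, here is a minimal sketch of what a declarative field mapping could look like. This is not the tasksource API: `FieldMapping`, `to_common_schema`, the slot names (`sentence1`, `sentence2`, `label`), and the sample rows are all hypothetical, chosen only to illustrate replacing per-dataset preprocessing code with a small annotation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class FieldMapping:
    """Hypothetical annotation: which raw columns fill each shared slot."""
    sentence1: str                      # raw column holding the main text
    label: str                          # raw column holding the target
    sentence2: Optional[str] = None     # raw column for a second text, if any
    label_names: Dict[Any, str] = field(default_factory=dict)  # raw value -> name


def to_common_schema(row: Dict[str, Any], mapping: FieldMapping) -> Dict[str, Any]:
    """Rename one raw row into the shared (sentence1, sentence2, label) layout."""
    raw_label = row[mapping.label]
    return {
        "sentence1": row[mapping.sentence1],
        "sentence2": row[mapping.sentence2] if mapping.sentence2 else None,
        # Fall back to the raw label when no readable name is declared.
        "label": mapping.label_names.get(raw_label, raw_label),
    }


# Two made-up datasets with different column names, described declaratively.
NLI_MAPPING = FieldMapping(
    sentence1="premise",
    sentence2="hypothesis",
    label="gold",
    label_names={0: "entailment", 1: "neutral", 2: "contradiction"},
)
SENTIMENT_MAPPING = FieldMapping(sentence1="text", label="stars")

nli_row = {"premise": "A man is sleeping.", "hypothesis": "A person rests.", "gold": 0}
sentiment_row = {"text": "Great battery life.", "stars": "positive"}

unified = [
    to_common_schema(nli_row, NLI_MAPPING),
    to_common_schema(sentiment_row, SENTIMENT_MAPPING),
]
```

After this step, both rows share the same keys, so a single training or evaluation loop can consume them without dataset-specific branches. Adding another dataset in this sketch means writing one more `FieldMapping`, not another preprocessing function.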