Datasets
Research papers, repositories, and articles about datasets
VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding
VideoKR gives you 315k tough reasoning questions over 145k expert videos. It’s built to push models beyond captioning toward real multi-step explanations. Use it to pressure-test any video model that claims "understanding" rather than just pattern matching.
Action100M: A Large-scale Video Action Dataset
Action100M is a fully automatically constructed dataset built from over a million how-to videos, yielding around 100 million labeled action snippets. It uses V-JEPA features plus a GPT-based pipeline to label segments, and it is reported to give clean scaling curves for action recognition models.
The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset
KITScenes is a European driving dataset with high-resolution cameras, long-range lidar, 4D radar, and dense HD maps. It addresses gaps in current driving datasets around sensor quality and map coverage. If you’re training driving or world models, this is a new high-end reference point.
tasksource: Structured Dataset Preprocessing Annotations for Frictionless Extreme Multi-Task Learning and Evaluation
tasksource standardizes how hundreds of NLP datasets map inputs and labels into a common schema. That makes it much easier to train and test multi-task models without hand-writing fragile preprocessing code for each dataset.
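To make the idea of a common schema concrete, here is a minimal sketch in plain Python. It is not the tasksource API: the `FieldMapping` class, the `standardize` function, the schema keys (`sentence1`, `sentence2`, `labels`), and the toy rows are all hypothetical, chosen only to illustrate how a small declarative annotation per dataset can replace hand-written preprocessing code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FieldMapping:
    """Hypothetical annotation: which raw columns fill the shared schema."""
    sentence1: str
    label: str
    sentence2: Optional[str] = None  # single-input tasks leave this unset


def standardize(rows: List[Dict[str, Any]], mapping: FieldMapping) -> List[Dict[str, Any]]:
    """Rename each dataset's own columns into one common layout."""
    return [
        {
            "sentence1": row[mapping.sentence1],
            "sentence2": row[mapping.sentence2] if mapping.sentence2 else None,
            "labels": row[mapping.label],
        }
        for row in rows
    ]


# Two toy datasets with different column names (made-up examples).
nli_rows = [{"premise": "A dog runs.", "hypothesis": "An animal moves.", "gold": "entailment"}]
review_rows = [{"text": "Great film.", "stars": 5}]

nli_map = FieldMapping(sentence1="premise", sentence2="hypothesis", label="gold")
review_map = FieldMapping(sentence1="text", label="stars")

# After mapping, both datasets share one layout and can be concatenated.
unified = standardize(nli_rows, nli_map) + standardize(review_rows, review_map)
```

With this layout, adding another dataset only requires one more `FieldMapping`, while the training and evaluation code that reads `unified` stays unchanged.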