Nano-datasets: Enabling Efficient Video Understanding Research with Customizable Subsets of Large-Scale Datasets

Authors

  • Joel Ermantraut, Universidad Tecnológica Nacional, Facultad Regional Bahía Blanca, Argentina.
  • Lucas Tobio, Universidad Tecnológica Nacional, Facultad Regional Bahía Blanca, Argentina.
  • Segundo Foissac, Universidad Tecnológica Nacional, Facultad Regional Bahía Blanca, Argentina.
  • Javier Iparraguirre, Universidad Tecnológica Nacional, Facultad Regional Bahía Blanca, Argentina.

Keywords:

nano-datasets, self-supervised learning, video representations, computer vision, machine learning

Abstract

The advancement of self-supervised learning in video understanding has been facilitated by large-scale datasets, yet their size poses challenges for researchers with limited computational resources. To address this, we introduce nano-datasets, a repository of scripts designed to generate customizable subsets from established video datasets like Kinetics, Something-Something-v2, and ImageNet-1K. These scripts maintain the semantic integrity and structure of the original datasets while allowing users to create smaller, more manageable versions tailored to their specific research needs. By enabling researchers to experiment with diverse architectures and fine-tune models on accessible datasets, nano-datasets aims to democratize video understanding research and foster reproducibility and collaboration within the field.
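The core idea of the abstract — sampling a smaller, class-balanced subset while preserving the label structure of the original dataset — can be sketched in a few lines. This is an illustrative reconstruction, not the repository's actual code; the function name `make_nano_subset` and the `(video_id, label)` annotation format are assumptions for the example.

```python
import random
from collections import defaultdict

def make_nano_subset(annotations, num_classes=5, per_class=3, seed=0):
    """Draw a class-balanced subset from (video_id, label) annotations,
    keeping the original label structure intact.

    Hypothetical helper for illustration; the real nano-datasets scripts
    operate on the official annotation files of Kinetics, SSv2, etc.
    """
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    by_label = defaultdict(list)
    for video_id, label in annotations:
        by_label[label].append(video_id)
    # Choose a subset of classes, then a subset of clips within each class.
    chosen = rng.sample(sorted(by_label), min(num_classes, len(by_label)))
    return {
        label: rng.sample(by_label[label], min(per_class, len(by_label[label])))
        for label in chosen
    }

# Toy annotations standing in for a Kinetics-style (video_id, label) CSV.
toy = [(f"vid{i:03d}", f"class{i % 4}") for i in range(40)]
subset = make_nano_subset(toy, num_classes=2, per_class=3)
```

Because only whole classes are sampled and each retained class keeps real clips, the subset remains directly usable with the original dataset's loaders and evaluation protocol.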

Published

2025-07-15

How to Cite

Ermantraut, J., Tobio, L., Foissac, S., & Iparraguirre, J. (2025). Nano-datasets: Enabling Efficient Video Understanding Research with Customizable Subsets of Large-Scale Datasets. AJEA (Proceedings of UTN Academic Conferences and Events), (AJEA 47). Retrieved from https://rtyc.utn.edu.ar/index.php/ajea/article/view/1874

Section

Proceedings - Information and Computer Systems