nano-JEPA: A Proposal to Enable Video Understanding Using Personal Computers
Keywords:
Feature Prediction, Unsupervised Learning, Visual Representations, Video, JEPA
Abstract
V-JEPA is an artificial intelligence model whose objective is to understand and predict video content. It uses a self-supervised learning approach: it is pretrained on unlabeled data and then tailored to specific tasks. It learns by predicting missing or masked parts of a video, which forces the model to develop a comprehensive understanding of the scene. The aim is artificial intelligence that learns in a way similar to humans, forming internal models of the surrounding world in order to adapt and complete tasks efficiently. However, the enormous computational demands of such models, which often require powerful GPU clusters, limit accessibility for many researchers. Therefore, nano-JEPA, an adaptation of V-JEPA designed to run on personal computers, even without a GPU, is proposed. The nano-dataset repository is also presented, which facilitates the creation of manageable subsets from large public video datasets. The goal is to enable broader participation and experimentation in research with models similar to V-JEPA. nano-JEPA shows reasonable performance on downstream tasks, opening doors for further exploration and innovation.
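To make the masked feature-prediction idea described above concrete, the following is a minimal sketch of one training step of that style of objective, restricted to CPU as nano-JEPA proposes. All module names, sizes, and the toy data are illustrative assumptions, not the actual nano-JEPA or V-JEPA code: a context encoder sees only unmasked patch embeddings, and a predictor is trained to reproduce the target encoder's features for the masked positions.

import torch
import torch.nn as nn

# Hypothetical toy setup: masked patch-feature prediction, forced onto CPU.
device = torch.device("cpu")          # no GPU required
dim, num_patches, batch = 64, 32, 4   # deliberately small sizes for a personal computer

context_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True), num_layers=2
).to(device)
target_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True), num_layers=2
).to(device)
predictor = nn.Linear(dim, dim).to(device)

optimizer = torch.optim.AdamW(
    list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-3
)

# One toy step: random tensors stand in for real video patch embeddings.
patches = torch.randn(batch, num_patches, dim, device=device)
mask = torch.rand(batch, num_patches, device=device) < 0.5   # which patches are hidden

with torch.no_grad():                     # targets come from a pass with no gradient
    targets = target_encoder(patches)

visible = patches * (~mask).unsqueeze(-1)                    # zero out masked patches
predictions = predictor(context_encoder(visible))

loss = ((predictions - targets) ** 2)[mask].mean()           # loss only on masked positions
loss.backward()
optimizer.step()
print(f"toy masked-prediction loss: {loss.item():.4f}")

On a laptop-class CPU a step at these sizes runs in well under a second, which is the kind of budget nano-JEPA targets; the real model would use video clips, a schedule for updating the target encoder, and larger transformer backbones.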
License
Copyright (c) 2025 Adrián Rostagno, Javier Iparraguirre, Joel Ermantraut, Guillermo R. Friedrich

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



