One of the largest open datasets in the field of digital content has been released in Russia. VK has given researchers access to a large body of data on how users interact with short videos. VK-LSVD (Large Short-Video Dataset) includes more than 40 billion anonymized interactions, covering the behavior of 10 million people and 20 million videos over a six-month period.
The dataset records in detail how the audience reacts to short videos: likes, dislikes, reposts, viewing time, and playback context. Developers also get access to users' socio-demographic characteristics, which can improve the accuracy of analysis and of algorithms for personalized recommendations.
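To make the description more concrete, the sketch below shows one way records of this kind could be represented and summarized per video. It is only an illustration: the `Interaction` class, all of its field names, and the context labels are hypothetical and are not taken from the published VK-LSVD schema, which may be organized quite differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    # All field names are hypothetical; the published schema may differ.
    user_id: int
    video_id: int
    liked: bool
    disliked: bool
    shared: bool
    watch_seconds: float
    context: str  # assumed labels such as "feed" or "search"


def summarize_by_video(interactions):
    """Aggregate view count, like rate, and mean watch time per video."""
    totals = defaultdict(lambda: {"views": 0, "likes": 0, "watch_seconds": 0.0})
    for item in interactions:
        row = totals[item.video_id]
        row["views"] += 1
        row["likes"] += int(item.liked)
        row["watch_seconds"] += item.watch_seconds
    return {
        video_id: {
            "views": row["views"],
            "like_rate": row["likes"] / row["views"],
            "mean_watch_seconds": row["watch_seconds"] / row["views"],
        }
        for video_id, row in totals.items()
    }


# Made-up sample records, for illustration only.
sample = [
    Interaction(1, 100, True, False, False, 12.0, "feed"),
    Interaction(2, 100, False, False, True, 4.0, "feed"),
    Interaction(1, 200, False, True, False, 1.0, "search"),
]
summary = summarize_by_video(sample)
```

Per-video aggregates like these are a common starting point for popularity baselines. At the scale reported for the dataset, the same logic would more plausibly run in a columnar or distributed engine than in a plain Python loop.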
Short videos have a distinctive property: they are not consumed in the background, so each piece of content receives its own audience reaction. This makes such datasets especially valuable for artificial intelligence specialists, since the data makes it possible to model not only user preferences but also patterns of content consumption.
The publication of open datasets of this kind is an important event for the scientific and technological community. It offers an opportunity to improve recommendation systems, develop new machine learning approaches, and test new models for analyzing behavioral data.