Japanese researchers from Osaka University presented the results of a unique experiment demonstrating that artificial intelligence models known as Vision Transformers (ViT) can develop visual processing abilities similar to human ones. These abilities emerged in the models spontaneously, without explicit instructions or predefined filters, thanks to a specific training method.
As part of the new study, the researchers applied a self-supervised learning technique called DINO (self-distillation with no labels), which allowed the models to form mechanisms for perceiving visual scenes on their own. Instead of giving the AI fixed rules, the scientists let the systems learn from visual information in a natural setting by analyzing a vast body of video content.
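The article does not reproduce the study's training code, but the core idea of DINO can be sketched: a student network is trained to match the output distribution of a teacher whose weights are an exponential moving average (EMA) of the student's, with the teacher's outputs centered and sharpened to avoid collapse. The following NumPy sketch illustrates only this loss and the EMA update; the function names and hyperparameter values are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(x, temp):
    """Temperature-scaled softmax along the last axis."""
    z = (x - x.max(axis=-1, keepdims=True)) / temp
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dino_loss(student_logits, teacher_logits, center, t_s=0.1, t_t=0.04):
    """Cross-entropy between teacher and student output distributions.

    The teacher's logits are centered (subtracting a running mean of
    past outputs) and sharpened with a low temperature, which in DINO
    prevents all outputs from collapsing to a single cluster.
    """
    p_t = softmax(teacher_logits - center, t_t)   # sharpened target
    p_s = softmax(student_logits, t_s)            # student prediction
    return -(p_t * np.log(p_s + 1e-12)).sum(axis=-1).mean()

def ema_update(teacher_w, student_w, momentum=0.996):
    """Teacher weights track the student as an exponential moving average."""
    return momentum * teacher_w + (1 - momentum) * student_w
```

In the actual method, the two networks see different augmented crops of the same image, so minimizing this loss pushes the student toward view-invariant representations without any labels.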
Lead author of the study, Dr Takuto Yamamoto, explained: "Our models didn't just switch randomly between image elements. They spontaneously developed specialized functions: one group of models learned to focus consistently on faces, another on the contours of shapes, and a third on the background. This mirrors the segmentation and scene-perception strategy typical of the human visual system."
To test the hypothesis, the researchers compared the models' visual strategies with eye-tracking data recorded from people who watched the same video clips. The results were striking: models trained with the DINO method behaved almost identically to humans, whereas systems that used traditional algorithms with fixed filters showed unnatural, fragmented patterns of image perception.
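The article does not specify how the comparison was scored, but a common way to quantify agreement between a model's attention map and a human gaze-density map is the Pearson correlation ("CC") metric used in saliency research. A minimal sketch, assuming both maps are already rendered as same-sized 2D arrays (the function name is hypothetical):

```python
import numpy as np

def heatmap_similarity(attn, gaze):
    """Pearson correlation between two 2D heatmaps.

    Each map is normalized to zero mean and unit variance, so the
    result lies in [-1, 1]: 1 means the model attends exactly where
    people look, 0 means no linear relationship.
    """
    a = (attn - attn.mean()) / (attn.std() + 1e-12)
    g = (gaze - gaze.mean()) / (gaze.std() + 1e-12)
    return float((a * g).mean())
```

With such a score per video frame, one can average over frames and compare DINO-trained models against fixed-filter baselines, which is the kind of quantitative overlap the study reports.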
Particular attention was drawn to the fact that none of the models received any prior instructions about which objects should be considered significant. Nevertheless, the AI began to prioritize faces on its own, which, according to the scientists, is linked to their high information content. Senior author of the study, Professor Shigeru Kitazawa, noted: "This is strong evidence that self-supervised learning can capture something fundamental about the nature of learning in intelligent systems, both artificial and biological."
Further analysis confirmed that ViT models trained with DINO not only formed structures similar to human visual perception but also quantitatively reproduced typical patterns of gaze fixation. This was especially evident in scenes involving people, where the overlap between human and AI behavior was greatest.
This research raises new questions about the limits of artificial intelligence's ability to understand and interpret the world around us. The results obtained at Osaka University not only bring us closer to creating truly “sighted” machines, but also open the way to a better understanding of the very process of human perception.