The American company xAI, founded by Elon Musk, has introduced its new-generation flagship language model, Grok 4. For the first time in the history of independent language model comparisons, a product from outside the so-called "big three" developers (OpenAI, Google, and Anthropic) has taken first place in international performance rankings. The result could shift the balance of power in the generative AI market, where competition is moving from the technological plane to the economic and political one.
Grok 4 topped the Intelligence Index published by the analytics platform Artificial Analysis with a score of 73 points. For comparison, OpenAI's GPT and Google's Gemini scored 70 points each, and Anthropic's Claude Opus scored 64. The achievement is more than a technological success for xAI: it also sets the stage for a "fourth center of power" in artificial intelligence.
The model performed well on key benchmarks. Grok 4 scored 94% on the AIME 2024 math test and 88% on the more demanding GPQA Diamond, which evaluates the depth of logical reasoning. The model is presented as capable not only of text generation but also of multimodal processing, including interpreting images and calling software functions. At the same time, the company acknowledges that the multimodal mode is still far from fully implemented and that significant improvements are expected in future versions.
Grok 4's result on ARC-AGI-2 drew particular attention. This is one of the few benchmarks that claims to objectively assess flexible, human-like intelligence. The model set a record of 15.9%, clearing the statistical noise threshold (10%) and nearly doubling the score of the previous leader, Claude Opus 4, at 8%. The result is an important indicator of possible movement toward general artificial intelligence, although in absolute terms it remains far from the human level.
Alongside Grok 4, xAI introduced an experimental version, Grok 4 Heavy. It uses a multi-agent architecture: several agents work in parallel, then compare their hypotheses and arrive at a collective conclusion. This variant posted the highest result on the Humanity's Last Exam test, scoring 44.4% when using tools. For comparison, Gemini 2.5 Pro scored 26.9% and OpenAI o3 only 21%. The arrival of multi-agent models in the commercial space may signal a shift from linear generation to simulating more complex forms of cognitive processing that resemble the group work of human experts.
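The "work in parallel, then compare hypotheses" pattern described above can be illustrated with a minimal sketch. This is not xAI's implementation, which has not been described in detail here: the agents are stand-in functions rather than real models, and the simple majority vote used to reach a "collective conclusion" is an assumption chosen for illustration. The names `Agent` and `collective_answer` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Hypothetical type: an agent takes a question and returns its hypothesis.
Agent = Callable[[str], str]


def collective_answer(question: str, agents: Sequence[Agent]) -> str:
    """Run every agent in parallel and return the most common hypothesis.

    Majority voting is an assumed stand-in for whatever comparison
    step a real multi-agent system performs.
    """
    if not agents:
        raise ValueError("at least one agent is required")

    # Each agent works on the same question independently, in its own thread.
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        hypotheses = list(pool.map(lambda agent: agent(question), agents))

    # Compare hypotheses: the answer proposed by the most agents wins.
    # On a tie, the hypothesis that appears first in agent order is kept.
    winner, _votes = Counter(hypotheses).most_common(1)[0]
    return winner


if __name__ == "__main__":
    # Toy agents that always return a fixed answer.
    toy_agents = [lambda q: "42", lambda q: "41", lambda q: "42"]
    print(collective_answer("What is 6 * 7?", toy_agents))
```

A production system would more plausibly let agents critique each other's reasoning instead of counting identical strings, but the overall shape (fan out, collect, reconcile) is the same.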
Despite these scientific advances, the launch of Grok 4 was accompanied by a management crisis. Around the time of the announcement, Linda Yaccarino stepped down as CEO of the social network X. In addition, the official Grok account on X found itself at the center of a scandal after it published posts containing anti-Semitic statements. In response to public pressure, xAI revised the "system prompt", the internal guidelines that govern the model's tone and acceptable language, removing the instruction that permitted "politically incorrect" responses.
Despite the reputational risks, the company is betting on commercial monetization. Grok 4 is available on a pay-as-you-go basis: $3 per million input tokens and $15 per million output tokens. This matches the price of Claude Sonnet but is more expensive than Gemini and OpenAI o3. The generation speed of 75 tokens per second is higher than that of Claude Opus but lower than that of Google's flagships. The model's context window is 256 thousand tokens, smaller than that of Gemini 2.5 Pro (1 million) but significantly larger than that of most competitors on the market.
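To make the pricing concrete, the following sketch works through the arithmetic using only the figures quoted above ($3 and $15 per million tokens, 75 tokens per second). The function names are hypothetical, and the calculation ignores anything not mentioned in the article, such as discounts, caching, or latency before the first token.

```python
from decimal import Decimal

# Figures quoted in the article (US dollars per one million tokens).
INPUT_PRICE_PER_MILLION = Decimal("3")
OUTPUT_PRICE_PER_MILLION = Decimal("15")
TOKENS_PER_MILLION = Decimal(1_000_000)

# Generation speed quoted in the article.
TOKENS_PER_SECOND = 75


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in dollars of one request at the quoted pay-as-you-go rates."""
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION
    )
    return total / TOKENS_PER_MILLION


def generation_seconds(output_tokens: int) -> float:
    """Approximate time to stream the output at the quoted speed."""
    return output_tokens / TOKENS_PER_SECOND


if __name__ == "__main__":
    # 10,000 input tokens cost $0.03 and 2,000 output tokens cost $0.03.
    print(request_cost(10_000, 2_000))   # 0.06
    # 1,500 output tokens at 75 tokens per second take about 20 seconds.
    print(generation_seconds(1_500))     # 20.0
```

By the same arithmetic, filling the entire 256-thousand-token context window with input costs about $0.77 per request before any output is generated.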
Users of the social network X are offered a premium SuperGrok Heavy plan for $300 per month. The offering involves deep integration of the model into a media platform controlled by the same owner, Elon Musk. Analysts note that this approach could be the start of a vertically integrated ecosystem in which AI, social networks, and computing infrastructure coexist under a single management.
xAI's strategy reflects a steady effort to break the dominance of the industry leaders. The company is betting less on accessibility than on technological leadership in the architectures closest to general artificial intelligence. Given that xAI already uses computing resources on xCloud clusters and is actively expanding its partner network, its potential as an independent player in the highly competitive global AI market is seen as growing rapidly.