The Chinese company ByteDance has unveiled Goku, a family of neural networks for video generation. It was developed together with researchers from the University of Hong Kong.
Goku can generate videos in both horizontal and vertical formats and works in several modes: Text to Video, Image to Video, and Text to Image. Its architecture is built on rectified flow Transformers. Rectified flow is a generative-modeling technique in which the network learns to carry a sample of random noise toward realistic data along nearly straight paths, an approach intended to yield more realistic and detailed images.
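To make the rectified flow idea more concrete, here is a minimal, generic sketch of its core mechanics. It is not Goku's code, and every function name in it is hypothetical. It assumes the convention x_t = (1 - t) * x0 + t * x1, with noise x0 at t = 0 and data x1 at t = 1 (some papers reverse the direction). During training, a model would learn to predict the constant velocity x1 - x0 at points along that straight line. At generation time, the learned velocity is integrated from noise to data with a simple Euler solver. A toy "oracle" velocity stands in for a trained network here.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight line from noise x0 (t=0) to data x1 (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Rectified flow regression target: the constant velocity x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def training_example(x1, rng):
    """Build one (x_t, t, target) triple.

    A real model v_theta(x_t, t) would be trained to minimize the squared
    error between its prediction and `target`; no model is trained here.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # random Gaussian noise
    t = rng.random()                        # time sampled uniformly in [0, 1)
    return interpolate(x0, x1, t), t, target_velocity(x0, x1)

def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Usage on a toy 3-dimensional "data point".
rng = random.Random(0)
data = [0.5, -1.0, 2.0]
x_t, t, v = training_example(data, rng)

noise = [rng.gauss(0.0, 1.0) for _ in data]

def oracle(x, t):
    """Hypothetical stand-in for a trained network: the exact straight-line velocity."""
    return target_velocity(noise, data)

result = euler_sample(oracle, noise, steps=4)  # lands on `data` (up to rounding)
```

Because the path in this toy is perfectly straight, even a single Euler step would reach the data exactly. That straightness is what makes few-step sampling attractive. With a real learned model the paths are only approximately straight, so more steps are usually needed.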
On the VBench benchmark, Goku scored 84.85 points, ranking above Pika-1.0, OpenSora V1.2, Kling and Mira in the Text to Video category. On the text-to-image benchmarks GenEval and DPG-Bench, the model scored 0.76 and 83.65 points, respectively.
Alongside the base model, ByteDance introduced Goku+, a version focused on creating promotional videos up to 20 seconds long. It can generate realistic people who gesture and interact with objects in the frame, including the products being advertised.