Meta Connect introduces Emu Video text-to-video generator and Emu Edit
At Meta Connect, Meta introduced Emu, a foundational model for image generation, alongside two new developments built on it: Emu Video, a diffusion-based method for text-to-video generation, and Emu Edit, a new instruction-based image editor.
Emu underpins a range of Meta's generative AI experiences, including AI image editing tools for Instagram and the imagine feature in Meta AI. The two new models extend it in different directions: Emu Edit performs image editing based solely on text instructions, while Emu Video turns text prompts into short videos.
Emu Video, which leverages the Emu model, uses a unified architecture for text-to-video generation based on diffusion models. The approach factorizes generation into two steps: it first generates an image conditioned on a text prompt, then generates a video conditioned on both the prompt and the generated image, which makes video generation models efficient to train. Despite the simplicity of using just two diffusion models, the method outperforms prior work, generating 512×512 four-second videos at 16 frames per second; in human evaluations, its outputs are preferred over existing models for both quality and faithfulness to the text prompt.
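To make the two-step factorization concrete, here is a minimal Python sketch of the pipeline. The `image_model` and `video_model` interfaces are hypothetical placeholders, since Meta has not released a public API for Emu Video; only the overall structure reflects the described approach.

```python
# A minimal sketch of Emu Video's factorized text-to-video generation.
# The model interfaces below are hypothetical placeholders, not a real API.

from dataclasses import dataclass

@dataclass
class Video:
    frames: list          # 64 frames = 4 seconds at 16 fps
    resolution: tuple     # (512, 512)

def generate_video(prompt: str, image_model, video_model) -> Video:
    # Step 1: a text-to-image diffusion model produces a single image
    # conditioned on the text prompt.
    image = image_model.generate(prompt)              # hypothetical call

    # Step 2: a second diffusion model generates the video conditioned
    # on BOTH the text prompt and the generated image.
    frames = video_model.generate(prompt, image,      # hypothetical call
                                  num_frames=64,      # 4 s × 16 fps
                                  size=(512, 512))
    return Video(frames=frames, resolution=(512, 512))
```

Conditioning the second stage on an already-generated image is what lets the whole system get by with just two diffusion models, rather than the deeper cascades used in earlier text-to-video work.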
In addition to Emu Video, Meta introduced Emu Edit, a novel approach to image editing. Emu Edit streamlines image manipulation by allowing free-form editing through text instructions, covering local and global editing, background removal and addition, colour and geometry transformations, and more. Unlike many generative AI models, Emu Edit follows instructions precisely, ensuring that pixels in the input image unrelated to the instruction remain untouched. By training on computer vision tasks formulated as instructions alongside editing tasks, Emu Edit achieves state-of-the-art results in instruction-based image editing.
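As an illustration of what instruction-based editing looks like in practice, the sketch below shows such an interface. The `editor` object and `edit_image` function are hypothetical, as Emu Edit has no published public API; the example instructions mirror the task families described above.

```python
# A hypothetical sketch of instruction-based editing in the style of
# Emu Edit; the editor interface below is an illustrative placeholder.

def edit_image(editor, image, instruction: str):
    # The editor receives a free-form text instruction and returns a
    # new image in which only the pixels relevant to the instruction
    # change; unrelated regions of the input are preserved.
    return editor.generate(image=image, instruction=instruction)  # hypothetical call

# Example instructions spanning the task families Emu Edit supports:
instructions = [
    "add a rainbow to the sky",         # global editing
    "remove the person on the left",    # local editing
    "replace the background with a beach",  # background removal/addition
    "change the shirt colour to blue",  # colour transformation
]
```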