
Digest   2024-08-25 22:27   Singapore  




▌FLUX.1 Introduction: Original Announcement Text

Announcing Black Forest Labs


Aug 1, 2024
Today, we are excited to announce the launch of Black Forest Labs. Deeply rooted in the generative AI research community, our mission is to develop and advance state-of-the-art generative deep learning models for media such as images and videos, and to push the boundaries of creativity, efficiency and diversity. We believe that generative AI will be a fundamental building block of all future technologies. By making our models available to a wide audience, we want to bring its benefits to everyone, educate the public and enhance trust in the safety of these models. We are determined to build the industry standard for generative media. Today, as the first step towards this goal, we release the FLUX.1 suite of models that push the frontiers of text-to-image synthesis.

The Black Forest Team
We are a team of distinguished AI researchers and engineers with an outstanding track record in developing foundational generative AI models in academic, industrial, and open-source environments. Our innovations include creating VQGAN and Latent Diffusion, the Stable Diffusion models for image and video generation (Stable Diffusion XL, Stable Video Diffusion, Rectified Flow Transformers), and Adversarial Diffusion Distillation for ultra-fast, real-time image synthesis.
Our core belief is that widely accessible models not only foster innovation and collaboration within the research community and academia, but also increase transparency, which is essential for trust and broad adoption. Our team strives to develop the highest quality technology and to make it accessible to the broadest audience possible.

We are excited to announce the successful closing of our Series Seed funding round of $31 million. This round was led by our main investor, Andreessen Horowitz, with notable participation from angel investors Brendan Iribe, Michael Ovitz, Garry Tan, Timo Aila and Vladlen Koltun, as well as other renowned experts in AI research and company building. We have received follow-up investments from General Catalyst and MätchVC to support us on our mission to bring state-of-the-art AI from Europe to everyone around the world.
Furthermore, we are pleased to announce our advisory board, including Michael Ovitz, bringing extensive experience in the content creation industry, and Prof. Matthias Bethge, pioneer of neural style transfer and leading expert in open European AI research.

FLUX.1 Model Family

We release the FLUX.1 suite of text-to-image models that define a new state-of-the-art in image detail, prompt adherence, style diversity and scene complexity for text-to-image synthesis.
To strike a balance between accessibility and model capabilities, FLUX.1 comes in three variants: FLUX.1 [pro], FLUX.1 [dev] and FLUX.1 [schnell]:
• FLUX.1 [pro]: The best of FLUX.1, offering state-of-the-art image generation performance with top-of-the-line prompt following, visual quality, image detail and output diversity. Sign up for FLUX.1 [pro] access via our API. FLUX.1 [pro] is also available via Replicate and fal.ai. Moreover, we offer dedicated and customized enterprise solutions; reach out via flux@blackforestlabs.ai.
• FLUX.1 [dev]: FLUX.1 [dev] is an open-weight, guidance-distilled model for non-commercial applications. Directly distilled from FLUX.1 [pro], FLUX.1 [dev] obtains similar quality and prompt adherence capabilities, while being more efficient than a standard model of the same size. FLUX.1 [dev] weights are available on Hugging Face and can be directly tried out on Replicate or fal.ai. For applications in commercial contexts, get in touch via flux@blackforestlabs.ai.
• FLUX.1 [schnell]: Our fastest model is tailored for local development and personal use. FLUX.1 [schnell] is openly available under an Apache 2.0 license. As with FLUX.1 [dev], weights are available on Hugging Face, and inference code can be found on GitHub and in Hugging Face's Diffusers. Moreover, we're happy to have day-1 integration for ComfyUI.
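As a rough illustration of the Diffusers route mentioned for FLUX.1 [schnell], the sketch below loads the open weights and runs a few-step generation. The announcement itself gives no code. The repository id, the four-step setting, and the zero guidance scale are assumptions taken from the publicly listed model card and should be checked against the current Diffusers documentation. The helper names `schnell_call_kwargs` and `generate` are hypothetical.

```python
"""Sketch: FLUX.1 [schnell] through Hugging Face Diffusers (assumed setup)."""

# Assumption: repository id as listed on Hugging Face; verify before use.
MODEL_ID = "black-forest-labs/FLUX.1-schnell"


def schnell_call_kwargs(prompt: str, steps: int = 4) -> dict:
    """Build keyword arguments for a few-step, guidance-free pipeline call."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,  # assumed few-step default for schnell
        "guidance_scale": 0.0,         # assumed: schnell is run without guidance
    }


def generate(prompt: str, out_path: str = "flux-schnell.png") -> str:
    """Load the pipeline and save one image; downloads weights on first use."""
    # Heavy imports stay local so the helper above works without them installed.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    pipe.enable_model_cpu_offload()  # trades speed for lower GPU memory use
    image = pipe(**schnell_call_kwargs(prompt)).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate("a lighthouse at dusk")` would need a recent `diffusers` release with `torch` and `accelerate` installed, plus enough memory for a 12B-parameter model. The argument-building helper is split out so the settings can be inspected without loading any weights.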

Transformer-powered Flow Models at Scale
All public FLUX.1 models are based on a hybrid architecture of multimodal and parallel diffusion transformer blocks and scaled to 12B parameters. We improve over previous state-of-the-art diffusion models by building on flow matching, a general and conceptually simple method for training generative models, which includes diffusion as a special case. In addition, we increase model performance and improve hardware efficiency by incorporating rotary positional embeddings and parallel attention layers. We will publish a more detailed tech report in the near future.
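To make the flow matching idea above more concrete, here is a minimal sketch of one common variant, the straight-line (rectified flow) path. The announcement does not say which path or loss FLUX.1 uses, so treat this as an illustration of the general technique rather than the authors' training code. The convention that `x0` is noise and `x1` is data, and all function names, are assumptions made for this example.

```python
def interpolate(x0, x1, t):
    """Point on the straight path x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path: d x_t / d t = x1 - x0 (constant in t)."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(model, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    pred = model(x_t, t)
    target = target_velocity(x0, x1)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(target)


def euler_sample(model, x0, steps):
    """Integrate dx/dt = model(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = model(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In training, `model` would be a neural network and `t` would be sampled at random for each noise/data pair. At sampling time, `euler_sample` starts from noise and follows the learned velocity field. A model that returned the exact straight-line velocity for a given pair would have zero loss and would carry `x0` to `x1` in any number of Euler steps, which is one intuition for why few-step sampling is plausible with nearly straight paths.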

A new Benchmark for Image Synthesis
FLUX.1 defines the new state-of-the-art in image synthesis. Our models set new standards in their respective model class. FLUX.1 [pro] and [dev] surpass popular models like Midjourney v6.0, DALL·E 3 (HD) and SD3-Ultra in each of the following aspects: Visual Quality, Prompt Following, Size/Aspect Variability, Typography and Output Diversity. FLUX.1 [schnell] is the most advanced few-step model to date, outperforming not only its in-class competitors but also strong non-distilled models like Midjourney v6.0 and DALL·E 3 (HD). Our models are specifically finetuned to preserve the entire output diversity from pretraining. Compared to the current state-of-the-art, they offer drastically improved possibilities, as shown below.

All FLUX.1 model variants support a diverse range of aspect ratios and resolutions between 0.1 and 2.0 megapixels, as shown in the following example.
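For readers who want to turn the stated megapixel range into concrete image sizes, the following sketch picks a width and height for a given aspect ratio and pixel budget. Only the 0.1 to 2.0 megapixel range comes from the announcement. Treating one megapixel as 1,000,000 pixels, snapping sides down to multiples of 16, and the function name `dimensions` are assumptions made for this example.

```python
import math

MIN_MP, MAX_MP = 0.1, 2.0  # megapixel range stated in the announcement
MULTIPLE = 16              # assumption: sides are snapped to multiples of 16


def dimensions(aspect_w: int, aspect_h: int, megapixels: float) -> tuple[int, int]:
    """Return (width, height) close to the requested ratio and pixel budget."""
    if not MIN_MP <= megapixels <= MAX_MP:
        raise ValueError(f"megapixels must be within [{MIN_MP}, {MAX_MP}]")
    if aspect_w <= 0 or aspect_h <= 0:
        raise ValueError("aspect ratio terms must be positive")

    pixels = megapixels * 1_000_000      # assumption: 1 MP = 10**6 pixels
    ratio = aspect_w / aspect_h
    height = math.sqrt(pixels / ratio)   # from width * height = pixels
    width = height * ratio

    # Round each side down to the nearest multiple, so the budget is not exceeded.
    snapped_w = int(width) // MULTIPLE * MULTIPLE
    snapped_h = int(height) // MULTIPLE * MULTIPLE
    if snapped_w == 0 or snapped_h == 0:
        raise ValueError("aspect ratio too extreme for this pixel budget")
    return snapped_w, snapped_h
```

For example, `dimensions(16, 9, 2.0)` gives a widescreen size near the upper end of the range, while `dimensions(1, 1, 0.1)` gives a small square thumbnail. Rounding down keeps the result within the requested budget at the cost of a slightly smaller image.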

Up Next: SOTA Text-to-Video for All
Today we release the FLUX.1 text-to-image model suite. With their strong creative capabilities, these models serve as a powerful foundation for our upcoming suite of competitive generative text-to-video systems. Our video models will unlock precise creation and editing at high definition and unprecedented speed. We are committed to continuing to pioneer the future of generative media.

Join Us!
We are hiring exceptionally strong machine learning and backend engineers. If you are interested in joining our team, reach out to careers@blackforestlabs.ai.


[1] Black Forest Labs official article https://blackforestlabs.ai/announcing-black-forest-labs/
