▌Guotou Reading Guide
Date: October 22
Updated October 29: Stable Diffusion 3.5 Medium released
▌Introducing Stable Diffusion 3.5: original English version
Introducing Stable Diffusion 3.5
Key Takeaways:
Today we are introducing Stable Diffusion 3.5. This open release includes multiple model variants, including Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo, and as of October 29th, Stable Diffusion 3.5 Medium.
These models are highly customizable for their size, run on consumer hardware, and are free for both commercial and non-commercial use under the permissive Stability AI Community License.
You can download all Stable Diffusion 3.5 models from Hugging Face and the inference code on GitHub now.
Today we are releasing Stable Diffusion 3.5, our most powerful models yet. This open release includes multiple variants that are customizable, run on consumer hardware, and are available for use under the permissive Stability AI Community License. You can download Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo models from Hugging Face and the inference code on GitHub now.
In June, we released Stable Diffusion 3 Medium, the first open release from the Stable Diffusion 3 series. This release didn't fully meet our standards or our community's expectations. After listening to the valuable community feedback, instead of a quick fix, we took the time to further develop a version that advances our mission to transform visual media.
Stable Diffusion 3.5 reflects our commitment to empower builders and creators with tools that are widely accessible, cutting-edge, and free for most use cases. We encourage the distribution and monetization of work across the entire pipeline, whether it's fine-tuning, LoRA, optimizations, applications, or artwork.
What’s being released
Stable Diffusion 3.5 offers a variety of models developed to meet the needs of scientific researchers, hobbyists, startups, and enterprises alike:
Stable Diffusion 3.5 Large: At 8.1 billion parameters, with superior quality and prompt adherence, this base model is the most powerful in the Stable Diffusion family. This model is ideal for professional use cases at 1 megapixel resolution.
Stable Diffusion 3.5 Large Turbo: A distilled version of Stable Diffusion 3.5 Large, this model generates high-quality images with exceptional prompt adherence in just 4 steps, making it considerably faster than Stable Diffusion 3.5 Large.
Stable Diffusion 3.5 Medium: At 2.5 billion parameters, with an improved MMDiT-X architecture and training methods, this model is designed to run “out of the box” on consumer hardware, striking a balance between quality and ease of customization. It can generate images at resolutions between 0.25 and 2 megapixels.
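To make the resolution figures above concrete, here is a minimal Python sketch that converts a width and height into megapixels and checks them against the 0.25 to 2 megapixel range stated for Stable Diffusion 3.5 Medium. It assumes 1 megapixel means 1024 × 1024 pixels (the usual Stable Diffusion reference size, so the 1 megapixel target for Large corresponds to roughly 1024 × 1024). The constant and function names are hypothetical, and the check ignores any other constraints a given pipeline may impose on image dimensions.

```python
# Minimal sketch: megapixel arithmetic for the resolution figures quoted above.
# Assumption: 1 megapixel = 1024 x 1024 pixels (the usual Stable Diffusion
# reference size); with 10**6 pixels instead, results near the edges shift slightly.
MEGAPIXEL = 1024 * 1024

# Range stated in the announcement for Stable Diffusion 3.5 Medium.
MEDIUM_MIN_MP = 0.25
MEDIUM_MAX_MP = 2.0


def megapixels(width: int, height: int) -> float:
    """Return the image area in megapixels under the MEGAPIXEL convention above."""
    return (width * height) / MEGAPIXEL


def in_medium_range(width: int, height: int) -> bool:
    """True if a resolution falls inside Medium's stated 0.25-2 MP range (inclusive)."""
    return MEDIUM_MIN_MP <= megapixels(width, height) <= MEDIUM_MAX_MP


if __name__ == "__main__":
    for w, h in [(512, 512), (1024, 1024), (1344, 768), (1440, 1440), (1536, 1536)]:
        print(f"{w}x{h}: {megapixels(w, h):.2f} MP, within Medium range: {in_medium_range(w, h)}")
```

Under this convention 1024 × 1024 is exactly 1.0 MP, 512 × 512 sits on the 0.25 MP lower bound, and 1536 × 1536 (2.25 MP) falls outside Medium's stated range.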
Developing the models
In developing the models, we prioritized customizability to offer a flexible base to build upon. To achieve this, we integrated Query-Key Normalization into the transformer blocks, stabilizing the model training process and simplifying further fine-tuning and development.
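Query-Key Normalization normalizes the query and key vectors inside an attention layer before their dot product is taken, so the attention logits cannot blow up as activations grow during training. The sketch below illustrates the idea for a single attention head in plain Python. It assumes RMSNorm as the normalizer (a common choice for QK-norm) and omits the learnable per-channel scale that real implementations typically include; it illustrates the technique and is not Stability AI's code.

```python
import math


def rms_norm(vec, eps=1e-6):
    """RMSNorm without a learnable scale: divide by the vector's root-mean-square."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, qk_norm=True):
    """Single-head scaled dot-product attention on lists of vectors.

    With qk_norm=True, every query and key is RMS-normalized first, so each has
    norm of about sqrt(d) and, by Cauchy-Schwarz, every logit q.k / sqrt(d)
    stays within +/- sqrt(d) no matter how large the raw activations are.
    """
    d = len(queries[0])
    if qk_norm:
        queries = [rms_norm(q) for q in queries]
        keys = [rms_norm(k) for k in keys]
    outputs = []
    for q in queries:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(logits)
        # Weighted sum of value vectors, one output component at a time.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return outputs


if __name__ == "__main__":
    keys = [[0.3, 0.8], [-0.5, 0.1]]
    values = [[1.0, 0.0], [0.0, 1.0]]  # identity values: outputs equal the attention weights
    for scale in (1.0, 1000.0):
        q = [[1.0 * scale, 0.5 * scale]]
        print(scale, attention(q, keys, values), attention(q, keys, values, qk_norm=False))
```

With normalization on, scaling the query by 1000 leaves the attention weights essentially unchanged, whereas without it the softmax saturates onto a single key. That bounded behavior is what makes training, and later fine-tuning, better behaved.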
To support this level of downstream flexibility, we had to make some trade-offs. Greater variation in outputs from the same prompt with different seeds may occur, which is intentional as it helps preserve a broader knowledge-base and diverse styles in the base models. However, as a result, prompts lacking specificity might lead to increased uncertainty in the output, and the aesthetic level may vary.
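The seed matters because the sampler starts from random Gaussian noise and then transforms it into an image under the guidance of the prompt: the seed fixes that starting noise, so the same prompt with a different seed starts from a different point. The toy sketch below uses Python's `random` module to show that a seed deterministically reproduces the initial latent while a new seed changes it. The latent shape is a hypothetical toy size, and real pipelines draw this noise with their framework's RNG (in diffusers-style pipelines, typically a seeded `torch.Generator` passed as `generator`).

```python
import random


def initial_latent(seed, shape=(4, 8, 8)):
    """Draw the starting Gaussian noise for a (channels, height, width) latent.

    Only the seed determines this tensor; the prompt plays no part in it.
    The default shape is a hypothetical toy size chosen for illustration.
    """
    rng = random.Random(seed)  # independent generator, unaffected by global random state
    channels, height, width = shape
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]
        for _ in range(channels)
    ]


if __name__ == "__main__":
    same_a = initial_latent(seed=42)
    same_b = initial_latent(seed=42)
    other = initial_latent(seed=43)
    print("same seed reproduces noise:", same_a == same_b)
    print("different seed changes noise:", same_a != other)
```

In practice this suggests fixing the seed while refining a prompt, so that changes in the output come from the wording, and varying the seed only when exploring alternatives.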
For the Medium model specifically, we made several adjustments to the architecture and training protocols to enhance quality, coherence, and multi-resolution generation abilities.
Where the models excel
The Stable Diffusion 3.5 version excels in the following areas, making it one of the most customizable and accessible image models on the market, while maintaining top-tier performance in prompt adherence and image quality:
Customizability: Easily fine-tune the model to meet your specific creative needs, or build applications based on customized workflows.
Efficient Performance: Optimized to run on standard consumer hardware without heavy demands, especially the Stable Diffusion 3.5 Medium and Stable Diffusion 3.5 Large Turbo models.
We took a look at the hardware compatibility for running Stable Diffusion 3.5 Medium alongside other open image base models. This model requires only 9.9 GB of VRAM (excluding text encoders) to unlock its full performance, making it highly accessible and compatible with most consumer GPUs.
Diverse Outputs: Creates images representative of the world, not just one type of person, with different skin tones and features, without the need for extensive prompting.
Versatile Styles: Capable of generating a wide range of styles and aesthetics like 3D, photography, painting, line art, and virtually any visual style imaginable.
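For a rough sense of where VRAM figures like the 9.9 GB quoted for Medium come from, the weights alone take about (parameter count) × (bytes per parameter). The sketch below assumes 16-bit (bf16/fp16) weights and decimal gigabytes; it is a back-of-the-envelope estimate, not how the 9.9 GB figure was measured, since that figure would also have to cover activations and other runtime memory.

```python
# Back-of-the-envelope estimate of weight memory only (not activations).
# Assumptions: 16-bit weights by default and decimal gigabytes (10**9 bytes).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gb(params_billion: float, dtype: str = "bf16") -> float:
    """Billions of parameters times bytes per parameter gives decimal GB."""
    return params_billion * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    # Parameter counts as stated in the announcement.
    for name, params in [("SD 3.5 Medium", 2.5), ("SD 3.5 Large", 8.1)]:
        print(f"{name}: ~{weight_memory_gb(params):.1f} GB of bf16 weights")
```

Under these assumptions Medium's 2.5 billion parameters need about 5.0 GB for weights, while Large's 8.1 billion need about 16.2 GB before any activations, which helps explain why Medium is the easier fit for consumer GPUs.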
Additionally, our analysis shows that Stable Diffusion 3.5 Large leads the market in prompt adherence and rivals much larger models in image quality.
Stable Diffusion 3.5 Large Turbo offers some of the fastest inference times for its size, while remaining highly competitive in both image quality and prompt adherence, even when compared to non-distilled models of similar size.
Stable Diffusion 3.5 Medium outperforms other medium-sized models, offering a balance of prompt adherence and image quality, making it a top choice for efficient, high-quality performance.
The Stability AI Community license at a glance
We are pleased to release this model under our permissive community license. Here are the key components of the license:
Free for non-commercial use: Individuals and organizations can use the model free of charge for non-commercial use, including scientific research.
Free for commercial use (up to $1M in annual revenue): Startups, small to medium-sized businesses, and creators can use the model for commercial purposes at no cost, as long as their total annual revenue is less than $1M.
Ownership of outputs: Retain ownership of the media generated without restrictive licensing implications.
For organizations with annual revenue more than $1M, please contact us here to inquire about an Enterprise License.
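The license summary above reduces to a simple decision rule. The sketch below encodes that reading in Python; the names are hypothetical, it simplifies the summary rather than the license text itself (which governs), and it assumes revenue of exactly $1M falls on the Enterprise side, because the summary only speaks of revenue below and above $1M.

```python
from enum import Enum

# Hypothetical encoding of the license summary above; the license text governs.
REVENUE_THRESHOLD_USD = 1_000_000


class Tier(Enum):
    COMMUNITY_FREE = "free under the Community License"
    ENTERPRISE = "contact Stability AI about an Enterprise License"


def license_tier(commercial: bool, annual_revenue_usd: float) -> Tier:
    """Map a use case to a tier according to the summary's bullet points."""
    if not commercial:
        # The summary attaches no revenue limit to non-commercial use (incl. research).
        return Tier.COMMUNITY_FREE
    if annual_revenue_usd < REVENUE_THRESHOLD_USD:
        return Tier.COMMUNITY_FREE
    # Assumption: revenue of exactly $1M is treated as the Enterprise case.
    return Tier.ENTERPRISE


if __name__ == "__main__":
    print(license_tier(commercial=True, annual_revenue_usd=250_000).value)
    print(license_tier(commercial=True, annual_revenue_usd=5_000_000).value)
```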
More ways to access the models
While the model weights are available on Hugging Face now for self-hosting, you can also access the model through the following platforms:
Stability AI API
Replicate
DeepInfra
ComfyUI
Our commitment to safety
We believe in safe, responsible AI practices and take deliberate measures to ensure integrity starts at the early stages of development. This means we have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3.5 by bad actors. For more information about our approach to safety, please visit our Stable Safety page.
Coming soon
We will also launch ControlNets soon, providing advanced control features for a wide variety of professional use cases.
We look forward to hearing your feedback on Stable Diffusion 3.5 and seeing what you create with the models. You can share thoughts directly with us through this form.
To stay updated on our progress, follow us on X, LinkedIn, and Instagram, and join our Discord Community.