【AI Learning】Introducing Stable Diffusion 3.5: Original Text + Chinese Translation

Digest, 2024-10-30 21:00, Singapore

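▌Introduction from 锅头 (Guotou)

On October 22, 2024, the Stability AI team released Stable Diffusion 3.5. Under the Stability AI Community License, it is free for non-commercial use and free for commercial use by organizations with less than $1M in annual revenue.

This article is Guotou's study notes on Stable Diffusion 3.5, shared as a reference for anyone who wants to learn along.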

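▌"Introducing Stable Diffusion 3.5": Chinese translation (Google Translate)

Introducing Stable Diffusion 3.5

Date: October 22

Updated October 29 with the release of Stable Diffusion 3.5 Medium

Key takeaways:

Today we are launching Stable Diffusion 3.5. This open release includes multiple model variants, including Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo, and, as of October 29, Stable Diffusion 3.5 Medium.
These models are highly customizable for their size, run on consumer hardware, and are free for both commercial and non-commercial use under the permissive Stability AI Community License.
You can now download all Stable Diffusion 3.5 models from Hugging Face and the inference code from GitHub.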

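Today we released Stable Diffusion 3.5, our most powerful model to date. This open release includes multiple customizable variants that run on consumer hardware and are available under the permissive Stability AI Community License. You can now download the Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo models from Hugging Face, and the inference code from GitHub.

In June, we released Stable Diffusion 3 Medium, the first open release in the Stable Diffusion 3 series. That release did not fully meet our standards or the community's expectations. After listening to valuable community feedback, rather than shipping a quick fix, we took the time to further develop a version that advances our mission to transform visual media.

Stable Diffusion 3.5 reflects our commitment to giving developers and creators tools that are widely accessible, cutting-edge, and, in most cases, free. We encourage the distribution and monetization of work across the entire pipeline, whether fine-tunes, LoRAs, optimizations, applications, or artwork.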

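What's being released

Stable Diffusion 3.5 offers several models designed to meet the needs of scientific researchers, hobbyists, startups, and enterprises:

Stable Diffusion 3.5 Large: With 8.1 billion parameters, superior quality, and strong prompt adherence, this base model is the most powerful in the Stable Diffusion family. It is well suited to professional use cases at 1-megapixel resolution.
Stable Diffusion 3.5 Large Turbo: A distilled version of Stable Diffusion 3.5 Large that generates high-quality images with excellent prompt adherence in just 4 steps, making it much faster than Stable Diffusion 3.5 Large.
Stable Diffusion 3.5 Medium: With 2.5 billion parameters and an improved MMDiT-X architecture and training methods, this model is designed to run "out of the box" on consumer hardware, striking a balance between quality and ease of customization. It can generate images at resolutions between 0.25 and 2 megapixels.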

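Developing the models

In developing the models, we prioritized customizability to provide a flexible foundation to build on. To achieve this, we integrated Query-Key Normalization into the transformer blocks, which stabilizes the training process and simplifies further fine-tuning and development.

To support this downstream flexibility, we had to make some trade-offs. The same prompt run with different seeds may produce greater variation in outputs. This is intentional, as it helps preserve a broader knowledge base and a diversity of styles in the base models. As a result, however, prompts that lack specificity may lead to more uncertainty in the output, and aesthetic quality may vary.

For the Medium model specifically, we made several adjustments to the architecture and training protocol to improve quality, coherence, and multi-resolution generation.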

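Where the models excel

Stable Diffusion 3.5 excels in the following areas, making it one of the most customizable and accessible image models on the market while maintaining top-tier performance in prompt adherence and image quality:

Customizability: Easily fine-tune the model to meet your specific creative needs, or build applications based on customized workflows.
Efficient performance: Optimized to run on standard consumer hardware without heavy demands, especially the Stable Diffusion 3.5 Medium and Stable Diffusion 3.5 Large Turbo models.
We looked at the hardware compatibility of running Stable Diffusion 3.5 Medium alongside other open image base models. This model requires only 9.9 GB of VRAM (excluding text encoders) to reach its full performance, making it highly accessible and compatible with most consumer GPUs.

Diverse outputs: Creates images representative of the world, not just one type of person, with different skin tones and features, without the need for extensive prompting.

Versatile styles: Can generate a wide range of styles and aesthetics, such as 3D, photography, painting, line art, and virtually any visual style imaginable.

In addition, our analysis shows that Stable Diffusion 3.5 Large leads the market in prompt adherence and rivals much larger models in image quality.

Stable Diffusion 3.5 Large Turbo offers some of the fastest inference times for its size while remaining highly competitive in image quality and prompt adherence, even compared with non-distilled models of similar size.

Stable Diffusion 3.5 Medium outperforms other medium-sized models, balancing prompt adherence and image quality, which makes it a top choice for efficient, high-quality performance.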

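The Stability AI Community License at a glance

We are pleased to release this model under our permissive community license. Here are the key components of the license:

Free for non-commercial use: Individuals and organizations can use the model free of charge for non-commercial purposes, including scientific research.
Free for commercial use (up to $1M in annual revenue): Startups, small and medium-sized businesses, and creators can use the model for commercial purposes at no cost, as long as their total annual revenue is less than $1M.
Ownership of outputs: Retain ownership of the media you generate, without restrictive licensing implications.

Organizations with annual revenue above $1M can contact us here to inquire about an Enterprise License.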

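More ways to access the models

While the model weights are now available on Hugging Face for self-hosting, you can also access the model through the following platforms:

Stability AI API
Replicate
DeepInfra
ComfyUI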

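Our commitment to safety

We believe in safe, responsible AI practices and take deliberate measures to ensure that integrity starts at the early stages of development. This means we have taken, and will continue to take, reasonable steps to prevent bad actors from misusing Stable Diffusion 3.5. For more information about our approach to safety, please visit our Stable Safety page.

Coming soon

We will also launch ControlNets soon, providing advanced control capabilities for a wide variety of professional use cases.

We look forward to hearing your feedback on Stable Diffusion 3.5 and seeing what you create with the models. You can share your thoughts directly with us through this form.

To stay up to date on our progress, follow us on X, LinkedIn, and Instagram, and join our Discord community.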

▌"Introducing Stable Diffusion 3.5": English original

Introducing Stable Diffusion 3.5

October 22, 2024

Updated October 29th with the release of Stable Diffusion 3.5 Medium

Key Takeaways:

Today we are introducing Stable Diffusion 3.5. This open release includes multiple model variants, including Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo, and as of October 29th, Stable Diffusion 3.5 Medium.
These models are highly customizable for their size, run on consumer hardware, and are free for both commercial and non-commercial use under the permissive Stability AI Community License.
You can download all Stable Diffusion 3.5 models from Hugging Face and the inference code on GitHub now.

Today we are releasing Stable Diffusion 3.5, our most powerful models yet. This open release includes multiple variants that are customizable, run on consumer hardware, and are available for use under the permissive Stability AI Community License. You can download Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo models from Hugging Face and the inference code on GitHub now.
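The weights mentioned above can also be used through the Hugging Face `diffusers` library. The sketch below assumes a `diffusers` release that ships `StableDiffusion3Pipeline` (0.31 or later), PyTorch with a CUDA GPU, and that you have accepted the model license on Hugging Face and logged in. The repository IDs and the step/guidance values mirror the examples on the Hugging Face model cards at the time of writing; treat them as starting points, not official settings. `RECOMMENDED_SETTINGS` and `generate` are hypothetical names introduced for illustration.

```python
# Minimal sketch: text-to-image with Stable Diffusion 3.5 via Hugging Face diffusers.
# Assumptions: diffusers >= 0.31 (StableDiffusion3Pipeline), torch with CUDA,
# and gated-model access already granted on Hugging Face.

# Hypothetical settings table; values copied from model-card examples, tune as needed.
RECOMMENDED_SETTINGS = {
    "stabilityai/stable-diffusion-3.5-large": {"num_inference_steps": 28, "guidance_scale": 3.5},
    "stabilityai/stable-diffusion-3.5-large-turbo": {"num_inference_steps": 4, "guidance_scale": 0.0},
    "stabilityai/stable-diffusion-3.5-medium": {"num_inference_steps": 40, "guidance_scale": 4.5},
}


def generate(prompt: str, model_id: str, seed: int = 0, out_path: str = "out.png") -> str:
    """Generate one image and save it. Heavy imports are deferred so the
    settings table can be inspected on a machine without a GPU."""
    import torch
    from diffusers import StableDiffusion3Pipeline

    settings = RECOMMENDED_SETTINGS[model_id]
    pipe = StableDiffusion3Pipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    pipe = pipe.to("cuda")
    # A fixed seed makes the run reproducible; a different seed gives a different image.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator, **settings).images[0]
    image.save(out_path)
    return out_path
```

For example, `generate("a capybara holding a sign that reads Hello World", "stabilityai/stable-diffusion-3.5-large-turbo")` runs the 4-step Turbo variant with guidance disabled, which is how the distilled model is typically sampled.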

In June, we released Stable Diffusion 3 Medium, the first open release from the Stable Diffusion 3 series. This release didn't fully meet our standards or our communities’ expectations. After listening to the valuable community feedback, instead of a quick fix, we took the time to further develop a version that advances our mission to transform visual media.

Stable Diffusion 3.5 reflects our commitment to empower builders and creators with tools that are widely accessible, cutting-edge, and free for most use cases. We encourage the distribution and monetization of work across the entire pipeline - whether it's fine-tuning, LoRA, optimizations, applications, or artwork.

What’s being released

Stable Diffusion 3.5 offers a variety of models developed to meet the needs of scientific researchers, hobbyists, startups, and enterprises alike:

Stable Diffusion 3.5 Large: At 8.1 billion parameters, with superior quality and prompt adherence, this base model is the most powerful in the Stable Diffusion family. This model is ideal for professional use cases at 1 megapixel resolution.
Stable Diffusion 3.5 Large Turbo: A distilled version of Stable Diffusion 3.5 Large that generates high-quality images with exceptional prompt adherence in just 4 steps, making it considerably faster than Stable Diffusion 3.5 Large.
Stable Diffusion 3.5 Medium: At 2.5 billion parameters, with improved MMDiT-X architecture and training methods, this model is designed to run "out of the box" on consumer hardware, striking a balance between quality and ease of customization. It is capable of generating images at resolutions between 0.25 and 2 megapixels.
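To make the megapixel figures concrete, the sketch below turns a megapixel budget and an aspect ratio into image dimensions and checks a resolution against the 0.25 to 2 megapixel range stated for Medium. It assumes that 1 megapixel means 1024 × 1024 pixels and that sides are snapped to multiples of 64; both are common conventions for SD-style models, not requirements stated in the announcement. `dims_for` and `in_medium_range` are hypothetical helpers.

```python
import math

# Assumption: 1 megapixel = 1024 * 1024 pixels (a common SD convention).
MEGAPIXEL = 1024 * 1024


def dims_for(megapixels: float, aspect_w: int, aspect_h: int,
             multiple: int = 64) -> tuple[int, int]:
    """Return (width, height) near the target pixel count for an aspect ratio."""
    target = megapixels * MEGAPIXEL
    # Solve width * height = target subject to width / height = aspect_w / aspect_h.
    height = math.sqrt(target * aspect_h / aspect_w)
    width = height * aspect_w / aspect_h

    def snap(x: float) -> int:
        # Round to the nearest multiple (assumed 64), never below one multiple.
        return max(multiple, round(x / multiple) * multiple)

    return snap(width), snap(height)


def in_medium_range(width: int, height: int) -> bool:
    """True if width x height lies in the 0.25-2 MP range stated for SD 3.5 Medium."""
    mp = width * height / MEGAPIXEL
    return 0.25 <= mp <= 2.0
```

Under these assumptions, `dims_for(1, 1, 1)` gives `(1024, 1024)` for Large's 1-megapixel target, `dims_for(1, 16, 9)` gives `(1344, 768)`, and `dims_for(0.25, 1, 1)` gives `(512, 512)`, the low end of Medium's range.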

Developing the models

In developing the models, we prioritized customizability to offer a flexible base to build upon. To achieve this, we integrated Query-Key Normalization into the transformer blocks, stabilizing the model training process and simplifying further fine-tuning and development.
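Query-Key Normalization normalizes the query and key vectors before their dot product, so attention logits stay bounded even if weights grow during training. The pure-Python sketch below shows the idea for a single attention head. It assumes RMSNorm without a learned scale as the normalization; the announcement does not specify the exact variant or where it sits inside the MMDiT blocks. All function names are illustrative.

```python
import math


def rms_norm(v: list[float], eps: float = 1e-6) -> list[float]:
    """Scale a vector so its root-mean-square is about 1 (no learned gain, an assumption)."""
    rms = math.sqrt(sum(x * x for x in v) / len(v) + eps)
    return [x / rms for x in v]


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: list[list[float]], keys: list[list[float]],
              values: list[list[float]], qk_norm: bool = True) -> list[list[float]]:
    """Single-head scaled dot-product attention, optionally with QK-norm."""
    d = len(queries[0])
    if qk_norm:
        # Normalize q and k before the dot product: each then has norm ~sqrt(d),
        # so every logit q.k / sqrt(d) is bounded by about sqrt(d).
        queries = [rms_norm(q) for q in queries]
        keys = [rms_norm(k) for k in keys]
    out = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(logits)
        # Weighted sum of value vectors, one output row per query.
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out
```

With `qk_norm=True`, scaling every query by 1000 leaves the output essentially unchanged; with `qk_norm=False`, the same scaling pushes the softmax toward a one-hot distribution, the kind of saturation that can destabilize training.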

To support this level of downstream flexibility, we had to make some trade-offs. Greater variation in outputs from the same prompt with different seeds may occur, which is intentional as it helps preserve a broader knowledge-base and diverse styles in the base models. However, as a result, prompts lacking specificity might lead to increased uncertainty in the output, and the aesthetic level may vary.
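The seed-to-seed variation described above comes from the starting point of sampling: the model begins from random Gaussian noise, and the seed determines that noise. The sketch below assumes an SD3-style latent with 16 channels at 1/8 of the image resolution and uses Python's `random` module in place of a framework RNG; `initial_latent` is a hypothetical helper.

```python
import random

# Assumption: SD3-style latent space, 16 channels at 1/8 of image resolution.
LATENT_CHANNELS = 16
DOWNSAMPLE = 8


def initial_latent(seed: int, width: int = 1024, height: int = 1024) -> list[float]:
    """Draw the starting Gaussian noise for one image from a seed (flattened)."""
    rng = random.Random(seed)  # independent generator; global random state is untouched
    n = LATENT_CHANNELS * (height // DOWNSAMPLE) * (width // DOWNSAMPLE)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]
```

Because the same seed reproduces the same starting noise, a practical way to work with this variation is to fix the seed while refining a prompt, then change the seed to explore alternatives.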

For the Medium model specifically, we made several adjustments to the architecture and training protocols to enhance quality, coherence, and multi-resolution generation abilities.

Where the models excel

The Stable Diffusion 3.5 version excels in the following areas, making it one of the most customizable and accessible image models on the market, while maintaining top-tier performance in prompt adherence and image quality:

Customizability: Easily fine-tune the model to meet your specific creative needs, or build applications based on customized workflows.
Efficient Performance: Optimized to run on standard consumer hardware without heavy demands, especially the Stable Diffusion 3.5 Medium and Stable Diffusion 3.5 Large Turbo models.
We took a look at the hardware compatibility for running Stable Diffusion 3.5 Medium alongside other open-image base models. This model only requires 9.9 GB of VRAM (excluding text encoders) to unlock its full performance, making it highly accessible and compatible with most consumer GPUs.
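A rough way to relate parameter counts to memory is parameters × bytes per parameter. The sketch below applies this to the 8.1 billion (Large) and 2.5 billion (Medium) parameter figures above. It counts weights only and assumes 16-bit (bf16/fp16) storage by default; the announcement does not state which precision or which components its 9.9 GB figure covers. `weight_gb` is a hypothetical helper.

```python
# Back-of-the-envelope weight memory: parameters x bytes per parameter.
# Activations, the VAE and the text encoders are not counted here.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def weight_gb(num_params: float, dtype: str = "bf16") -> float:
    """Return weight size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


medium_gb = weight_gb(2.5e9)  # SD 3.5 Medium, 16-bit weights
large_gb = weight_gb(8.1e9)   # SD 3.5 Large, 16-bit weights
```

In 16-bit precision this gives about 5.0 GB of weights for Medium and about 16.2 GB for Large, which is consistent with Medium being the variant positioned to run on most consumer GPUs.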

Diverse Outputs: Creates images representative of the world, not just one type of person, with different skin tones and features, without the need for extensive prompting.

Versatile Styles: Capable of generating a wide range of styles and aesthetics like 3D, photography, painting, line art, and virtually any visual style imaginable.

Additionally, our analysis shows that Stable Diffusion 3.5 Large leads the market in prompt adherence and rivals much larger models in image quality.

Stable Diffusion 3.5 Large Turbo offers some of the fastest inference times for its size, while remaining highly competitive in both image quality and prompt adherence, even when compared to non-distilled models of similar size.

Stable Diffusion 3.5 Medium outperforms other medium-sized models, offering a balance of prompt adherence and image quality, making it a top choice for efficient, high-quality performance.

The Stability AI Community license at a glance

We are pleased to release this model under our permissive community license. Here are the key components of the license:

Free for non-commercial use: Individuals and organizations can use the model free of charge for non-commercial use, including scientific research.
Free for commercial use (up to $1M in annual revenue): Startups, small to medium-sized businesses, and creators can use the model for commercial purposes at no cost, as long as their total annual revenue is less than $1M.
Ownership of outputs: Retain ownership of the media generated without restrictive licensing implications.

For organizations with annual revenue more than $1M, please contact us here to inquire about an Enterprise License.
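The license summary above reduces to a short decision rule. The sketch below encodes that simplified reading of the summary, not the license text itself. Treating revenue of exactly $1M as outside the free tier is an assumption, since the summary says "less than $1M" for free commercial use and "more than $1M" for the Enterprise License. `license_path` is a hypothetical helper.

```python
# Simplified reading of the Community License summary (not the license text).
REVENUE_THRESHOLD_USD = 1_000_000


def license_path(commercial: bool, annual_revenue_usd: float) -> str:
    """Map a use case to the license route described in the summary."""
    if not commercial:
        # Non-commercial use is free regardless of revenue.
        return "community (free, non-commercial)"
    if annual_revenue_usd < REVENUE_THRESHOLD_USD:
        return "community (free, commercial)"
    # Assumption: exactly $1M or more falls under the Enterprise License.
    return "enterprise (contact Stability AI)"
```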

More ways to access the models

While the model weights are available on Hugging Face now for self-hosting, you can also access the model through the following platforms:

Stability AI API
Replicate
DeepInfra
ComfyUI

Our commitment to safety

We believe in safe, responsible AI practices and take deliberate measures to ensure Integrity starts at the early stages of development. This means we have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3.5 by bad actors. For more information about our approach to Safety please visit our Stable Safety page.

Coming soon

We will also launch ControlNets soon, providing advanced control features for a wide variety of professional use cases.

We look forward to hearing your feedback on Stable Diffusion 3.5 and seeing what you create with the models. You can share thoughts directly with us through this form.

To stay updated on our progress follow us on X, LinkedIn, Instagram, and join our Discord Community.

▌Source

[1] Introducing Stable Diffusion 3.5, original article: https://stability.ai/news/introducing-stable-diffusion-3-5

http://mp.weixin.qq.com/s?__biz=MzkwMzQ0MDIzMg==&mid=2247491757&idx=1&sn=6369b68aa8009b0f5e2f7b854ce775e5

