[AIGC Learning] Bark Text-To-Speech (2): Generating Long-Form Audio

Digest, Technology | 2023-05-23 22:56 | Hong Kong

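I wrote about Bark before in [AIGC Learning] Bark Text-To-Speech. At that point the tool could only generate clips of up to about 13 seconds, but last month the team published an update designed specifically for long-form audio generation.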

https://github.com/suno-ai/bark/blob/main/notebooks/long_form_generation.ipynb

Before we start, we need to install the environment.

#@title Install the environment - run this no matter what audio you generate
! pip install git+https://github.com/suno-ai/bark.git

import os

# Select the GPU before bark (and torch) are imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import nltk  # we'll use this to split into sentences
import numpy as np
from IPython.display import Audio

from bark import SAMPLE_RATE, generate_audio
from bark.api import semantic_to_waveform
from bark.generation import (
    generate_text_semantic,
    preload_models,
)

nltk.download('punkt')  # sentence tokenizer data used by nltk.sent_tokenize

preload_models()  # download and load the Bark models

I tried a few long-form examples:

#@title Generate long-form audio

speaker = "v2/en_speaker_6"

script = """
Hey, have you heard about this new text-to-audio model called "Bark"? 
Apparently, it's the most realistic and natural-sounding text-to-audio model 
out there right now. People are saying it sounds just like a real person speaking. 
I think it uses advanced machine learning algorithms to analyze and understand the 
nuances of human speech, and then replicates those nuances in its own speech output. 
It's pretty impressive, and I bet it could be used for things like audiobooks or podcasts. 
In fact, I heard that some publishers are already starting to use Bark to create audiobooks. 
It would be like having your own personal voiceover artist. I really think Bark is going to 
be a game-changer in the world of text-to-audio technology! [end]
""".replace("\n", " ").strip()

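# Split the script into sentences so each one can be generated separately.
sentences = nltk.sent_tokenize(script)

GEN_TEMP = 0.6  # sampling temperature for the semantic tokens

silence = np.zeros(int(0.1 * SAMPLE_RATE))  # 0.1 seconds of silence

pieces = []
for sentence in sentences:
    semantic_tokens = generate_text_semantic(
        sentence,
        history_prompt=speaker,
        temp=GEN_TEMP,
        min_eos_p=0.05,  # this controls how likely the generation is to end
    )

    audio_array = semantic_to_waveform(semantic_tokens, history_prompt=speaker)
    pieces += [audio_array, silence.copy()]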

Audio(np.concatenate(pieces), rate=SAMPLE_RATE)
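
The cell above only plays the result inside the notebook. If you want to keep the audio as a file, a minimal sketch could look like the following. The helper names `join_with_pauses` and `save_wav` are my own and not part of Bark, and the sample rate of 24,000 Hz is an assumption meant to match `bark.SAMPLE_RATE`; in the notebook you would pass `SAMPLE_RATE` directly.

```python
import wave

import numpy as np

SAMPLE_RATE = 24_000  # assumption: same value as bark.SAMPLE_RATE


def join_with_pauses(clips, pause_s=0.1, sample_rate=SAMPLE_RATE):
    """Concatenate clips, adding pause_s seconds of silence after each one."""
    silence = np.zeros(int(pause_s * sample_rate), dtype=np.float32)
    parts = []
    for clip in clips:
        parts.append(np.asarray(clip, dtype=np.float32))
        parts.append(silence)
    if not parts:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(parts)


def save_wav(path, audio, sample_rate=SAMPLE_RATE):
    """Write a mono float waveform in [-1, 1] as a 16-bit PCM WAV file."""
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")  # little-endian int16
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes per sample = 16 bit
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())
```

In the notebook, something like `save_wav("long_form.wav", np.concatenate(pieces))` should be enough, since `pieces` already contains the silence between sentences.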

Here is the resulting audio:

We can also generate long dialogues:

#@title Generate a long dialogue

speaker_lookup = {"Samantha": "v2/en_speaker_9", "John": "v2/en_speaker_6"}

script = """
Samantha: Hey, have you heard about this new text-to-audio model called "Bark"?
John: No, I haven't. What's so special about it?
Samantha: Well, apparently it's the most realistic and natural-sounding text-to-audio model out there right now. People are saying it sounds just like a real person speaking.
John: Wow, that sounds amazing. How does it work?
Samantha: I think it uses advanced machine learning algorithms to analyze and understand the nuances of human speech, and then replicates those nuances in its own speech output.
John: That's pretty impressive. Do you think it could be used for things like audiobooks or podcasts?
Samantha: Definitely! In fact, I heard that some publishers are already starting to use Bark to create audiobooks. And I bet it would be great for podcasts too.
John: I can imagine. It would be like having your own personal voiceover artist.
Samantha: Exactly! I think Bark is going to be a game-changer in the world of text-to-audio technology."""
script = script.strip().split("\n")
script = [s.strip() for s in script if s]
script

pieces = []
silence = np.zeros(int(0.1 * SAMPLE_RATE))  # 0.1 seconds of silence
for line in script:
    # Split only on the first ": " so colons inside the spoken text are kept.
    speaker, text = line.split(": ", 1)
    audio_array = generate_audio(text, history_prompt=speaker_lookup[speaker])
    pieces += [audio_array, silence.copy()]

Audio(np.concatenate(pieces), rate=SAMPLE_RATE)
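
The dialogue cell assumes every line has the form `Speaker: text` and that every speaker appears in `speaker_lookup`. A typo in a name would only surface as a `KeyError` in the middle of generation. As a rough sketch, you could validate the script first with a small parser. `parse_dialogue` is a hypothetical helper of my own, not a Bark function.

```python
def parse_dialogue(script, known_speakers):
    """Return (speaker, text) pairs from 'Name: text' lines, skipping blank lines.

    Raises ValueError for a malformed line or a speaker not in known_speakers.
    """
    turns = []
    for lineno, raw in enumerate(script.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        # partition splits on the first ": " only, so later colons stay in the text
        speaker, sep, text = line.partition(": ")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'Speaker: text'")
        if speaker not in known_speakers:
            raise ValueError(f"line {lineno}: unknown speaker {speaker!r}")
        turns.append((speaker, text.strip()))
    return turns
```

With this, the generation loop can iterate over `parse_dialogue(script, speaker_lookup)` (with `script` still a single string) and any formatting problem shows up before the first model call.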

Here is the resulting audio:

Supported sound effects and cues:

[laughter], [laughs], [sighs], [music], [gasps], [clears throat]
— or ... for hesitations
♪ for song lyrics
CAPITALIZATION for emphasis of a word
[MAN] and [WOMAN] for male and female speakers
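
Since these cues are just text inside the prompt, it is easy to mistype one or to use a tag that is not on the list (the long-form script above ends with `[end]`, for example). The sketch below flags bracketed tags that are not in the list from this post. `SUPPORTED_CUES` and `unsupported_cues` are names I made up, the set only mirrors the list above, and Bark may well respond to other tags too, so treat this as a sanity check rather than a rule.

```python
import re

# Assumption: this set simply mirrors the list in this post.
SUPPORTED_CUES = {
    "[laughter]", "[laughs]", "[sighs]", "[music]",
    "[gasps]", "[clears throat]", "[MAN]", "[WOMAN]",
}


def unsupported_cues(text):
    """Return bracketed tags in text that are not in SUPPORTED_CUES, in order."""
    tags = re.findall(r"\[[^\[\]]+\]", text)
    return [tag for tag in tags if tag not in SUPPORTED_CUES]
```

For example, `unsupported_cues("Hi [laughs] there [end]")` returns `["[end]"]`.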

You can also change the language and the voice:
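
The voice is chosen through the `history_prompt` string, as with `"v2/en_speaker_6"` and `"v2/en_speaker_9"` above. Below is a small sketch for building such preset names. It assumes the `v2/<lang>_speaker_<0-9>` naming pattern and the language codes listed in Bark's README at the time of writing, so check the README for the current set. `speaker_preset` and `LANG_CODES` are hypothetical names of my own.

```python
# Assumption: language codes as listed in Bark's README at the time of writing.
LANG_CODES = ("en", "zh", "fr", "de", "hi", "it", "ja", "ko", "pl", "pt", "ru", "es", "tr")


def speaker_preset(lang, index):
    """Build a preset name such as 'v2/en_speaker_6' for use as history_prompt."""
    if lang not in LANG_CODES:
        raise ValueError(f"unknown language code: {lang!r}")
    if not 0 <= index <= 9:
        raise ValueError("speaker index must be between 0 and 9")
    return f"v2/{lang}_speaker_{index}"
```

For instance, `generate_audio(text, history_prompt=speaker_preset("zh", 1))` would ask for a Chinese voice, provided that preset exists in your installed version.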

The results are impressive across the board, and Bark is released under the MIT license, so it is friendly to commercial use.

http://mp.weixin.qq.com/s?__biz=MzkwOTMzMzk0MQ==&mid=2247485268&idx=1&sn=80c5f35ebaaf5923f37b3ac94c6a5006

Renee 创业随笔 (Renee's Startup Notes)
