ModelScope Community Weekly Digest (Jan 5-18)

Digest 2025-01-19 22:51 Zhejiang

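🙋 ModelScope community highlights for this period:

📟 3,239 models: MiniCPM-o 2.6, internlm3-8b-instruct, Valley-Eagle-7B, phi-4, 麦橘超然, memo, Qwen2.5-Math-PRM, and more;

📁 711 datasets: squad, msmarco-distilbert-margin-mse-cls-dot-v2, coliee, and more;

🎨 192 innovative applications: AI Spring Festival greeting card generator, dynamic interactive text adventure game DEMO, VITA1.5_demo, WebWalker, ACE++ editing and generation model, Qwen translation model, and more;

📄 16 articles: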

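The Qwen team open-sources a brand-new process reward model (PRM)!
ModelScope January 2025 release monthly report
It's the New Year: build your own AI Spring Festival greeting card generator with ModelScope + 魔笔!
MiniCPM-o 2.6: a streaming, omni-modal, end-to-end multimodal on-device model is here!
Building AI Apps with Gradio, hands-on lesson ③: AI model deployment and inference, extend your app's features without limit
InternLM3 open-sourced! 4T of data achieves an 18T effect at 75% lower cost, fusing deep thinking and conversation for the first time!
Valley2, a multimodal large model for e-commerce scenarios
Microsoft phi-4 is here! A standout small model: at 14B, its science and coding abilities surpass 70B models!
Building AI Apps with Gradio, hands-on lesson ②: Gradio basics, DIY your app's UI without limit
Study group | Building Agents more effectively in 2025
Paper Reading | MEMO: memory-guided diffusion for expressive Talking Head generation
DashInfer-VLM: SOTA multimodal inference performance, surpassing vLLM!
Fine-tune a large model in 10 minutes to change its self-cognition and build your own custom chatbot
麦橘超然 lands on ModelScope: free image generation and training, with prizes for sharing your images (see end of article)
Build your Gradio app with modelscope-studio
TransferTOD: using LLMs to address poor slot generalization of TOD systems in out-of-domain scenarios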

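Featured Models

MiniCPM-o 2.6

MiniCPM-o 2.6 is the latest and best-performing model in the MiniCPM-o series. It is built on SigLip-400M, Whisper-medium-300M, ChatTTS-200M, and Qwen2.5-7B, totals 8B parameters, and is trained and run end to end. Compared with MiniCPM-V 2.6, it delivers a marked performance improvement and adds real-time voice conversation and multimodal streaming interaction.

Model link: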

https://modelscope.cn/models/OpenBMB/MiniCPM-o-2_6

Example code:

Install dependencies

!pip install vector-quantize-pytorch==1.18.5
!pip install vocos==0.1.0
!pip install transformers==4.44.2

Inference code

import torch
from PIL import Image
from modelscope import AutoModel, AutoTokenizer

# load omni model default, the default init_vision/init_audio/init_tts is True
# if load vision-only model, please set init_audio=False and init_tts=False
# if load audio-only model, please set init_vision=False
model = AutoModel.from_pretrained(
    'OpenBMB/MiniCPM-o-2_6',
    trust_remote_code=True,
    attn_implementation='sdpa', # sdpa or flash_attention_2
    torch_dtype=torch.bfloat16,
    init_vision=True,
    init_audio=True,
    init_tts=True
)

model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('OpenBMB/MiniCPM-o-2_6', trust_remote_code=True)

# In addition to vision-only mode, tts processor and vocos also needs to be initialized
model.init_tts()
model.tts.float()

import math
import numpy as np
from moviepy.editor import VideoFileClip
import tempfile
import librosa
import soundfile as sf

def get_video_chunk_content(video_path, flatten=True):
    video = VideoFileClip(video_path)
    print('video_duration:', video.duration)

    # extract the audio track to a temporary 16 kHz mono WAV file
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=True) as temp_audio_file:
        temp_audio_file_path = temp_audio_file.name
        video.audio.write_audiofile(temp_audio_file_path, codec="pcm_s16le", fps=16000)
        audio_np, sr = librosa.load(temp_audio_file_path, sr=16000, mono=True)
    num_units = math.ceil(video.duration)

    # 1 frame + 1s audio chunk
    contents= []
    for i in range(num_units):
        frame = video.get_frame(i+1)
        image = Image.fromarray((frame).astype(np.uint8))
        audio = audio_np[sr*i:sr*(i+1)]
        if flatten:
            contents.extend(["<unit>", image, audio])
        else:
            contents.append(["<unit>", image, audio])

    return contents

video_path="/mnt/workspace/video.mp4"
sys_msg = model.get_sys_prompt(mode='omni', language='en')
# if use voice clone prompt, please set ref_audio
# ref_audio_path = '/path/to/ref_audio'
# ref_audio, _ = librosa.load(ref_audio_path, sr=16000, mono=True)
# sys_msg = model.get_sys_prompt(ref_audio=ref_audio, mode='omni', language='en')

contents = get_video_chunk_content(video_path)
msg = {"role":"user", "content": contents}
msgs = [sys_msg, msg]

# please set generate_audio=True and output_audio_path to save the tts result
generate_audio = True
output_audio_path = '/mnt/workspace/4.wav'

res = model.chat(
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.5,
    max_new_tokens=4096,
    omni_input=True, # please set omni_input=True when omni inference
    use_tts_template=True,
    generate_audio=generate_audio,
    output_audio_path=output_audio_path,
    max_slice_nums=1,
    use_image_id=False,
    return_dict=True
)
print(res)
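The video helper above pairs one frame with one second of audio per "<unit>" marker. The slicing can be sketched without any media libraries, as below. This is only an illustration: `build_units` is a hypothetical name, strings stand in for frames, a plain list stands in for the audio array, and the unit count is derived from the audio length rather than from the video duration as in the original helper.

```python
import math

def build_units(frames, audio, sr, flatten=True):
    """Pair frame i with audio samples [sr*i, sr*(i+1)), one unit per second."""
    num_units = math.ceil(len(audio) / sr)
    contents = []
    for i in range(num_units):
        # Reuse the last frame if fewer frames than units were supplied.
        frame = frames[min(i, len(frames) - 1)]
        chunk = audio[sr * i : sr * (i + 1)]
        unit = ["<unit>", frame, chunk]
        if flatten:
            contents.extend(unit)   # one flat list: marker, frame, audio, ...
        else:
            contents.append(unit)   # one nested list per second
    return contents

# Toy input: 10 samples at 4 samples per "second" (2.5 s), three frames.
audio = list(range(10))
frames = ["f0", "f1", "f2"]
units = build_units(frames, audio, sr=4, flatten=False)
print(len(units))   # 3
print(units[-1])    # ['<unit>', 'f2', [8, 9]]
```

The last unit carries a shorter audio chunk when the clip length is not a whole number of seconds, which matches how the slice in the original helper behaves.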

internlm3-8b-instruct

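InternLM3 is a major upgrade of the 书生 (InternLM) model family from Shanghai AI Laboratory. A refined data framework substantially improves data efficiency and thinking density. Trained on only 4T of data, InternLM3-8B-Instruct outperforms open-source models of the same size overall and matches the results of mainstream models trained on 18T, saving more than 75% of the training cost. It is also the first general-purpose model to combine regular conversation with deep thinking, greatly broadening the real-world scenarios it can handle.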
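As a rough sanity check of the "more than 75%" figure, the quick calculation below compares 4T against 18T of training data. It assumes training cost scales linearly with the amount of data, which the announcement does not state.

```python
# Figures quoted in the text, in trillions (T) of training data.
baseline_data_t = 18
internlm3_data_t = 4

# Assumption: cost is proportional to the amount of training data.
saving = 1 - internlm3_data_t / baseline_data_t
print(f"{saving:.1%}")  # 77.8%
```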

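InternLM3 follows a "generalist-specialist fusion" path. Combined with a large-scale data refinement framework, it raises training-data quality, introduces the concept of "thinking density" to improve model performance, and offers a new paradigm for Scaling Law research. It also builds a synthetic-data exploration pipeline: instructions are annotated against a world-knowledge tree and high-quality responses are generated by multiple agents, yielding a fine-tuning instruction dataset of several hundred thousand samples and a better conversational experience. Evaluations show that InternLM3 performs well on several authoritative benchmarks, with overall performance close to GPT-4o-mini.

Model link: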

https://www.modelscope.cn/models/Shanghai_AI_Laboratory/internlm3-8b-instruct

Example code:

Inference with transformers:

import torch
from modelscope import AutoTokenizer, AutoModelForCausalLM

model_dir = "Shanghai_AI_Laboratory/internlm3-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True, torch_dtype=torch.float16)
# (Optional) If on low resource devices, you can load model in 4-bit or 8-bit to further save GPU memory via bitsandbytes.
  # InternLM3 8B in 4bit will cost nearly 8GB GPU memory.
  # pip install -U bitsandbytes
  # 8-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_8bit=True)
  # 4-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_4bit=True)
model = model.eval()
model = model.cuda()

system_prompt = """You are an AI assistant whose name is InternLM (书生·浦语).
- InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
- InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文."""
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Please tell me five scenic spots in Shanghai"},
 ]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").cuda()

generated_ids = model.generate(tokenized_chat, max_new_tokens=1024, temperature=1, repetition_penalty=1.005, top_k=40, top_p=0.8)

# drop the prompt tokens so that only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
]
response = tokenizer.batch_decode(generated_ids)[0]
print(response)
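The list comprehension near the end of the snippet above removes the prompt tokens from each generated sequence, because `generate` returns the prompt followed by the continuation. A minimal sketch of that step on plain Python lists, with made-up token IDs and a hypothetical helper name `strip_prompt`:

```python
def strip_prompt(input_batch, output_batch):
    """Keep only the tokens generated after each prompt."""
    return [out[len(inp):] for inp, out in zip(input_batch, output_batch)]

# Made-up token IDs: a 4-token prompt followed by 3 generated tokens.
prompt_ids = [[101, 7, 8, 9]]
full_ids = [[101, 7, 8, 9, 42, 43, 2]]

new_ids = strip_prompt(prompt_ids, full_ids)
print(new_ids)  # [[42, 43, 2]]
```

The result is an ordinary Python list, so it has no `.cuda()` method and can be passed directly to `tokenizer.batch_decode`.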

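麦橘超然 (MajicFlus)

麦橘超然 is a Flux.1-based model created by 麦橘. It generates highly photorealistic images with rich light and shadow, and is especially good at rendering facial and skin detail. 麦橘's earlier work, 麦橘写实, is one of the most popular models on the major open-source text-to-image sites.

The model fuses several model architectures to produce realistic portrait-photography styles, rendering fine details such as hair, eyes, and freckles. Its handling of light and shadow is strong: it reproduces light-dark contrast and enhances depth and atmosphere, making it well suited to dark and shadowed scenes. In addition, in collaboration with more than 30 community creators, over 50 LoRAs trained on this model have been released.

Model link: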

https://modelscope.cn/models/MAILAND/majicflus_v1

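How to use the model:

ModelScope provides a one-click ComfyUI toolkit. Pair the community 麦橘超然 workflow with ModelScope's free notebook compute for dedicated image generation.

ComfyUI one-click toolkit link: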
https://modelscope.cn/models/AI-ModelScope/ComfyUI-MajicFlus

麦橘超然 workflow link:

https://modelscope.oss-cn-beijing.aliyuncs.com/resource/majicflus.json

!wget "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/cloudflared-linux-amd64.deb"
!dpkg -i cloudflared-linux-amd64.deb

!git clone https://www.modelscope.cn/AI-ModelScope/ComfyUI-MajicFlus.git

%cd /mnt/workspace/ComfyUI-MajicFlus
import subprocess
import threading
import time
import socket
import urllib.request

def iframe_thread(port):
  while True:
      time.sleep(0.5)
      sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      result = sock.connect_ex(('127.0.0.1', port))
      if result == 0:
        break
      sock.close()
  print("\nComfyUI finished loading, trying to launch cloudflared (if it gets stuck here cloudflared is having issues)\n")

  p = subprocess.Popen(["cloudflared", "tunnel", "--url", "http://127.0.0.1:{}".format(port)], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  for line in p.stderr:
    l = line.decode()
    if "trycloudflare.com " in l:
      print("This is the URL to access ComfyUI:", l[l.find("http"):], end='')
    #print(l, end='')

threading.Thread(target=iframe_thread, daemon=True, args=(8188,)).start()

!python main.py --dont-print-server
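The notebook cell above does two things in a background thread: it polls port 8188 until ComfyUI accepts connections, then scans the cloudflared log for the public URL. A minimal sketch of both steps as small, separately testable functions follows. The names `wait_for_port` and `extract_tunnel_url`, the timeout parameter, and the sample log line are illustrative assumptions, not part of the toolkit.

```python
import re
import socket
import time

def wait_for_port(port, host="127.0.0.1", interval=0.5, timeout=60.0):
    """Poll until a TCP connection to host:port succeeds; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # The context manager closes the socket on every path, including success.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            if sock.connect_ex((host, port)) == 0:
                return True
        time.sleep(interval)
    return False

def extract_tunnel_url(log_line):
    """Return the trycloudflare.com URL found in a log line, or None."""
    match = re.search(r"https?://\S*trycloudflare\.com\S*", log_line)
    return match.group(0) if match else None

# Hypothetical log line in the style of cloudflared's stderr output.
sample = "2025-01-19 INF |  https://abc-def.trycloudflare.com  |"
print(extract_tunnel_url(sample))  # https://abc-def.trycloudflare.com
```

Unlike the original loop, this version gives up after `timeout` seconds instead of polling forever, which can make a stuck startup easier to notice.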

phi-4

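Phi-4 is the latest model from Microsoft Research, suited to tasks such as language understanding, generation, multilingual support, and knowledge reasoning.

Model link: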

https://modelscope.cn/models/LLM-Research/phi-4

Usage:

Input format

Given the nature of its training data, phi-4 is best suited to prompts in the following chat format:

<|im_start|>system<|im_sep|>
You are a medieval knight and must provide explanations to modern people.<|im_end|>
<|im_start|>user<|im_sep|>
How should I explain the Internet?<|im_end|>
<|im_start|>assistant<|im_sep|>
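To make the structure of this format concrete, here is a small sketch that assembles such a prompt from a list of role/content messages. `build_phi4_prompt` is a hypothetical helper written for illustration, and the exact whitespace between the special tokens is an assumption; in practice the tokenizer's own chat template should be treated as the source of truth.

```python
def build_phi4_prompt(messages):
    """Join messages using the <|im_start|>/<|im_sep|>/<|im_end|> markers."""
    parts = [
        f"<|im_start|>{m['role']}<|im_sep|>{m['content']}<|im_end|>"
        for m in messages
    ]
    # A trailing assistant header asks the model to produce the next turn.
    parts.append("<|im_start|>assistant<|im_sep|>")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a medieval knight and must provide explanations to modern people."},
    {"role": "user", "content": "How should I explain the Internet?"},
]
prompt = build_phi4_prompt(messages)
print(prompt)
```

Each turn is wrapped in its own start and end markers, and the prompt ends with an open assistant turn so that generation continues from there.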

With transformers

import transformers
from modelscope import snapshot_download

model_dir = snapshot_download("LLM-Research/phi-4")

pipeline = transformers.pipeline(
    "text-generation",
    model=model_dir,
    model_kwargs={"torch_dtype": "auto"},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a medieval knight and must provide explanations to modern people."},
    {"role": "user", "content": "How should I explain the Internet?"},
]

outputs = pipeline(messages, max_new_tokens=128)
print(outputs[0]["generated_text"][-1])

memo

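MEMO is a video generation model released by research teams from Skywork AI, Nanyang Technological University, and the National University of Singapore. From a single image and an audio clip, it generates realistic portrait video with natural, fluid expressions and lip movements synchronized to the audio.

Model link: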

https://www.modelscope.cn/models/ltzheng/memo

Installation

conda create -n memo python=3.10 -y
conda activate memo
conda install -c conda-forge ffmpeg -y
pip install -e .

Inference

python inference.py --config configs/inference.yaml --input_image <IMAGE_PATH> --input_audio <AUDIO_PATH> --output_dir <SAVE_PATH>

For example:

python inference.py --config configs/inference.yaml --input_image assets/examples/dicaprio.jpg --input_audio assets/examples/speech.wav --output_dir outputs

Valley2

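Valley2 is a novel multimodal large language model designed to improve performance across domains through a scalable vision-language design and to push the boundaries of practical applications in e-commerce and short-video scenarios. It achieves state-of-the-art performance in the e-commerce and short-video domains. It introduces innovations such as a large visual vocabulary, a convolutional adapter (ConvAdapter), and the Eagle module, which increase flexibility in handling diverse real-world inputs while improving training and inference efficiency. Valley2 uses Qwen2.5 as its LLM backbone and SigLIP-384 as its vision encoder, combined with MLP layers and convolutions for efficient feature transformation.

Model link:

https://www.modelscope.cn/models/bytedance-research/Valley-Eagle-7B

Example code:

Model inference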

from valley_eagle_chat import ValleyEagleChat
from modelscope import snapshot_download
import urllib.request

# The eagle_vision_tower and mm_vision_tower entries in the model's config.json must be changed to local paths

model_dir = snapshot_download("bytedance-research/Valley-Eagle-7B")
!modelscope download --model=Qwen/Qwen2-VL-7B-Instruct --local_dir=./Qwen2-VL-7B-Instruct
!modelscope download --model=AI-ModelScope/siglip-so400m-patch14-384 --local_dir=./siglip-so400m-patch14-384
model = ValleyEagleChat(
    model_path=model_dir,
    padding_side = 'left',
)

url = 'http://p16-goveng-va.ibyteimg.com/tos-maliva-i-wtmo38ne4c-us/4870400481414052507~tplv-wtmo38ne4c-jpeg.jpeg'

img = urllib.request.urlopen(url=url, timeout=5).read()

request = {
    "chat_history": [
        {'role': 'system', 'content': 'You are Valley, developed by ByteDance. Your are a helpfull Assistant.'},
        {'role': 'user', 'content': 'Describe the given image.'},
    ],
    "images": [img],
}

result = model(request)
print(f"\n>>> Assistant:\n")
print(result)
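The comment in the image example says that two entries in the downloaded config.json must point to local paths. One way to script that edit is sketched below. The key names come from that comment; the helper name `patch_vision_towers` and the pairing of each key with a download directory (Qwen2-VL for `eagle_vision_tower`, SigLIP for `mm_vision_tower`) are assumptions that should be checked against the model card. The demo runs on a throwaway file rather than the real config.

```python
import json
import tempfile
from pathlib import Path

def patch_vision_towers(config_path, eagle_path, siglip_path):
    """Rewrite the two vision-tower entries of a config.json in place."""
    config_path = Path(config_path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["eagle_vision_tower"] = str(eagle_path)   # assumed: Qwen2-VL directory
    config["mm_vision_tower"] = str(siglip_path)     # assumed: SigLIP directory
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return config

# Demo on a temporary file with placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "config.json"
    demo.write_text(json.dumps({"eagle_vision_tower": "remote/a",
                                "mm_vision_tower": "remote/b"}), encoding="utf-8")
    patched = patch_vision_towers(
        demo, "./Qwen2-VL-7B-Instruct", "./siglip-so400m-patch14-384"
    )
    print(patched["mm_vision_tower"])  # ./siglip-so400m-patch14-384
```

For the real model, the first argument would be the config.json inside the directory returned by `snapshot_download`, and the call would need to run before `ValleyEagleChat` is constructed.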

from valley_eagle_chat import ValleyEagleChat
import decord
import requests
import numpy as np
from torchvision import transforms

model = ValleyEagleChat(
    model_path=model_dir,
    padding_side = 'left',
)

url = 'https://videos.pexels.com/video-files/29641276/12753127_1920_1080_25fps.mp4'
video_file = './video.mp4'
response = requests.get(url)
if response.status_code == 200:
    with open("video.mp4", "wb") as f:
        f.write(response.content)
else:
    print("download error!")
    exit(1)

video_reader = decord.VideoReader(video_file)
decord.bridge.set_bridge("torch")
video = video_reader.get_batch(
    np.linspace(0,  len(video_reader) - 1, 8).astype(np.int_)
).byte()
print([transforms.ToPILImage()(image.permute(2, 0, 1)).convert("RGB") for image in video])

request = {
    "chat_history": [
        {'role': 'system', 'content': 'You are Valley, developed by ByteDance. Your are a helpfull Assistant.'},
        {'role': 'user', 'content': 'Describe the given video.'},
    ],
    "images": [transforms.ToPILImage()(image.permute(2, 0, 1)).convert("RGB") for image in video],
}
result = model(request)
print(f"\n>>> Assistant:\n")
print(result)
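The video example feeds the model 8 frames spaced evenly from the first to the last frame, selected with `np.linspace(...).astype(np.int_)`. The index selection can be sketched in pure Python as below. `uniform_frame_indices` is a hypothetical helper; it uses exact integer arithmetic, so it may differ from the floating-point NumPy result in rare rounding cases.

```python
def uniform_frame_indices(num_frames, num_samples=8):
    """Evenly spaced frame indices from 0 to num_frames - 1, truncated to int."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    if num_samples == 1:
        return [0]
    last = num_frames - 1
    # Exact floor of i * last / (num_samples - 1) for each sample position.
    return [(i * last) // (num_samples - 1) for i in range(num_samples)]

print(uniform_frame_indices(100))  # [0, 14, 28, 42, 56, 70, 84, 99]
```

When the clip has fewer frames than samples, some indices repeat, so short videos still yield exactly `num_samples` images.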

Recommended Datasets

squad

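SQuAD is a reading-comprehension dataset for training and evaluating natural language processing models, containing more than 100,000 human-annotated questions and answers.

Dataset link: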

https://www.modelscope.cn/datasets/sentence-transformers/squad

msmarco-distilbert-margin-mse-cls-dot-v2

Built on MS MARCO; suited to text-similarity computation and question-answering tasks.

Dataset link:

https://www.modelscope.cn/datasets/sentence-transformers/msmarco-distilbert-margin-mse-cls-dot-v2

coliee

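The COLIEE dataset is designed for text similarity and case retrieval in the legal domain, supporting legal text analysis and case matching tasks.

Dataset link: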

https://www.modelscope.cn/datasets/sentence-transformers/coliee

Featured Applications

WebWalker

WebWalker is a web-based conversational agent that helps you browse websites and find information.

Try it:

https://www.modelscope.cn/studios/iic/WebWalker

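ACE++ editing and generation model

Offers three generation capabilities: portrait ID-preserving generation, object ID-preserving generation, and local-control generation. Users choose a capability according to their task and upload a reference image and an image to edit; for local editing, different information-preservation dimensions can be selected.

Try it: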

https://www.modelscope.cn/studios/iic/ACE-Plus

VITA1.5_demo

VITA is an interactive omni-modal large language model; you can interact with it directly using video plus voice input.

Try it:

https://www.modelscope.cn/studios/modelscope/VITA1.5_demo

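Qwen translation model

Qwen-MT is a machine translation engine that supports translation along with controllable features such as terminology intervention, context enhancement, translation polishing, translation memory, streaming translation, and domain prompts.

Try it: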

https://www.modelscope.cn/studios/yangbaosong/Qwen_Turbo_MT

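Script-based dynamic interactive text game

An interactive text adventure game with branching storylines; the story changes according to your choices.

Try it: https://www.modelscope.cn/studios/Zingiber/Script_based_dynamic_interactive_text_game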

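Spring Festival greeting card generator

The application uses ModelScope's Paraformer-large model for speech recognition and the wanx-poster-generation-v1 model from Alibaba Cloud 百炼 for text-to-image generation. It is built with 魔笔, Alibaba Cloud's multi-platform low-code development platform, which speeds up project engineering and the launch of production-grade applications and gives developers full-stack development and multi-platform publishing capabilities for AI applications.

Try it: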

https://www.modelscope.cn/studios/mobi/mobiposter
