Run Google's Latest Gemma-2-9B Model on Your Own Computer with One Command

Digest, 2024-07-10 10:00, Hong Kong, China

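Google recently released the Gemma 2 models in 9B and 27B sizes^[1], the latest in its Gemma model family. According to the technical report, a Gemma-2-2b model will be open-sourced in the coming days. The report also shows that Gemma-2-9B outperforms Mistral-7B, Llama-3-8B, and the Gemma 1.5 models on several benchmarks.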

To run Gemma-2-9b-it on your computer with a single command, run the following in a terminal:
bash <(curl -sSfL 'https://raw.githubusercontent.com/LlamaEdge/LlamaEdge/main/run-llm.sh') --model gemma-2-9b-it

Using Gemma-2-9B as the example, this article walks you through how to

Run Gemma-2-9B on your own device
Create an API server for Gemma-2-9B

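We will use LlamaEdge^[2] (the Rust + Wasm stack) to develop and deploy applications for this model. There is no need to install complex Python packages or C++ toolchains! Learn why we chose Rust + Wasm^[3].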

Run Gemma-2-9B on your own device

Step 1: Install WasmEdge^[4] with the following command.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s -- -v 0.13.5

Step 2: Download the Gemma-2-9B-it model GGUF file^[5]. The model is 6.40 GB, so the download may take a while.

curl -LO https://huggingface.co/second-state/gemma-2-9b-it-GGUF/resolve/main/gemma-2-9b-it-Q5_K_M.gguf
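Because the model file is several gigabytes, an interrupted download or an error page saved in its place can be hard to spot. Here is a minimal sketch of a sanity check, assuming a POSIX-style shell with `head` available. GGUF files begin with the four ASCII bytes `GGUF`, so looking at the first bytes catches the most obvious failures. `is_gguf` is a hypothetical helper name, not part of LlamaEdge or WasmEdge, and it does not prove that the file is complete.

```shell
# Sketch: sanity-check a downloaded GGUF file before loading it.
# is_gguf is a hypothetical helper name, not part of LlamaEdge or WasmEdge.
is_gguf() {
  # The file must exist and be non-empty.
  [ -s "$1" ] || return 1
  # GGUF files begin with the four ASCII bytes "GGUF".
  [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Usage (after the curl download above):
#   is_gguf gemma-2-9b-it-Q5_K_M.gguf && echo "looks like a GGUF file"
```

If the check fails, re-running the download command is usually the simplest fix.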

Step 3: Download a cross-platform portable Wasm file for the chat app. The app lets you chat with the model on the command line. Its Rust source code is here^[6].

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm

That's it. You can chat with the model in the terminal by entering the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:gemma-2-9b-it-Q5_K_M.gguf llama-chat.wasm -p gemma-instruct -c 4096

This portable Wasm app automatically takes advantage of the hardware accelerators on the device (e.g. GPUs).

[You]:
Tell me what is gemma?

[Bot]:
Gemma is an open-weights AI assistant developed by Google DeepMind.

Here are some key things to know about Gemma:

* **Open Weights:** This means the underlying code and model weights are freely accessible to the public. Anyone can inspect, modify, or build upon Gemma.
* **Text-Only:** Gemma is designed primarily for text-based interactions. It can understand and generate human-like text but doesn't have capabilities like image recognition or voice synthesis.
* **Focus on Research & Development:** Gemma is intended as a tool for researchers and developers to explore and experiment with different AI models and techniques.

**Why is Gemma significant?**

* **Transparency and Collaboration:** Open-weights nature promotes transparency in AI development and encourages collaboration among researchers worldwide.
* **Customization:** Users can adapt Gemma to their specific needs by fine-tuning it on their own datasets.
* **Accessibility:**  Makes advanced AI technology accessible to a wider range of individuals and organizations, beyond those with large resources.

**Keep in mind:**

* Gemma is still under development and may have limitations compared to more mature, closed-source models.
* Its capabilities are primarily focused on text processing.
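The chat command used above packs several options into one line. As a rough sketch, the helper below prints the same command for a given model file and context size, with each flag annotated as we understand it from the WasmEdge and LlamaEdge documentation. `chat_command` is a hypothetical helper name introduced only for illustration, and the default of 4096 simply mirrors the value used in this article.

```shell
# Sketch: print the chat command for a given model file and context size.
# chat_command is a hypothetical helper name; the flags are the ones used in
# the wasmedge command earlier in this article.
chat_command() {
  model_file="$1"
  ctx_size="${2:-4096}"   # default mirrors the -c 4096 used in this article
  # --dir .:.      exposes the current directory to the Wasm app
  # --nn-preload   loads the GGUF file under the name "default"
  #                (GGML backend, AUTO device selection)
  # -p             selects the prompt template
  # -c             sets the context size in tokens
  printf 'wasmedge --dir .:. --nn-preload default:GGML:AUTO:%s llama-chat.wasm -p gemma-instruct -c %s\n' \
    "$model_file" "$ctx_size"
}

# Usage:
#   chat_command gemma-2-9b-it-Q5_K_M.gguf 4096
```

Printing the command rather than running it keeps the sketch safe to try on a machine where the model has not been downloaded yet.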

Create an OpenAI-compatible API server for Gemma-2-9b-it^[7]

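An OpenAI-compatible API allows Gemma-2-9b-it to work with a wide range of development frameworks and tools, such as flows.network^[8], LangChain, and LlamaIndex, which opens up more application possibilities. You can also use its code as a reference to write your own API server or other LLM applications.

To start the API service, follow the steps below. First, download the API server app. It is a cross-platform portable Wasm app that can run on a variety of CPU and GPU devices.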

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

Then, download the chatbot web UI so that you can interact with the model through a chatbot UI.

curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz

Next, start the API server for the model with the following command. Then open your browser at http://localhost:8080^[9] to start chatting!

wasmedge --dir .:. --nn-preload default:GGML:AUTO:gemma-2-9b-it-Q5_K_M.gguf llama-api-server.wasm -p gemma-instruct -c 4096

From another terminal window, you can interact with the API server using curl.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'accept:application/json' \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"system", "content": "You are a sentient, superintelligent artificial general intelligence, here to teach and assist me."}, {"role":"user", "content": "Write a short story about Goku discovering kirby has teamed up with Majin Buu to destroy the world."}], "model":"gemma-2-9b-it"}'
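Writing the JSON body by hand gets error-prone once prompts contain quotes. The sketch below builds the same kind of chat completion request body from shell variables. `json_escape` and `chat_payload` are hypothetical helper names, and the escaping is deliberately minimal: it assumes single-line prompts and handles only backslashes and double quotes.

```shell
# Sketch: build the JSON body for /v1/chat/completions from shell variables.
# json_escape and chat_payload are hypothetical helper names for illustration.

# Escapes backslashes and double quotes only. Assumes single-line input
# without control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Arguments: model name, system prompt, user prompt.
chat_payload() {
  printf '{"messages":[{"role":"system","content":"%s"},{"role":"user","content":"%s"}],"model":"%s"}\n' \
    "$(json_escape "$2")" "$(json_escape "$3")" "$(json_escape "$1")"
}

# Usage (with the API server from this article running on port 8080):
#   curl -X POST http://localhost:8080/v1/chat/completions \
#     -H 'Content-Type: application/json' \
#     -d "$(chat_payload gemma-2-9b-it 'You are a helpful assistant.' 'What is Gemma?')"
```

For prompts with newlines or other control characters, a real JSON tool is a better choice than this hand-rolled escaping.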

That's all. WasmEdge is the easiest, fastest, and safest way to run LLM applications^[10]. Give it a try!

References

[1] Gemma 2 models in 9B and 27B sizes: https://ai.google.dev/gemma/docs

[2] LlamaEdge: https://github.com/second-state/LlamaEdge/

[3] Learn why we chose Rust + Wasm: https://www.secondstate.io/articles/fast-llm-inference/

[4] WasmEdge: https://github.com/WasmEdge/WasmEdge

[5] Gemma-2-9B-it model GGUF file: https://huggingface.co/second-state/gemma-2-9b-it-GGUF

[6] Chat app Rust source code: https://github.com/second-state/llama-utils/tree/main/chat

[7] Gemma-2-9b-it: https://www.secondstate.io/articles/gemma-2-9b/#create-an-openai-compatible-api-service-for-gemma-2-9b-it

[8] flows.network: https://flows.network/

[9] http://localhost:8080: http://localhost:8080/

[10] The easiest, fastest, and safest way to run LLM applications: https://www.secondstate.io/articles/fast-llm-inference/

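About WasmEdge

WasmEdge is a lightweight, secure, high-performance, extensible, and OCI-compatible software container and runtime. It is currently a CNCF sandbox project. WasmEdge is used in SaaS, cloud native, service mesh, edge computing, edge cloud, microservices, stream data processing, LLM inference, and other areas.

GitHub: https://github.com/WasmEdge/WasmEdge

Website: https://wasmedge.org/

Discord: https://discord.gg/U4B5sFTkFc

Docs: https://wasmedge.org/docs

This article is reposted from Second State.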
