Artificial Intelligence | Building an In-House Enterprise LLM System

Digest 2024-09-10 08:02 Beijing

Outline

Open-source large language models
Managing large language models
Deployment options for a private LLM service

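Open-source large language models

Worried about security and privacy? Use an open-source model that can be deployed privately.

Commercial models, which do not support private deployment

ChatGPT
Claude
Google Gemini
Baidu ERNIE Bot (文心一言)

Open-source models, which support private deployment

Mistral
Meta Llama
ChatGLM
Alibaba Tongyi Qianwen (Qwen)

List of commonly used open-source models

Branches of open-source models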

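Managing large language models

LLM management tools

Hugging Face: a comprehensive platform for managing large language models
Ollama: manages large language models locally, with very fast downloads
llama.cpp: LLM inference with minimal setup and state-of-the-art performance on a wide range of hardware, locally and in the cloud
GPT4All: a free-to-use, locally running, privacy-aware chatbot that needs no GPU or internet connection

Ollama, the fastest LLM management tool

Ollama commands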

ollama pull llama2
ollama list
ollama run llama2 "Summarize this file: $(cat README.md)"
ollama serve

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'

curl http://localhost:11434/api/chat -d '{
  "model": "mistral",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
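The two curl calls can also be issued from a script. The following is a minimal sketch using only the Python standard library, assuming an Ollama server is listening on localhost:11434 and that the llama2 and mistral models have already been pulled. The helper names (build_request, generate_payload, chat_payload, post_json, demo) are made up for this illustration, and "stream": False is added so that each call returns a single JSON object rather than a stream of chunks.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # address used in the curl examples


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for an Ollama endpoint such as /api/generate."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_payload(model: str, prompt: str) -> dict:
    # "stream": False asks for one JSON reply instead of streamed chunks.
    return {"model": model, "prompt": prompt, "stream": False}


def chat_payload(model: str, user_message: str) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,
    }


def post_json(path: str, payload: dict) -> dict:
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def demo() -> None:
    """Hypothetical usage; call this only while `ollama serve` is running."""
    answer = post_json("/api/generate", generate_payload("llama2", "Why is the sky blue?"))
    print(answer["response"])
    reply = post_json("/api/chat", chat_payload("mistral", "why is the sky blue?"))
    print(reply["message"]["content"])
```

Keeping payload construction separate from the network call makes the request bodies easy to inspect without a server.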

LLM front ends

Application front ends for large language models

Open-source platforms: ollama-chatbot, PrivateGPT, Gradio
Open-source serving: Hugging Face TGI, langchain-serve
Open-source frameworks: LangChain, LlamaIndex

ollama chatbot

docker run -p 3000:3000 ghcr.io/ivanfioravanti/chatbot-ollama:main
## http://localhost:3000

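PrivateGPT

PrivateGPT provides an API containing all the building blocks needed to build private, context-aware AI applications. The API follows and extends the OpenAI API standard and supports both normal and streaming responses. This means that if one of your tools can use the OpenAI API, it can use your own PrivateGPT API instead with no code changes, and at no cost if you run PrivateGPT in local mode.

PrivateGPT architecture

FastAPI
LlamaIndex
Local LLM support, for example ChatGLM, Llama, and Mistral
Remote LLM support, for example OpenAI and Claude
Embedding support, for example ollama and embeddings-huggingface
Vector store support, for example Qdrant, ChromaDB, and Postgres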
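To make the roles of the embedding and vector-store components concrete, here is a toy sketch of the retrieval step that a context-aware application performs before calling the LLM. It is not PrivateGPT or LlamaIndex code: the bag-of-words embed function stands in for a real embedding model, ToyVectorStore stands in for Qdrant or ChromaDB, and every name in it is hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._items = []  # list of (vector, original text) pairs

    def add(self, text: str) -> None:
        self._items.append((embed(text), text))

    def search(self, query: str, k: int = 1) -> list:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(question: str, context_chunks: list) -> str:
    """Combine retrieved chunks and the question into one prompt for the LLM."""
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a real deployment the prompt built this way would be sent to the configured local or remote LLM.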

PrivateGPT environment setup

git clone https://github.com/imartinez/privateGPT
cd privateGPT
# Python versions earlier than 3.11 are not supported
python3.11 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip poetry

# The official site only asks for a small subset of the extras, but dependency management there is not very thorough and things are easily missed,
# so our strategy is to install all of them.
poetry install --extras "ui llms-llama-cpp llms-openai llms-openai-like llms-ollama llms-sagemaker llms-azopenai embeddings-ollama embeddings-huggingface embeddings-openai embeddings-sagemaker embeddings-azopenai vector-stores-qdrant vector-stores-chroma vector-stores-postgres storage-nodestore-postgres"

# Alternatively, use this one-liner, which collects the extras from pyproject.toml
# poetry install --extras "$(sed -n '/tool.poetry.extras/,/^$/p'  pyproject.toml | awk -F= 'NR>1{print $1}' | xargs)"

Ollama deployment

ollama pull mistral
ollama pull nomic-embed-text
ollama serve

# The officially documented extras are not enough: torch also has to be installed, so prefer the install-everything strategy described above
poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
PGPT_PROFILES=ollama poetry run python -m private_gpt

settings-ollama.yaml

server:
  env_name: ${APP_ENV:ollama}

llm:
  mode: ollama
  max_new_tokens: 512
  context_window: 3900
  temperature: 0.1 # The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual. (Default: 0.1)

embedding:
  mode: ollama

ollama:
  llm_model: mistral
  embedding_model: nomic-embed-text
  api_base: http://localhost:11434
  tfs_z: 1.0 # Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting.
  top_k: 40 # Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)
  top_p: 0.9 # Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
  repeat_last_n: 64 # Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)
  repeat_penalty: 1.2 # Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)

vectorstore:
  database: qdrant

qdrant:
  path: local_data/private_gpt/qdrant
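The sampling settings in this file (temperature, top_k, top_p) are easier to reason about with a small numeric example. The sketch below is a toy illustration of the general technique on a three-token vocabulary, not Ollama's actual implementation; the function names and the example logits are invented for the illustration.

```python
import math
import random


def softmax_with_temperature(logits: dict, temperature: float) -> dict:
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def filter_top_k_top_p(probs: dict, top_k: int, top_p: float) -> dict:
    """Keep the top_k most likely tokens, then the smallest prefix reaching top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}  # renormalise the survivors


def sample(logits: dict, temperature: float = 0.1, top_k: int = 40,
           top_p: float = 0.9, rng=random) -> str:
    """Draw one token after applying temperature, top-k, and top-p."""
    probs = filter_top_k_top_p(softmax_with_temperature(logits, temperature), top_k, top_p)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
```

With the file's temperature of 0.1, almost all probability mass lands on the highest-scoring token, which matches the "more factual" behaviour described in the comment; raising the temperature or top_p lets lower-ranked tokens through.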

Startup

PGPT_PROFILES=ollama poetry run python -m private_gpt

poetry run python -m private_gpt
02:36:06.928 [INFO    ] private_gpt.settings.settings_loader - Starting application with profiles=['default', 'ollama']
02:36:46.567 [INFO    ] private_gpt.components.llm.llm_component - Initializing the LLM in mode=ollama
02:36:47.405 [INFO    ] private_gpt.components.embedding.embedding_component - Initializing the embedding model in mode=ollama
02:36:47.414 [INFO    ] llama_index.core.indices.loading - Loading all indices.
02:36:47.571 [INFO    ]         private_gpt.ui.ui - Mounting the gradio UI, at path=/
02:36:47.620 [INFO    ]             uvicorn.error - Started server process [72677]
02:36:47.620 [INFO    ]             uvicorn.error - Waiting for application startup.
02:36:47.620 [INFO    ]             uvicorn.error - Application startup complete.
02:36:47.620 [INFO    ]             uvicorn.error - Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)
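The first log line shows that two profiles are active, 'default' and 'ollama'. A plausible reading is that the base settings are loaded first and the profile-specific file is layered on top. The sketch below illustrates that idea under stated assumptions: that PGPT_PROFILES is a comma-separated list, that files are named settings.yaml and settings-<profile>.yaml as in the PrivateGPT repository, and that nested keys are merged recursively. The helper names are hypothetical, and this is not PrivateGPT's own loader.

```python
import os


def active_profiles(env=None) -> list:
    """Return 'default' followed by the profiles named in PGPT_PROFILES."""
    env = os.environ if env is None else env
    extra = [p.strip() for p in env.get("PGPT_PROFILES", "").split(",") if p.strip()]
    return ["default"] + [p for p in extra if p != "default"]


def settings_file(profile: str) -> str:
    """Map a profile name to the settings file assumed to hold it."""
    return "settings.yaml" if profile == "default" else f"settings-{profile}.yaml"


def deep_merge(base: dict, override: dict) -> dict:
    """Overlay `override` on `base`, merging nested dictionaries key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged
```

Under these assumptions, a profile file only needs to list the keys it changes; everything else falls through from the default settings.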

PrivateGPT UI

Local deployment mode

# TODO: llama-cpp must be installed; the procedure differs on each platform, see the official documentation

poetry run python scripts/setup
PGPT_PROFILES=local poetry run python -m private_gpt

settings-local.yaml

server:
  env_name: ${APP_ENV:local}

llm:
  mode: llamacpp
  # Should be matching the selected model
  max_new_tokens: 512
  context_window: 3900
  tokenizer: mistralai/Mistral-7B-Instruct-v0.2

llamacpp:
  prompt_style: "mistral"
  llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
  llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf

embedding:
  mode: huggingface

huggingface:
  embedding_hf_model_name: BAAI/bge-small-en-v1.5

vectorstore:
  database: qdrant

qdrant:
  path: local_data/private_gpt/qdrant

Non-private, OpenAI-powered deployment

poetry install --extras "ui llms-openai embeddings-openai vector-stores-qdrant"
PGPT_PROFILES=openai poetry run python -m private_gpt

settings-openai.yaml

server:
  env_name: ${APP_ENV:openai}

llm:
  mode: openai

embedding:
  mode: openai

openai:
  api_key: ${OPENAI_API_KEY:}
  model: gpt-3.5-turbo
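Values such as ${APP_ENV:openai} and ${OPENAI_API_KEY:} read like environment-variable placeholders with an optional default after the colon. The sketch below shows one way such placeholders could be expanded, assuming that reading of the syntax; the regular expression and the expand helper are illustrative and are not taken from PrivateGPT's source.

```python
import os
import re

# ${NAME} or ${NAME:default}; the default may be empty, as in ${OPENAI_API_KEY:}
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?::([^}]*))?\}")


def expand(value: str, env=None) -> str:
    """Replace each placeholder with the variable's value or its default."""
    env = os.environ if env is None else env

    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is None:
            raise KeyError(f"environment variable {name} is not set and has no default")
        return default

    return _PLACEHOLDER.sub(replace, value)
```

With this reading, leaving OPENAI_API_KEY unset yields an empty key, so the variable should be exported before starting the openai profile.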

OpenAI-style API calls

The API is built using FastAPI and follows OpenAI's API scheme.
The RAG pipeline is based on LlamaIndex.

curl -X POST http://localhost:8001/v1/completions \
     -H "Content-Type: application/json" \
     -d '{
  "prompt": "string",
  "stream": true
}'
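With "stream": true the reply arrives incrementally. The sketch below shows how a client might consume it, assuming the server listens on port 8001 (as reported in the Uvicorn startup log earlier) and uses OpenAI-style server-sent events: lines of the form "data: {json}" carrying choices[0].delta.content, terminated by "data: [DONE]". The helper names are hypothetical, and the exact chunk shape should be checked against the running server.

```python
import json
import urllib.request

PGPT_URL = "http://localhost:8001"  # assumption: port from the startup log


def completion_request(prompt: str, stream: bool = True) -> urllib.request.Request:
    """Build the POST request for the /v1/completions endpoint."""
    body = json.dumps({"prompt": prompt, "stream": stream}).encode("utf-8")
    return urllib.request.Request(
        PGPT_URL + "/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_deltas(lines):
    """Yield text fragments from an iterable of SSE lines (bytes or str)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def stream_completion(prompt: str) -> None:
    """Print the answer as it arrives (needs a running PrivateGPT server)."""
    with urllib.request.urlopen(completion_request(prompt)) as resp:
        for piece in iter_deltas(resp):
            print(piece, end="", flush=True)
    print()
```

Because iter_deltas accepts any iterable of lines, the parsing logic can be exercised with canned data before pointing it at a live server.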
