Ollama and OpenAI API Compatibility




1. Introduction

Ollama now offers built-in compatibility with the OpenAI API, which means you can:

  • Use the familiar OpenAI interfaces

  • Run large language models locally

  • Migrate existing OpenAI applications seamlessly

  • Enjoy lower latency and better privacy


1.1 Supported Features

  • Chat completions

  • Streaming output

  • JSON mode

  • Reproducible outputs

  • Vision

  • Function calling

  • Logprobs


1.2 Environment Setup

# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull a model
ollama pull llama3

# 3. Verify the installation
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

2. Compatibility Features

2.1 Request Parameter Reference

{
  "model": "string",           // model name
  "messages": [                // message array
    {
      "role": "string",        // role: system/user/assistant
      "content": "string"      // message content
    }
  ],
  "temperature": float,        // sampling temperature
  "top_p": float,              // nucleus sampling parameter
  "stream": boolean,           // whether to stream the output
  "stop": [string],            // stop sequences
  "max_tokens": integer,       // maximum number of generated tokens
  "presence_penalty": float,   // presence penalty
  "frequency_penalty": float   // frequency penalty
}
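
In practice these parameters are passed straight through as keyword arguments to the OpenAI client. A minimal sketch (the values below are illustrative, not tuned recommendations):

from openai import OpenAI

client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

# Any of the parameters above can be supplied as keyword arguments
response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Summarize what Ollama is in one sentence."}],
    temperature=0.7,        # sampling temperature
    top_p=0.9,              # nucleus sampling
    max_tokens=128,         # cap on generated tokens
    presence_penalty=0.0,
    frequency_penalty=0.0,
)
print(response.choices[0].message.content)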

2.2 Model Name Mapping

# For tools that depend on the default OpenAI model names
ollama cp llama3 gpt-3.5-turbo
ollama cp mistral gpt-4
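
After copying, tools that hard-code an OpenAI model name will transparently hit the local model. A quick sanity check with the Python client, assuming the llama3 copy above:

from openai import OpenAI

client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

# "gpt-3.5-turbo" now resolves to the local llama3 copy created above
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)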

3. Quick Start

3.1 Python Example

from openai import OpenAI

class OllamaClient:
    def __init__(self):
        self.client = OpenAI(
            base_url='http://localhost:11434/v1',
            api_key='ollama'  # required by the SDK but not used by Ollama
        )

    def chat(self, messages):
        """Send a non-streaming chat request and return the reply text."""
        try:
            response = self.client.chat.completions.create(
                model="llama3",
                messages=messages
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"Error: {e}")
            return None

    def stream_chat(self, messages):
        """Yield the reply incrementally as it is generated."""
        response = self.client.chat.completions.create(
            model="llama3",
            messages=messages,
            stream=True
        )
        for chunk in response:
            if chunk.choices[0].delta.content:
                yield chunk.choices[0].delta.content
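
A short usage sketch for the class above (the prompts are just examples):

client = OllamaClient()

# Single-shot request
print(client.chat([{"role": "user", "content": "Explain what a vector database is."}]))

# Streaming request: print tokens as they arrive
for token in client.stream_chat([{"role": "user", "content": "Write a haiku about the sea."}]):
    print(token, end="", flush=True)
print()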

3.2 JavaScript Example

import OpenAI from 'openai';

class OllamaService {
  constructor() {
    this.client = new OpenAI({
      baseURL: 'http://localhost:11434/v1',
      apiKey: 'ollama'
    });
  }

  async chat(messages) {
    try {
      const completion = await this.client.chat.completions.create({
        model: 'llama3',
        messages: messages
      });
      return completion.choices[0].message.content;
    } catch (error) {
      console.error('Chat error:', error);
      throw error;
    }
  }
}

4. Multi-language SDK Support

4.1 Using the Python SDK

# Full conversation example
from openai import OpenAI

client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "I need a Python function to compute the Fibonacci sequence"},
    {"role": "assistant", "content": "I'll help you write an efficient implementation"},
    {"role": "user", "content": "Please use recursion with caching"}
]

response = client.chat.completions.create(
    model="llama3",
    messages=conversation
)
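
The reply is read the same way as with the hosted OpenAI API:

print(response.choices[0].message.content)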

4.2 Using the Node.js SDK

// Express server integration example
import express from 'express';
import OpenAI from 'openai';

const app = express();
app.use(express.json());  // parse JSON request bodies so req.body.messages is available

const openai = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama'
});

app.post('/chat', async (req, res) => {
  try {
    const completion = await openai.chat.completions.create({
      model: 'llama3',
      messages: req.body.messages,
      stream: true
    });

    res.setHeader('Content-Type', 'text/event-stream');

    for await (const chunk of completion) {
      const content = chunk.choices[0].delta.content;
      if (content) {
        res.write(`data: ${JSON.stringify({content})}\n\n`);
      }
    }
    res.end();
  } catch (error) {
    res.status(500).json({error: error.message});
  }
});

app.listen(3000);  // example port

5. Advanced Usage

5.1 Function Calling

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "What's the weather like in Beijing today?"}],
    tools=tools
)
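
If the model decides to call the declared function, the call arrives on the response message rather than as plain text. A minimal sketch of the follow-up round trip, continuing from the request above and assuming get_weather() is a hypothetical local implementation of the declared tool:

import json

message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)   # e.g. {"city": "Beijing"}
    # get_weather() is a hypothetical local implementation of the declared tool
    result = get_weather(args["city"])
    # Feed the tool result back so the model can produce a final answer
    followup = client.chat.completions.create(
        model="llama3",
        messages=[
            {"role": "user", "content": "What's the weather like in Beijing today?"},
            message,
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)},
        ],
    )
    print(followup.choices[0].message.content)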

5.2 JSON Mode Output

response = client.chat.completions.create(
    model="llama3",
    messages=[{
        "role": "user",
        "content": "Describe a book in JSON format"
    }],
    response_format={"type": "json_object"}
)
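
Because JSON mode constrains the model to emit valid JSON, the reply can be parsed directly (it still helps to spell out the desired fields in the prompt):

import json

book = json.loads(response.choices[0].message.content)
print(book)  # a dict with whatever fields the model chose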

6. Best Practices

6.1 Error Handling

import time

def safe_chat_request(client, messages, retries=3):
    """Retry a chat request with a simple linear backoff."""
    for i in range(retries):
        try:
            response = client.chat.completions.create(
                model="llama3",
                messages=messages
            )
            return response
        except Exception as e:
            if i == retries - 1:
                raise e
            time.sleep(1 * (i + 1))  # wait a little longer before each retry

6.2 Long Text Handling

def chunk_text(text, max_tokens=2000):
    """Split long text into smaller chunks (word count is used as a rough token proxy)."""
    sentences = text.split('. ')
    chunks = []
    current_chunk = []
    current_length = 0

    for sentence in sentences:
        sentence_length = len(sentence.split())
        if current_length + sentence_length > max_tokens:
            chunks.append('. '.join(current_chunk) + '.')
            current_chunk = [sentence]
            current_length = sentence_length
        else:
            current_chunk.append(sentence)
            current_length += sentence_length

    if current_chunk:
        chunks.append('. '.join(current_chunk) + '.')
    return chunks
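
One way to use chunk_text is a simple map-reduce pass: summarize each chunk on its own, then summarize the partial summaries. A sketch, assuming a client configured as in section 4.1:

def summarize_long_text(client, text):
    """Summarize each chunk, then combine the partial summaries."""
    partial_summaries = []
    for chunk in chunk_text(text):
        response = client.chat.completions.create(
            model="llama3",
            messages=[{"role": "user", "content": f"Summarize the following text:\n\n{chunk}"}],
        )
        partial_summaries.append(response.choices[0].message.content)

    # Combine the partial summaries into one final summary
    combined = "\n".join(partial_summaries)
    response = client.chat.completions.create(
        model="llama3",
        messages=[{"role": "user", "content": f"Combine these summaries into one:\n\n{combined}"}],
    )
    return response.choices[0].message.content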

7. Common Framework Integrations

7.1 Vercel AI SDK Integration

// app/api/chat/route.ts
import { OpenAIStream, StreamingTextResponse } from 'ai'
import OpenAI from 'openai'

export const runtime = 'edge'

const openai = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama'
})

export async function POST(req: Request) {
  const { messages } = await req.json()
  const response = await openai.chat.completions.create({
    model: 'llama3',
    stream: true,
    messages
  })
  const stream = OpenAIStream(response)
  return new StreamingTextResponse(stream)
}

7.2 AutoGen Integration

from autogen import AssistantAgent, UserProxyAgent

config_list = [{
    "model": "codellama",
    "base_url": "http://localhost:11434/v1",
    "api_key": "ollama",
}]

assistant = AssistantAgent(
    name="coding_assistant",
    llm_config={"config_list": config_list}
)

user_proxy = UserProxyAgent(
    name="user_proxy",
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False
    }
)

user_proxy.initiate_chat(
    assistant,
    message="Create a simple Flask API server"
)

8. Troubleshooting

8.1 Common Issues

  • Connection issues

import requests

def check_ollama_connection():
    """Return True if the local Ollama server is reachable."""
    try:
        # The server root responds with "Ollama is running" when the server is up
        response = requests.get("http://localhost:11434", timeout=5)
        return response.status_code == 200
    except requests.RequestException:
        return False

  • Model loading issues

import subprocess

def ensure_model_available(model_name):
    """Pull the model with `ollama pull` if it is not already listed locally."""
    try:
        # Check whether the model has already been downloaded
        result = subprocess.run(
            ['ollama', 'list'],
            capture_output=True,
            text=True
        )
        if model_name not in result.stdout:
            subprocess.run(['ollama', 'pull', model_name])
        return True
    except Exception as e:
        print(f"Error ensuring model availability: {e}")
        return False
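
The two checks above combine naturally into a single startup routine (a sketch; the model name is just an example):

def ready(model_name="llama3"):
    """Return True once the server is reachable and the model is present."""
    if not check_ollama_connection():
        print("Ollama server is not reachable on localhost:11434")
        return False
    return ensure_model_available(model_name)

if ready():
    print("Ollama is ready to serve requests")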

8.2 Performance Optimization

import subprocess

class OllamaOptimizer:
    def __init__(self):
        self.model_cache = set()

    def preload_models(self, models):
        """Pull commonly used models ahead of time."""
        for model in models:
            if model not in self.model_cache:
                subprocess.run(['ollama', 'pull', model])
                self.model_cache.add(model)

    def cleanup(self):
        """Remove locally installed models that are not in the preload set."""
        result = subprocess.run(
            ['ollama', 'list'],
            capture_output=True,
            text=True
        )
        # The first column of `ollama list` output is the model name; skip the header row
        for line in result.stdout.splitlines()[1:]:
            if not line.strip():
                continue
            name = line.split()[0]       # e.g. "llama3:latest"
            base = name.split(':')[0]    # e.g. "llama3"
            if name not in self.model_cache and base not in self.model_cache:
                subprocess.run(['ollama', 'rm', name])