Let me lead with a contentious personal opinion:
o1 is less a model than an agent with built-in task planning and reflection.
The greatest strength of this kind of agent is reasoning: it trades time for performance and tokens for accuracy. If you're interested, have a look at a couple of things I wrote earlier:
"Pragmatism Above All: What Is an Agent?" (《实用至上:智能体/Agent 是什么》): where I explain where agents came from and the paths being explored
"OpenAI's 'Strawberry' Ships This Fall, Followed by 'Orion'" (《OpenAI「草莓」今秋发布,随后是「猎户座」》): where I predicted o1's form and behavior (an agent-based program)
So of course I think o1 is strong and useful: with progress on large models slowing, this approach can meaningfully raise output quality, and for the broad mass of AI users it makes the models more effective to use. (The even broader mass of users don't use AI at all.)
But I equally believe that entering o1 into a parameter-for-parameter shootout with other large models is deeply inappropriate, especially in 0-shot comparisons.
Put another way: taking a paper that was checked and re-checked for two and a half years, comparing it against one handed in on time, and judging them on accuracy is not a fair contest.
In this post, I'll try to build a web-connected, lite-edition "Strawberry" in about 150 lines of code.
"Lite edition" because:
It implements only the most basic task planning and reflection
There's no fine-tuning at all; I didn't even use an OpenAI model, opting instead for Zhipu's free glm-4-flash
I bolted on WebSearch, so questions with known answers can be resolved faster
Note: the original o1 cannot search the web or use any tools
Performance is nowhere near Strawberry's, and there's no built-in CoT; this is just a demo that imitates the behavior with crude methods
Here's the result (answering "which is bigger, 9.8 or 9.11"):
Next, I'll show the code, then explain how it works.
The Code
A quick note first: I'm running this in Colab, hence api_key=userdata.get('Key_Zhipu').
For web access I use WebPilot's search API, hence the {watt(problem)} in the prompt.
You can swap either of these out to suit your setup.
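One more caveat before the listing: it calls watt(problem) but never defines it. Below is a minimal sketch of what that helper might look like. The endpoint URL, payload shape, and the 'Key_WebPilot' secret name are my assumptions, not confirmed WebPilot API details; any search API that returns text will do.

import requests
from google.colab import userdata

def watt(problem: str) -> str:
    """Hypothetical WebPilot search helper: fetch web context for a question."""
    resp = requests.post(
        "https://beta.webpilotai.com/api/v1/watt",  # assumed endpoint; check the current docs
        headers={"Authorization": f"Bearer {userdata.get('Key_WebPilot')}"},
        json={"content": problem},
        timeout=30,
    )
    resp.raise_for_status()
    # Assumed response shape: {"content": "...search digest..."}
    return resp.json().get("content", "")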
from openai import OpenAI
from dataclasses import dataclass, field
from typing import List, Optional
from IPython.display import display, Markdown
from google.colab import userdata

# Set your API key securely (Zhipu's OpenAI-compatible endpoint)
client = OpenAI(
    api_key=userdata.get('Key_Zhipu'),
    base_url="https://open.bigmodel.cn/api/paas/v4/"
)
model = "glm-4-flash"
# Define data models
@dataclass
class ThoughtStep:
    step_answer: str
    is_completed: bool
    hint: str

@dataclass
class ReasoningProcess:
    initial_problem: str
    steps: List[ThoughtStep] = field(default_factory=list)
    final_answer: Optional[str] = None
def solve_problem(problem: str, max_attempts: int = 10) -> ReasoningProcess:
    """
    Solve a problem using multi-step reasoning, planning, and reflection.
    """
    reasoning_process = ReasoningProcess(initial_problem=problem)
    attempts = 0
    is_completed = False

    # Step 1: Analyze the problem and plan.
    # watt() is the WebPilot search helper sketched above; its results are
    # pasted into the prompt so the plan can draw on web context.
    analysis_prompt = f"""
You are an AI assistant that excels at solving complex STEM problems using multi-step reasoning.
When given a problem, first analyze it, think about possible solution methods, and plan the subsequent steps to solve it.
Problem:
{problem}
Web Search:
{watt(problem)}
Provide your analysis and step-by-step plan in plain text.
"""
    display(Markdown("**The big brain is thinking...**"))
    messages = [{"role": "user", "content": analysis_prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages
    ).choices[0].message.content.strip()

    # Display the AI's initial analysis
    display(Markdown(f"### AI Initial Analysis:\n{response}\n"))
    hint = response
    analysis_step = ThoughtStep(step_answer="", is_completed=False, hint=hint)
    reasoning_process.steps.append(analysis_step)

    # Seed the solving conversation: the plan goes in as an assistant turn
    messages = [
        {"role": "system", "content": "You are an AI assistant continuing the problem-solving process."},
        {"role": "user", "content": "Give a thought about this problem: " + problem},
        {"role": "assistant", "content": hint},
        {"role": "user", "content": "Solve it with this thought, and give the final answer"},
    ]
    # Continue with the plan and attempt to solve the problem
    while not is_completed and attempts < max_attempts:
        attempts += 1

        # Phase 1: Generate the step answer based on the current thought
        response = client.chat.completions.create(
            model=model,
            messages=messages
        ).choices[0].message.content.strip()

        step_answer = response
        display(Markdown(f"### Step Answer (Attempt {attempts}):\n{step_answer}\n"))

        # Phase 2: Validate the step answer using an XML response format
        validation_prompt = f"""
You are an AI validator. Check if the following step answer solves the problem correctly:
Problem:
{problem}
Step Answer:
{step_answer}
Respond in XML format as follows:
<response>
<is_correct>Is this answer 100% correct? Return true or false</is_correct>
<hint>If the answer is incorrect, provide a new thought or hint.</hint>
</response>
"""
        display(Markdown(f"**AI is validating step answer (Attempt {attempts})...**"))
        messages_validation = [{"role": "user", "content": validation_prompt}]
        response = client.chat.completions.create(
            model=model,
            messages=messages_validation
        ).choices[0].message.content.strip()

        # Parse the XML response (crude substring search; fine for a demo)
        try:
            is_correct = 'true' in response.lower()
            hint_start = response.find('<hint>')
            hint_end = response.find('</hint>')
            if hint_start != -1 and hint_end != -1:
                hint = response[hint_start + len('<hint>'):hint_end].strip()
            else:
                hint = "No hint provided"
        except Exception:
            is_correct = False
            hint = "Error parsing validation response."

        # Record this attempt in the reasoning process
        step = ThoughtStep(step_answer=step_answer, is_completed=is_correct, hint=hint)
        reasoning_process.steps.append(step)
        messages += [{"role": "assistant", "content": step_answer}]
        is_completed = is_correct
        if is_completed:
            break  # Exit the loop once a step answer passes validation
        # Otherwise, feed the critique back in and retry
        messages += [{"role": "user", "content": "Not correct, try with this: " + hint}]
    # Final answer step
    messages += [{"role": "user", "content": f"Based on your reasoning, provide the final answer to the problem and return it in the same language as the following: {reasoning_process.initial_problem}"}]
    response = client.chat.completions.create(
        model=model,
        messages=messages
    ).choices[0].message.content.strip()

    # Record and display the final answer
    reasoning_process.final_answer = response
    display(Markdown(f"## Final Answer:\n{response}"))
    return reasoning_process
def display_reasoning_process(process: ReasoningProcess) -> None:
    """
    Display the reasoning process details.
    """
    display(Markdown(f"## Problem:\n{process.initial_problem}\n"))
    for idx, step in enumerate(process.steps, 1):
        display(Markdown(f"### Step {idx}:\n**Hint**: {step.hint}\n**Is Completed**: {step.is_completed}\n"))
    if process.final_answer:
        display(Markdown(f"## Final Answer:\n{process.final_answer}"))
    else:
        display(Markdown("## Final Answer: Not determined yet."))
# Example usage
if __name__ == "__main__":
    problem_text = """9.8 和 9.11 谁大"""
    # Solve the problem
    reasoning = solve_problem(problem_text)
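Note that the listing defines display_reasoning_process but never calls it; to replay the recorded plan, attempts, and verdicts after a run, one extra line does it:

# Optional: replay the full reasoning trace
display_reasoning_process(reasoning)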
How It Works
First off: I used glm-4-flash for one reason only: it's free.
The implementation breaks down into a few steps (a condensed skeleton follows this list):
Step 1: task planning. The agent first searches the web for material related to the question, analyzes it together with the user's question, and outputs a plan for solving it.
Step 2: task attempts. With the plan in hand, the agent tries to solve the problem:
If it succeeds (or runs out of retries), it jumps to Step 3;
If it fails, it reflects on why it fell short, browbeats itself a little, and tries again.
Step 3: task wrap-up. It summarizes the attempts above and outputs the formal answer.
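Condensed into a skeleton, the whole loop is roughly this (llm and search stand for any text-in/text-out model call and any search helper; these are hypothetical names, not part of the listing above):

def solve(problem, llm, search, max_attempts=10):
    # Step 1: plan with web context
    plan = llm(f"Analyze and plan:\n{problem}\nWeb search:\n{search(problem)}")
    transcript = [plan]
    # Step 2: attempt, validate, reflect, retry
    for _ in range(max_attempts):
        answer = llm(f"Solve {problem} using this plan:\n{plan}")
        verdict = llm(f"Is this correct? Answer true/false plus a hint.\n{problem}\n{answer}")
        transcript += [answer, verdict]
        if "true" in verdict.lower():
            break
        plan = verdict  # the critique becomes the next attempt's hint
    # Step 3: wrap up into a formal answer
    return llm("Summarize and give the final answer:\n" + "\n".join(transcript))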
In the end, for the question "which is bigger, 9.8 or 9.11", it produces this (thinking process included):
The method behind programs like this is to have the AI repeatedly browbeat itself, or to bring in a second AI to browbeat the one doing the work, so it keeps trying, checking, and improving until the job ships (sound familiar?).
What This Tells Us
A few angles on this:
o1 isn't mysterious; you can build one yourself (lite edition only)
Matching o1's actual quality still takes work on multiple fronts, whether engineering the agent or training the model itself (internalizing the CoT)
o1 will be genuinely useful, especially for synthetic data and for cracking complex tasks
To some extent, this suggests that training of the base models themselves has hit some bottlenecks
Prompt engineering will gradually fade in importance
Also, I'd welcome discussion on this one: "On AI & AGI, I Have 3 Questions" (《对于 AI & AGI,我有 3 个问题》)
And one more thing: I'm going to organize a proper "o1 Algorithm Challenge" at some point; you're welcome to join when it happens.
(First let me go scrounge up some prize money, ahhhhhh