After Reading This, You Can Build AI Search Too: On "Structured Output"

Technology 2024-08-16 18:23 Guangdong

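The conclusion first: structured output is the core of the vast majority of AI products and agents

Whether it is AI search, a bot with memory, or any kind of agent, it is built on structured output.

In this article I will look at structured output from several angles: what it is used for, where it came from, and what it looks like in practice.

I also include some code snippets to make things more concrete (you can skip them entirely and still follow along).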

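What is "structured output"

If I ask GPT: tell me about Steve Jobs

I will usually get a reply like this:

Steve Jobs, born February 24, 1955, died October 5, 2011, was an American active in technology, innovation, business management, and animation. Jobs founded Apple, NeXT, and Pixar, and launched landmark products such as the Mac, iPod, and iPhone, reshaping the personal computer, music, and mobile phone industries and establishing Apple as a leader in global technology. As one of Pixar's founders, he also left a deep mark on animation. Jobs was one of the most influential entrepreneurs and innovators of the late 20th and early 21st centuries.

In many AI products, however, what we see is not a block of text but a mind map like this:

Behind it is structured output: getting the AI to emit JSON rather than free text, for example: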

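{
  "name": "Steve Jobs",
  "birth_date": "1955-02-24",
  "death_date": "2011-10-05",
  "nationality": "American",
  "fields": ["technology", "innovation", "business management", "animation"],
  "companies_founded": ["Apple", "NeXT", "Pixar"],
  "achievements": [
    "Founded Apple",
    "Launched the Mac, iPod, iPhone, and other products",
    "Reshaped the personal computer, music, and mobile phone industries",
    "Established Apple as a global technology leader",
    "One of Pixar's founders"
  ],
  "influence": "One of the most influential entrepreneurs and innovators of the late 20th and early 21st centuries"
}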
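
To make the link between the JSON and the mind map concrete, here is a minimal sketch of how a front end might turn such an object into an indented outline. The `to_outline` helper and the abbreviated payload are assumptions for illustration, not how any particular product does it.

```python
import json

# Abbreviated, hypothetical model output in the same shape as the JSON above.
raw = """
{
  "name": "Steve Jobs",
  "birth_date": "1955-02-24",
  "companies_founded": ["Apple", "NeXT", "Pixar"]
}
"""


def to_outline(node, label, depth=0):
    """Flatten nested JSON into indented lines, a text stand-in for a mind map."""
    pad = "  " * depth
    if isinstance(node, dict):
        lines = [f"{pad}{label}"]
        for key, value in node.items():
            lines += to_outline(value, key, depth + 1)
        return lines
    if isinstance(node, list):
        lines = [f"{pad}{label}"]
        for index, item in enumerate(node):
            if isinstance(item, (dict, list)):
                lines += to_outline(item, f"[{index}]", depth + 1)
            else:
                lines.append(f"{pad}  {item}")
        return lines
    # Scalars become "key: value" leaves.
    return [f"{pad}{label}: {node}"]


profile = json.loads(raw)
outline = to_outline(profile, profile["name"])
print("\n".join(outline))
```

Because the data arrives as keys and lists rather than prose, the rendering layer never has to guess where one fact ends and the next begins.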

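Behind every product is structured output

Take the same question, "tell me about Steve Jobs". Different AI products produce different internal outputs for it.

For a search product, the internal output might look like this:

{"query": "Steve Jobs", "search_by": "Google"}

With that in hand, the product runs a Google search for "Steve Jobs", has the AI summarize the results, and returns the summary to the user.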
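
A rough sketch of that dispatch step, assuming the model has already produced the JSON. The `google_search` stub and the `BACKENDS` registry are hypothetical stand-ins; a real product would call an actual search API here.

```python
import json


def google_search(query):
    """Stand-in for a real search call; returns canned snippets."""
    return [f"result about {query} #1", f"result about {query} #2"]


# Hypothetical registry mapping the "search_by" field to a callable.
BACKENDS = {"Google": google_search}


def run_search(model_output):
    """Parse the model's structured output and route it to a search backend."""
    plan = json.loads(model_output)
    backend = BACKENDS.get(plan.get("search_by"))
    if backend is None:
        raise ValueError(f"unknown search backend: {plan.get('search_by')!r}")
    return backend(plan["query"])


snippets = run_search('{"query": "Steve Jobs", "search_by": "Google"}')
# The snippets would then be handed back to the model for summarization.
print(snippets)
```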

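For a RAG tool whose database is the "Silicon Valley County Chronicles", the internal output might look like this:

{
  "rag1": "Jobs's family",
  "rag2": "Jobs's upbringing",
  "rag3": "Jobs's products",
  "rag4": "Jobs's achievements"
}

Each of these is run through retrieval separately, then the results are pooled, summarized by the AI, and returned to the user.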
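
The fan-out can be sketched as follows. The corpus passages are placeholders and the word-overlap `retrieve` function is a toy stand-in for a real vector search, so treat this as an illustration of the control flow only.

```python
import json

# Placeholder passages standing in for the knowledge base.
CORPUS = [
    "placeholder passage about the Jobs family",
    "placeholder passage about Jobs products",
    "placeholder passage about something unrelated",
]


def retrieve(query, corpus, top_k=1):
    """Rank passages by how many query words they share (toy retriever)."""
    words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: -len(words & set(p.lower().split())))
    return ranked[:top_k]


# Abbreviated structured output: one sub-query per key.
model_output = '{"rag1": "Jobs family", "rag2": "Jobs products"}'
plan = json.loads(model_output)

# One retrieval per sub-query; the pooled context goes back to the model.
context = {key: retrieve(query, CORPUS) for key, query in plan.items()}
print(context)
```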

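For a four-panel comic, the internal output might look like this:

{
  "stories": [
    {
      "story": "Jobs's family",
      "prompt": "1970s retro style, warm tones, soft lines. A cozy family home in California, bright sunshine outside the window, a yard full of plants and flowers. Young Jobs is in the living room with his adoptive parents: his mother is knitting a sweater, his father is reading the newspaper, and Jobs sits on the floor playing with a vintage computer. The scene shows a harmonious, warm family moment; amid the strong sense of family, Jobs's eyes are full of curiosity and exploration.",
      "caption": "The strength of family shapes great dreams"
    },
    {
      "story": "Jobs's upbringing",
      "prompt": "Late-1970s black-and-white photography style with strong contrast. A plain high school classroom in San Francisco, light slanting in through the window, desks covered with books and notes. Young Jobs sits in the back row, watching the physics experiment in the teacher's hands, while the classmates around him listen attentively. The scene conveys Jobs's hunger for knowledge: a focused gaze that reveals unusual curiosity and depth of thought.",
      "caption": "Pursuing knowledge and personal growth"
    },
    {
      "story": "Jobs's products",
      "prompt": "Minimalist style with a modern color palette and a strong sense of design. Inside Apple's modern office, a first-generation Macintosh sits on a clean glass desk against a white wall and a large Apple logo. Jobs stands at the desk, fingers lightly touching the Macintosh, while several engineers talk behind him. The image centers on Jobs and his product, showing the union of technology and design; Jobs looks confident and visionary.",
      "caption": "Changing the world through products"
    },
    {
      "story": "Jobs's achievements",
      "prompt": "Surrealist style with a futuristic feel, vivid and striking colors. In front of Apple's vast headquarters, Jobs's portrait floats in a futuristic sky, surrounded by the iPhone, iPad, Mac, and other products. The giant portrait merges with the products in the sky, symbolizing his far-reaching influence on modern technology. The image is a striking tableau in which Jobs stands, almost mythically, at the summit of modern technology.",
      "caption": "Reaching the pinnacle of technology"
    }
  ]
}

Each of these entries is then drawn as an image and shown to the user.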

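An example: "AI weather forecast"

Now a different example: I have a weather forecast AI that answers whenever the user asks about the weather.

In reality this AI does not use AI to predict the weather in real time. It turns the question into a request, queries a "weather forecast database", and returns the result to the user (if the question has nothing to do with weather, it simply answers normally).

The flow looks like this:

From the program's point of view, it does the following: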

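Use structured output to decide that the question is about weather, and extract the two fields location and date: Beijing and tomorrow
Query the interface for Beijing and tomorrow (strictly speaking, Beijing and 2024-08-16)
The result comes back structured: {"date": "2024-08-16", "location": "Beijing", "temperature": {"high": "32°C", "low": "24°C"}, "weather": "thunderstorms, heavy cloud", "humidity": "77%", "UV_index": "high", "advice": "carry rain gear and use sun protection"}
Send the model API something like the following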

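from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The looked-up record is pasted into the user message ahead of the question.
# "Tomorrow" is 2024-08-16, so the system prompt gives today as 2024-08-15.
client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a weather forecast bot. Today is 2024-08-15."},
        {"role": "user", "content": """
            {"date": "2024-08-16", "location": "Beijing", "temperature": {"high": "32°C", "low": "24°C"}, "weather": "thunderstorms, heavy cloud", "humidity": "77%", "UV_index": "high", "advice": "carry rain gear and use sun protection"}
            What will the weather be like in Beijing tomorrow?
        """},
    ],
)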

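The API's reply will look something like: Tomorrow's forecast for Beijing shows a daytime high of about 32°C and a nighttime low of about 24°C. Thunderstorms are expected, with heavy cloud cover throughout the day and fairly high humidity of around 77%. Thunderstorms may occur in both the morning and the afternoon, so carry rain gear when you go out, and use sun protection because the UV index is high. Overall it will be muggy and humid, and it may feel hotter than the actual temperature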
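
The whole flow, including the "not about weather" branch, can be sketched with stubs in place of the model and the database. The `extract_intent` function, the `is_weather` field, and `WEATHER_DB` are assumptions made for this sketch; in a real system the first step would be a structured-output call to the model.

```python
import json


def extract_intent(question):
    """Stand-in for the model's first, structured call (hypothetical output shape)."""
    if "weather" in question.lower():
        return json.dumps({"is_weather": True, "location": "Beijing", "date": "2024-08-16"})
    return json.dumps({"is_weather": False})


# Toy "weather forecast database" keyed by (location, date).
WEATHER_DB = {
    ("Beijing", "2024-08-16"): {"weather": "thunderstorms", "high": "32°C", "low": "24°C"},
}


def build_messages(question):
    """Route the question: attach a weather record if relevant, else pass it through."""
    intent = json.loads(extract_intent(question))
    if not intent["is_weather"]:
        return [{"role": "user", "content": question}]
    record = WEATHER_DB[(intent["location"], intent["date"])]
    return [
        {"role": "system", "content": "You are a weather forecast bot."},
        {"role": "user", "content": json.dumps(record, ensure_ascii=False) + "\n" + question},
    ]


weather_messages = build_messages("What will the weather be like in Beijing tomorrow?")
other_messages = build_messages("Tell me a joke")
print(weather_messages)
print(other_messages)
```

The returned list is what would be passed as `messages` to the final model call.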

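Structured output can also be used to adapt devices for IoT

For example, I studied electrical engineering (EE), so I can turn Coze into a home control hub, like this:

Data from the home

Packaged as a Coze Bot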

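A history of how the industry evolved

In the AI field, the first large-scale use of structured output is usually traced to the general launch of OpenAI's Plugins in May 2023: the AI could call external tools through structured output.

As of now, OpenAI has gone through four iterations of structured output: the Plugin approach, Function Calling, JSON mode, and Structured Outputs, which came out just a few days ago.

Of course, you can also approximate structured output with prompt techniques such as markdown, but that is outside the scope of this article.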

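The Plugin approach

In March 2023, people in the plugin alpha saw a document describing how to let ChatGPT call external tools. That was the prototype of structured output.

The flow is the same as described above: once ChatGPT understands the user's request, it uses structured output to generate a JSON payload that includes the choice of plugin; the plugin receives those parameters, does its work, and returns a result. This mechanism later became Actions in GPTs.

Note: this approach was never released as an API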

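Function Calling

In June 2023, OpenAI shipped its 0613 mid-year update and released Function Calling. As things stand it is the most widely used calling method, and Chinese domestic models generally support it.

Let's walk through Function Calling with a more concrete example. Take a user asking about a package; the bot handles the task in three steps:

When the user asks the AI "My package, number 12345, has it been sent?", the request carries an extra tools field that defines the information to extract, order_id
Suppose the extracted value is order_12345; a database lookup then returns the package information 2024-08-01
That information is merged with the conversation history and handed back to the model, which produces the final answer containing the 2024-08-01 date

In code, that is: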

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Describe the function the model may ask us to call.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_delivery_date",
            "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    }
                },
                "required": ["order_id"],
                "additionalProperties": False
            }
        }
    }
]

# The conversation so far.
messages = []
messages.append({"role": "system", "content": "You are a helpful customer support assistant. Use the supplied tools to assist the user."})
messages.append({"role": "user", "content": "Hi, can you tell me the delivery date for my order?"})
messages.append({"role": "assistant", "content": "Hi there! I can help with that. Can you please provide your order ID?"})
messages.append({"role": "user", "content": "i think it is order_12345"})

# First call: the model decides whether to call the tool and with which arguments.
rsp = client.chat.completions.create(
    model='gpt-4o',
    messages=messages,
    tools=tools
)

The AI then returns something like:

ChatCompletion(id='chatcmpl-9wY3ulTLZswqZLF58L0LQ0sM1EAsG', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_W1KzfgxvkoxjCAGT3Td9oVPk', function=Function(arguments='{"order_id":"order_12345"}', name='get_delivery_date'), type='function')]))], created=1723740986, model='gpt-4o-2024-05-13', object='chat.completion', service_tier=None, system_fingerprint='fp_3aa7262c27', usage=CompletionUsage(completion_tokens=19, prompt_tokens=140, total_tokens=159))

Here the value of rsp.choices[0].message.tool_calls[0].function.arguments is the string {"order_id":"order_12345"}

Suppose the lookup returns 2024-08-01
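
The step between the two model calls, parsing the arguments string and running the lookup, might be sketched like this. The local `get_delivery_date` implementation, its in-memory records, and the `LOCAL_TOOLS` registry are hypothetical; only the shape of the resulting tool message follows the API.

```python
import json


def get_delivery_date(order_id):
    """Hypothetical lookup; a real bot would query its order database."""
    records = {"order_12345": "2024-08-01"}
    return records.get(order_id, "unknown")


# Map tool names the model may emit to local Python callables.
LOCAL_TOOLS = {"get_delivery_date": get_delivery_date}


def run_tool_call(call_id, name, arguments):
    """Execute one tool call and wrap the result as a role="tool" message."""
    args = json.loads(arguments)  # arguments arrive as a JSON string
    result = LOCAL_TOOLS[name](**args)
    return {
        "role": "tool",
        "tool_call_id": call_id,
        "content": json.dumps({"delivery_date": result}),
    }


tool_message = run_tool_call(
    "call_W1KzfgxvkoxjCAGT3Td9oVPk",
    "get_delivery_date",
    '{"order_id":"order_12345"}',
)
print(tool_message)
```

Keeping the registry keyed by name means the dispatch code does not change when more tools are added.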

# Prepare the chat completion call payload
completion_payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful customer support assistant. Use the supplied tools to assist the user."},
        {"role": "user", "content": "Hi, can you tell me the delivery date for my order?"},
        {"role": "assistant", "content": "Hi there! I can help with that. Can you please provide your order ID?"},
        {"role": "user", "content": "i think it is order_12345"},
        rsp.choices[0].message,
        {"role": "tool", "content": "delivery_date：2024-08-01", "tool_call_id": rsp.choices[0].message.tool_calls[0].id},
    ]
}

# Call the OpenAI API's chat completions endpoint to send the tool call result back to the model
response = client.chat.completions.create(
    model=completion_payload["model"],
    messages=completion_payload["messages"],
)

# Print the response from the API. In this case it will typically contain a message such as "The delivery date for your order #12345 is xyz. Is there anything else I can help you with?"
print(response)

In the end you get

ChatCompletion(id='chatcmpl-9wYV7Yhkimzlpg3ejNkjRjI0GKqyw', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Your order with ID "order_12345" is scheduled to be delivered on August 1, 2024. If you have any other questions or need further assistance, feel free to ask!', refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1723742673, model='gpt-4o-2024-05-13', object='chat.completion', service_tier=None, system_fingerprint='fp_3aa7262c27', usage=CompletionUsage(completion_tokens=40, prompt_tokens=111, total_tokens=151))

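In other words, the user is told the date 2024-08-01 (in this run, phrased as the delivery date)

Recap

To complete the conversation above, the user supplied one prompt, i think it is order_12345, but the AI actually ran twice:

The first call extracted the order id
The second call is the one that actually generated the answer with the 2024-08-01 date

Also, in the second call the message list ends with the first call's response followed by the database lookup result.

The lookup result is sent with role set to tool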

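One more thing to note

If you see code where the Function Calling result is passed with function rather than tool, that is not wrong either.

OpenAI changed the Function Calling interface along the way: it started with the function structure and later switched to the tool structure. Both forms work for now, but going forward OpenAI will support only the tool structure

Gripe: I personally prefer the function structure; it is more elegant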
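
On the request side, the difference between the two is mostly a wrapper: the older `functions` parameter takes a list of function specs, while `tools` wraps each spec as `{"type": "function", "function": spec}`. A small sketch of that conversion; the `functions_to_tools` helper is a name made up here, not part of any SDK.

```python
def functions_to_tools(functions):
    """Wrap legacy `functions` specs in the newer `tools` envelope."""
    return [{"type": "function", "function": spec} for spec in functions]


# A legacy-style definition, abbreviated from the delivery-date example.
legacy_functions = [
    {
        "name": "get_delivery_date",
        "description": "Get the delivery date for a customer's order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    }
]

tools_from_legacy = functions_to_tools(legacy_functions)
print(tools_from_legacy)
```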

Using the tool structure

"messages": [
    {"role": "system", "content": "You are a helpful customer support assistant. Use the supplied tools to assist the user."},
    {"role": "user", "content": "Hi, can you tell me the delivery date for my order?"},
    {"role": "assistant", "content": "Hi there! I can help with that. Can you please provide your order ID?"},
    {"role": "user", "content": "i think it is order_12345"},
    rsp.choices[0].message,
    {"role": "tool", "content": "delivery_date：2024-08-01", "tool_call_id": rsp.choices[0].message.tool_calls[0].id}
]

Using the function structure

"messages": [
    {"role": "system", "content": "You are a helpful customer support assistant. Use the supplied tools to assist the user."},
    {"role": "user", "content": "Hi, can you tell me the delivery date for my order?"},
    {"role": "assistant", "content": "Hi there! I can help with that. Can you please provide your order ID?"},
    {"role": "user", "content": "i think it is order_12345"},
    # name is the function whose result is being reported
    {"role": "function", "content": "delivery_date：2024-08-01", "name": "get_delivery_date"}
]

Alternatively, you can skip both structures

"messages": [
    {"role": "system", "content": "You are a helpful customer support assistant. Use the supplied tools to assist the user."},
    {"role": "user", "content": "Hi, can you tell me the delivery date for my order?"},
    {"role": "assistant", "content": "Hi there! I can help with that. Can you please provide your order ID?"},
    {"role": "user", "content": "i think it is order_12345. Related record is: delivery_date：2024-08-01"}
]

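JSON Mode

In November 2023, at its developer conference, OpenAI introduced JSON mode.

Look closely at Function Calling above: the arguments arrive as a string, which is not reliable enough. JSON mode was meant to address this by having the model output JSON directly.

Note: this method is still not reliable enough, and has been superseded by Structured Outputs

To use it, two things are required:

The word json appears in the prompt
response_format is set to "type": "json_object"

For example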

completion_payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "role": "user",
            # Prompt (Chinese): "Tell me what the Four Great Classical Novels are and who
            # wrote them, in this json format: ..." (书名 = title, 作者 = author)
            "content": "告诉我四大名著分别是什么，以及他们的作者是谁，按这个 json 格式: {{'书名':'xxx'，'作者':'xxx'}...}",
        }
    ],
    "response_format": {"type": "json_object"},
}

# Call the chat completions endpoint; response_format has to be passed for JSON mode to apply
response = client.chat.completions.create(
    model=completion_payload["model"],
    messages=completion_payload["messages"],
    response_format=completion_payload["response_format"],
)

The response is

ChatCompletion(id='chatcmpl-9wZ5DHWicaarxccmTBGi8MfJsa6AQ', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="{\n    {'书名': '西游记', '作者': '吴承恩'},\n    {'书名': '红楼梦', '作者': '曹雪芹'},\n    {'书名': '水浒传', '作者': '施耐庵'},\n    {'书名': '三国演义', '作者': '罗贯中'}\n}", refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1723744911, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=92, prompt_tokens=57, total_tokens=149))

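The JSON can be read from response.choices[0].message.content. Note that content is still a string, so it has to be parsed before use; for any further processing, follow the same approach as in Function Calling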
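
Since JSON mode does not enforce a schema, it is worth validating the content before relying on it. A minimal sketch, assuming the caller wants a list of objects with `title` and `author` keys; those key names and the `parse_books` helper are choices made for this example.

```python
import json

# Hypothetical schema for this sketch: every book needs these keys.
REQUIRED_KEYS = {"title", "author"}


def parse_books(content):
    """Return a list of book dicts, or None if the content is not usable."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        return None
    # Accept either a bare list or an object wrapping exactly one list.
    if isinstance(data, dict):
        lists = [value for value in data.values() if isinstance(value, list)]
        data = lists[0] if len(lists) == 1 else None
    if not isinstance(data, list):
        return None
    if all(isinstance(book, dict) and REQUIRED_KEYS.issubset(book) for book in data):
        return data
    return None


good = '{"books": [{"title": "Dream of the Red Chamber", "author": "Cao Xueqin"}]}'
bad = "{ {'title': 'x'} }"  # single quotes and nested braces: not valid JSON
print(parse_books(good))
print(parse_books(bad))
```

A caller can retry the request or fall back to a plain-text answer when the function returns `None`.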

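Structured Outputs

Compared with Function Calling and JSON mode, Structured Outputs is noticeably easier to use. It currently supports gpt-4o-mini and gpt-4o-2024-08-06, and of course later models as well.

A quick test

For the Four Great Classical Novels example above, the code looks like this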

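from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One book: title and author.
class theBook(BaseModel):
    name: str
    writer: str

# The full answer: a list of books.
class theFour(BaseModel):
    steps: list[theBook]

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        # User prompt (Chinese): "Tell me what the Four Great Classical Novels are and who wrote them"
        {"role": "user", "content": "告诉我四大名著分别是什么，以及他们的作者是谁"},
    ],
    response_format=theFour,
)

# .parsed is already a theFour instance, not a string
response = completion.choices[0].message.parsed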

The result is

theFour(steps=[theBook(name='《红楼梦》', writer='曹雪芹'), theBook(name='《西游记》', writer='吴承恩'), theBook(name='《三国演义》', writer='罗贯中'), theBook(name='《水浒传》', writer='施耐庵')])

Very handy!

Advanced usage

This method also lets you do chain-of-thought (CoT) within a single call, for example:

from pydantic import BaseModel

class Step(BaseModel):
    explanation: str
    output: str

class MathReasoning(BaseModel):
    steps: list[Step]
    final_answer: str

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ],
    response_format=MathReasoning,
)

math_reasoning = completion.choices[0].message.parsed

Result

{
  "steps": [
    {
      "explanation": "Start with the equation 8x + 7 = -23.",
      "output": "8x + 7 = -23"
    },
    {
      "explanation": "Subtract 7 from both sides to isolate the term with the variable.",
      "output": "8x = -23 - 7"
    },
    {
      "explanation": "Simplify the right side of the equation.",
      "output": "8x = -30"
    },
    {
      "explanation": "Divide both sides by 8 to solve for x.",
      "output": "x = -30 / 8"
    },
    {
      "explanation": "Simplify the fraction.",
      "output": "x = -15 / 4"
    }
  ],
  "final_answer": "x = -15 / 4"
}
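
One payoff of getting the reasoning back as fields rather than prose is that a program can check it. A small sketch that reads `final_answer` from an abbreviated copy of the result above and substitutes it back into 8x + 7 = -23; the `parse_answer` helper is an assumption for illustration and only handles answers of the form `x = a / b`.

```python
import json
from fractions import Fraction

# Abbreviated copy of the structured result above.
reasoning = json.loads("""
{
  "steps": [
    {"explanation": "Simplify the right side of the equation.", "output": "8x = -30"},
    {"explanation": "Simplify the fraction.", "output": "x = -15 / 4"}
  ],
  "final_answer": "x = -15 / 4"
}
""")


def parse_answer(final_answer):
    """Turn a string like 'x = -15 / 4' into an exact Fraction."""
    _, _, rhs = final_answer.partition("=")
    return Fraction(rhs.replace(" ", ""))


x = parse_answer(reasoning["final_answer"])
# Substitute back into the original equation: 8 * (-15/4) + 7 = -30 + 7 = -23.
checks_out = 8 * x + 7 == -23
print(x, checks_out)
```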

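Closing

That is roughly the overall picture of "structured output". Also...

I have recently been writing an "AI industry" knowledge base with a lot of material like this article, aimed squarely at practitioners.

Once it is in reasonable shape, I'll invite everyone to take a look. Bye for now!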

http://mp.weixin.qq.com/s?__biz=MzkzNDQxOTU2MQ==&mid=2247490254&idx=1&sn=f49e6ad20fe3be90f563f565dfab4bac

赛博禅心
