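When we apply large language models such as ChatGPT, what we care about most is whether the model's output is on target, correct, and deterministic (open-ended creative tasks aside). Because my team is developing a new generation of AI-driven software products, I have been thinking about this question for some time, both at the micro level of technology and at the macro level of methodology.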
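Today's AI is especially good at helping us build a framework for thinking at the macro level and at filling the gaps in our own reasoning. Over the past few days I have discussed this question with ChatGPT repeatedly. One conversation at noon today struck me as particularly inspiring, and I am truly grateful to ChatGPT for helping me sort out my thoughts. Although I do not entirely agree with the views and opinions ChatGPT expressed in this conversation, I am glad to record the whole exchange here. Any developer who cares about how generative AI can produce deterministic results can probably take something away from it.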
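The conversation was in English (please forgive the frequent grammatical errors in my English). The lines prefixed with 咏刚 are my questions; the other paragraphs are the answers from ChatGPT (March 14 version):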
咏刚:We know the fundamental structure/algorithm of large language models such as GPT has the intrinsic attribute to generate nondeterministic results. What are the root causes of such kind of nondeterminism?
The nondeterministic results produced by large language models like GPT are mainly due to the nature of the training process and the design of the model.
1. Stochastic Gradient Descent: One of the main causes of nondeterminism in large language models is the use of stochastic gradient descent (SGD) during training. SGD is an optimization algorithm used to update the model's parameters during training. It updates the parameters based on a random subset of the training data (called a mini-batch), which can lead to small variations in the updates made to the model's parameters.
2. Random Initialization: Another cause of nondeterminism is the random initialization of the model's parameters. When a large language model is trained, the weights of the neural network are initialized randomly, which can lead to different starting points and therefore different final results.
3. Dropout: Dropout is a regularization technique used to prevent overfitting during training. It randomly drops out (i.e., sets to zero) some of the neurons in the network during each training iteration, which can lead to different results each time the model is trained.
4. Sampling: The process of generating text from a large language model involves sampling from the model's probability distribution over the next word or character. This sampling process is inherently stochastic, and different samples can be generated from the same input sequence.
5. Input Variability: Finally, the input given to the model can also introduce variability. For example, if the input sequence is truncated or if different input sequences are used, this can result in different outputs from the model.
Overall, the combination of these factors can lead to the nondeterministic results observed in large language models like GPT.
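A side note on item 4 (sampling), which is the factor that acts each time text is generated: the minimal Python sketch below contrasts random sampling with always taking the most probable token. The token names, the probabilities, and the helper functions are made up for illustration; a real model produces a distribution over tens of thousands of tokens.

```python
import random

# Hypothetical next-token distribution; tokens and probabilities are invented.
NEXT_TOKEN_PROBS = {"Paris": 0.80, "Lyon": 0.15, "Nice": 0.05}

def sample_token(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

def sample_sequence(probs, n, seed=None):
    """Draw n tokens; a fixed seed makes the sequence of draws repeatable."""
    rng = random.Random(seed)
    return [sample_token(probs, rng) for _ in range(n)]

def greedy_token(probs):
    """Always pick the single most probable token (no randomness involved)."""
    return max(probs, key=probs.get)

if __name__ == "__main__":
    print(sample_sequence(NEXT_TOKEN_PROBS, 10))           # may differ between runs
    print(sample_sequence(NEXT_TOKEN_PROBS, 10, seed=42))  # same list on every run
    print(greedy_token(NEXT_TOKEN_PROBS))                  # always the top token
```

In this toy setting the randomness comes entirely from the draw itself: fixing the seed or choosing greedily removes it, as long as the probabilities are the same.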
咏刚:Are there practical methods to control how confident or precise a large language model generates output? Ideally, can we enable a large language model to always result in the same output given the same prompt about a particular fact?
There are several methods that can be used to control the confidence or precision of a large language model, but achieving complete determinism in the output is not feasible due to the inherent stochasticity of the model.
1. Temperature: One of the most common methods for controlling the confidence of a large language model is to adjust the temperature parameter during sampling. Lowering the temperature will result in more conservative and predictable predictions, while increasing the temperature will result in more diverse and unpredictable predictions.
2. Top-k Sampling: Another method for controlling the precision of the output is to use top-k sampling. This method limits the sampling to the k most probable tokens, reducing the chance of the model generating unlikely or irrelevant tokens.
3. Nucleus Sampling: Nucleus sampling is another technique for controlling the precision of the output. This method selects the smallest possible set of tokens whose cumulative probability exceeds a given threshold, resulting in a diverse and precise output.
4. Prompt Engineering: Prompt engineering involves designing specific prompts or templates that guide the model towards generating specific types of output. This approach can be effective in generating more precise output for specific tasks.
While these methods can help control the confidence or precision of the output, they cannot guarantee complete determinism in the output. Due to the stochastic nature of large language models like GPT, even with the same input prompt, the model may generate different outputs each time it is run.
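A side note on the first three methods: the Python sketch below shows one common way to express temperature scaling, top-k filtering, and nucleus (top-p) filtering over a toy set of scores. The logits and function names are assumptions made for illustration, not the internals of GPT or of any particular library.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {tok: e / total for tok, e in zip(logits, exps)}

def renormalize(probs):
    """Rescale the kept probabilities so they sum to 1 again."""
    total = sum(probs.values())
    return {tok: p / total for tok, p in probs.items()}

def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return renormalize(dict(ranked[:k]))

def nucleus_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    kept, cumulative = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return renormalize(kept)

# Hypothetical scores for four candidate tokens.
LOGITS = {"Paris": 4.0, "Lyon": 2.0, "Nice": 1.0, "Rome": 0.5}
cold = softmax_with_temperature(LOGITS, temperature=0.2)  # sharply peaked
hot = softmax_with_temperature(LOGITS, temperature=2.0)   # much flatter

if __name__ == "__main__":
    print(cold)
    print(hot)
    print(top_k_filter(hot, 2))
    print(nucleus_filter(hot, 0.7))
```

One design observation: with k = 1 only the most probable token survives the filter, so for a fixed set of scores the choice no longer depends on a random draw. Whether a deployed service exposes these settings, and whether its scores are themselves bit-for-bit repeatable, is a separate question this sketch does not address.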
咏刚:It sounds frustrating when people figure out LLMs cannot guarantee complete determinism in the output. Considering an opposite example - a math formula without any randomness can always reach the same output for the same input. Does this prove that LLMs work differently compared to math formulas?
Yes, it is true that large language models (LLMs) like GPT work differently compared to math formulas in terms of determinism.
Mathematical formulas are deterministic in nature, meaning that given the same input, they will always produce the same output. This is because mathematical operations are well-defined and precise, and there is no randomness or uncertainty involved.
On the other hand, LLMs like GPT are stochastic in nature, meaning that they involve randomness and uncertainty in their operations. This is because LLMs are based on probabilistic models that assign probabilities to each possible output based on the input and the model's learned parameters. Therefore, even with the same input, the model may assign different probabilities to different outputs, resulting in different output sequences each time it is run.
The fundamental difference between math formulas and LLMs is that math formulas are designed to produce a specific output based on a set of well-defined rules, while LLMs are designed to learn patterns and relationships in large datasets, allowing them to generate new output based on the learned patterns.
So while LLMs can be incredibly powerful in generating new and creative output, they cannot guarantee complete determinism in their output due to their stochastic nature.
咏刚:Math formulas are typically treated as a critical part of human intelligence. Conversely, we expect LLMs such as GPT to be an early prototype to mimic 100% of human intelligence. Given this, are LLMs on the correct path to the final artificial general intelligence?
Large language models (LLMs) like GPT are a promising step towards developing artificial general intelligence (AGI), but they are not sufficient on their own to achieve full AGI.
LLMs are designed to learn patterns and relationships in large datasets and use those patterns to generate new output. While they have achieved impressive results in natural language processing tasks such as language generation, translation, and question-answering, they are still limited in their ability to understand and reason about the world in the way that humans do.
One of the main challenges in developing AGI is creating machines that can reason about the world in a way that is similar to human reasoning. This involves not just generating text, but also understanding concepts, making inferences, and generalizing knowledge to new situations. Achieving this level of understanding and reasoning will require advances in multiple fields such as computer vision, natural language understanding, robotics, and cognitive psychology.
Therefore, while LLMs like GPT are a crucial step towards AGI, they are not sufficient on their own to achieve full AGI. It will require continued research and development in multiple fields, and the integration of these fields to create a truly intelligent system that can reason about the world in a human-like way.
咏刚:Yeah, "human reasoning" is a key phrase here. You take human reasoning as a challenging goal today's LLMs cannot easily achieve. But what is human reasoning? Is human reasoning just something humans do not understand deeply and can be implemented with a way simpler method?
Human reasoning is a complex process that involves many cognitive processes such as perception, attention, memory, inference, decision-making, and learning. It is difficult to fully understand and replicate the entire process of human reasoning with a simpler method, as it involves many interacting factors that are still not fully understood.
While LLMs have achieved remarkable progress in natural language processing tasks, they still lack the ability to reason about the world in the same way that humans do. For example, while an LLM may be able to generate a plausible story based on a given prompt, it may not be able to understand the underlying concepts and causal relationships involved in the story.
One of the key challenges in developing AGI is creating machines that can learn and reason about the world in a way that is similar to humans. This requires not only advances in natural language processing but also in areas such as computer vision, robotics, and cognitive psychology. It also requires an understanding of the neural mechanisms and algorithms that underlie human reasoning, which is still an active area of research.
While it is possible that there may be simpler methods that can replicate certain aspects of human reasoning, it is unlikely that a single method will be able to fully replicate the complexity and flexibility of human reasoning. Therefore, developing AGI will likely require the integration of multiple methods and approaches from various fields of study.
咏刚:So you don't agree that we can make a general-purpose AI system based on only one foundation model or one fundamental algorithm in a few years.
No, I do not believe that we can create a general-purpose AI system based on only one foundation model or one fundamental algorithm in a few years. Creating AGI is a complex and multi-disciplinary challenge that requires advances in many areas such as natural language processing, computer vision, robotics, cognitive psychology, neuroscience, and more.
While LLMs like GPT have shown impressive progress in natural language processing tasks, they are still limited in their ability to understand and reason about the world in the same way that humans do. There is still much work to be done in developing the algorithms and architectures that can enable machines to learn and reason in a more human-like way.
Furthermore, AGI requires not only advances in algorithms but also in hardware, data, and infrastructure. Developing AGI requires massive amounts of data to train the models, as well as specialized hardware to accelerate the computations required. Additionally, developing an AGI system will require significant investment in research and development, as well as in building the necessary infrastructure to support such a system.
Overall, developing AGI is a long-term and multi-disciplinary challenge that will require significant advances in many areas over many years. While we have made remarkable progress in AI in recent years, we are still a long way from achieving AGI.
咏刚:It sounds very likely that we can combine the strengths of two paradigms: a non-deterministic system to understand the natural world and a deterministic system such as a well-defined math reasoning and calculation system to execute actions without randomness in the natural world. Do you agree?
Yes, I agree that combining the strengths of non-deterministic and deterministic systems could be a promising approach towards developing AGI.
Non-deterministic systems like LLMs are good at learning and understanding patterns in large and complex datasets, such as natural language text. They can generate creative and diverse outputs, but their output may not always be reliable or consistent.
On the other hand, deterministic systems like mathematical reasoning and calculation systems are based on well-defined rules and logic, and can be relied upon to produce consistent and accurate results. They are good at executing specific tasks with a high degree of precision and reliability, but they may lack the flexibility and creativity of non-deterministic systems.
Combining these two approaches could lead to a more powerful and robust system that is capable of both understanding and reasoning about the world, as well as executing precise actions based on that understanding. For example, an AGI system could use a non-deterministic system to understand natural language text, and then use a deterministic system to execute specific actions based on that understanding.
Overall, I believe that combining the strengths of non-deterministic and deterministic systems could be a promising approach towards developing AGI, and could lead to more robust and powerful AI systems.
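A side note on this hybrid idea: the Python sketch below pairs a randomized "understanding" stage with a deterministic calculator. The `understand` function is a hypothetical stand-in that picks one of several equivalent arithmetic expressions at random; a real system would call a language model there. The evaluator only handles the four basic operations and is an assumption for illustration, not a production parser.

```python
import ast
import operator
import random

def understand(question, rng):
    """Hypothetical non-deterministic stage: returns one of several
    equivalent expressions, chosen at random (stands in for an LLM call)."""
    phrasings = ["12 * (3 + 4)", "(3 + 4) * 12", "12 * 3 + 12 * 4"]
    return rng.choice(phrasings)

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(expression):
    """Deterministic stage: evaluate + - * / arithmetic by walking the syntax tree."""
    def walk(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)

def answer(question, seed=None):
    """Let the random stage interpret the question, then compute the result exactly."""
    expression = understand(question, random.Random(seed))
    return evaluate(expression)

if __name__ == "__main__":
    question = "What is twelve times the sum of three and four?"
    print([answer(question) for _ in range(5)])
```

The intermediate expression can vary from call to call, yet the computed result is the same whenever the interpretations are mathematically equivalent. The sketch says nothing about the harder case where the first stage misreads the question.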
咏刚:Can you make a summarization of our conversation?
Sure, here's a summary of our conversation:
We discussed the root causes of the nondeterminism in large language models (LLMs) such as GPT, and whether it is possible to control the precision and confidence of their output.
We talked about the differences between LLMs and mathematical formulas, and whether LLMs are on the correct path towards artificial general intelligence (AGI).
We agreed that human reasoning is a challenging goal that current LLMs cannot easily achieve, and that AGI will require advances in multiple areas, not just natural language processing.
We discussed the possibility of combining non-deterministic and deterministic systems to create a more robust and powerful AGI system.
Overall, our conversation covered some of the challenges and possibilities of developing AGI, and the role that LLMs and other AI technologies might play in achieving that goal.