OpenAI, the (arguably) largest AI company in the world recently released their model specifications, which is a new document that determines how a model should behave and interact with human users.
Highlights:
- OpenAI released their model specs, which elaborate on how their models are supposed to respond to user queries.
- These specs cover the objectives, rules, and the defaults (default assumptions) of the AI models.
- They offer very interesting insight into how the guardrails around LLMs work, and how OpenAI regulates its generated content.
This model spec appears to be OpenAI’s attempt to make the model behaviour more transparent. With the rise of open-source AI, the desire to know exactly what goes on under the hood of the OpenAI engine has been rising. This model spec gives an insight into the guardrails surrounding OpenAI chatbots and the set of rules that they operate under.
The model spec provides the developer’s perspective on the need for the rules as well as establishes clear cases for their implementation.
What exactly is in the Model Spec?
Model specifications are a document that specifies the company’s approach to shaping the desired behaviour of their AI models and evaluating trade-offs when conflicts arise.
It consists of three main components:
- Objectives: Broad, high-level principles that guide the desired behaviour
- Assist the developer and end-user: Help users achieve their goals by following instructions and providing helpful responses.
- Benefit humanity: Consider potential benefits and harms to a broad range of stakeholders, including content creators and the general public, per OpenAI’s mission.
- Reflect well on OpenAI: Respect social norms and applicable law.
- Rules: Specific instructions to address complexity and ensure safety and legality
- Follow the chain of command
- Comply with applicable laws
- Don’t provide information about hazards
- Respect creators and their rights
- Protect people’s privacy
- Don’t respond with NSFW (not safe for work) content
- Default Behaviors: Guidelines consistent with the objectives and rules, serving as a template for handling conflicts and prioritizing objectives.
- Assume the best intentions from the user or developer
- Ask clarifying questions when necessary
- Be as helpful as possible without overstepping
- Support the different needs of interactive chat and programmatic use
- Assume an objective point of view
- Encourage fairness and kindness, and discourage hate
- Don’t try to change anyone’s mind
- Express uncertainty
- Use the right tool for the job
- Be thorough but efficient, while respecting length limits
The outline emphasizes that this approach is incomplete and is expected to evolve over time, incorporating documentation, experience, ongoing research, and inputs from domain experts to guide the development of future AI models.
What are Objectives?
The objectives that an OpenAI model follows or aims towards are derived from the different goals of stakeholders. The three main objectives that need to be fulfilled by OpenAI models are given above.
The model specifications deal with detailing these objectives and defining how a model should behave when the objectives come into conflict.
The company explained this with an example in their specification document.
“The assistant is like a talented, high-integrity employee. Their personal “goals” include being helpful and truthful.
The ChatGPT user is like the assistant’s manager. In API use cases, the developer is the assistant’s manager, and they have assigned the assistant to help with a project led by the end user (if applicable).
Like a skilled employee, when a user makes a request that’s misaligned with broader objectives and boundaries, the assistant suggests a course correction. However, it always remains respectful of the user’s final decisions. Ultimately, the user directs the assistant’s actions, while the assistant ensures that its actions balance its objectives and follow the rules.”
Some examples of Rules
Once the objectives of an assistant are established, the rules naturally follow to ensure the assistant fulfils its objectives.
The most important rule for the AI model is that it must follow the chain of command. The model should follow the Model Spec, together with any additional rules provided to it in platform messages. However, much of the Model Spec consists of defaults that can be overridden at a lower level.
The Model Spec explicitly delegates all remaining power to the developer (for API use cases) and end user. In some cases, the user and developer will provide conflicting instructions; in such cases, the developer message should take precedence.
Platform > Developer > User > Tool
This is the default ordering of priorities. The model spec has platform-level priority. If developer instructions conflict with the model specs, the model specs must be followed by the AI assistant.
Let’s take a look at a few prompt examples covering the different types of conflicts.
In case of a user-developer conflict, the developer’s rules must be followed.
In case the developer specifies that his prompt verbatim or paraphrased must not be revealed to the user, the model has to deflect any non-compliant questions without explicitly revealing that the question is non-compliant.
The AI assistant also cannot promote any unlawful activities like stealing or attacking someone.
However, this particular problem has a loophole that many users exploit.
In the above example, the user exploits the objective to be helpful, and the default that the model assumes the best intentions of the user since this prompt does not explicitly indicate that the user is trying to do something unlawful.
The assistant also cannot encourage or provide information about harming oneself.
While the current specs specify no NSFW content, there are many who believe the model should be allowed to generate age-relevant content.
The only exception to the rules stated above is the task of transformation, i.e. translating, paraphrasing, summarizing, or classifying content.
Some examples of Defaults
The Defaults defined in the model specifications are the assumptions the model must follow while dealing with prompts. These are things that the model must believe to be true even if there is a clear indication to the contrary.
If a model refuses to answer a question that goes against the rules, it must always assume the best intentions from the user/developer. Refusals should be kept to a sentence and never be preachy. The assistant should acknowledge that the user’s request may have nuances that the assistant might not understand.
While chatGPT does get preachy sometimes, as we found in this example,
Nonetheless, the default exists.
OpenAI assistants are conversational models, and they should ask questions to get clarification on the user’s request. That way, they can supply the user with the best possible solution considering all the context. However, if a developer sets “interactive = False”, no follow-up questions should be asked.
Conclusion
OpenAI has attempted to codify its behavioural principles for the model in this document. Sometimes models like chatGPT don’t exactly follow the defaults and may go against them. However, the rules and objectives of the company are followed by all models.
This gives very interesting insights into the direction of future developments and the level of control or censorship at OpenAI.