LLMs:OpenAI 官方文檔發(fā)布提高 GPT 使用效果指南—GPT最佳實(shí)踐(GPT best practices)翻譯與解讀
導(dǎo)讀:為了獲得優(yōu)質(zhì)輸出,需要遵循幾點(diǎn)基本原則:
>> 寫清楚指令:將任務(wù)和期望輸出描述得盡可能清楚。GPT 無法讀取您的思維,需要明確的指令。
>> 提供參考文本:讓GPT基于參考文本回答問題,可以減少錯(cuò)誤信息。
>> 將復(fù)雜任務(wù)分解為短小簡潔的子任務(wù):這可以降低錯(cuò)誤率。
>> 讓GPT思考一段時(shí)間:顯式告知GPT需要細(xì)思密 practice。讓GPT先解決問題再回答,可以有更準(zhǔn)確的答案。
>> 結(jié)合外部工具:如果外部工具能更有效完成任務(wù),則可以利用其優(yōu)化GPT輸出。
>> 有系統(tǒng)地測(cè)試變化:通過廣泛的測(cè)試集系統(tǒng)地評(píng)估效果。這有助于優(yōu)化系統(tǒng)設(shè)計(jì)。
總的來說,直接詢問GPT通常得到粗略的答案。只有通過改變上下文和提問方式,才能取得理想效果。所以只要有足夠的時(shí)間和最新強(qiáng)大的GPT模型,情況通常會(huì)更好。
OpenAI 官方文檔發(fā)布提高 GPT 使用效果指南—GPT最佳實(shí)踐GPT best practices翻譯與解讀
時(shí)間
2023-06-11
地址
https://platform.openai.com/docs/guides/gpt-best-practices/six-strategies-for-getting-better-results
作者
OpenAI
This guide shares strategies and tactics for getting better results from GPTs. The methods described here can sometimes be deployed in combination for greater effect. We encourage experimentation to find the methods that work best for you.
本指南分享了從 GPT 中獲得更好結(jié)果的策略和戰(zhàn)術(shù)。這里描述的方法有時(shí)可以組合使用以增加效果。我們鼓勵(lì)嘗試不同的方法,找出最適合您的方法。
Some of the examples demonstrated here currently work only with our most capable model, gpt-4. If you don't yet have access to gpt-4 consider joining the waitlist. In general, if you find that a GPT model fails at a task and a more capable model is available, it's often worth trying again with the more capable model.
這里演示的一些示例目前僅適用于我們最強(qiáng)大的模型 gpt-4。如果您還沒有訪問 gpt-4 的權(quán)限,請(qǐng)考慮加入等待名單。通常情況下,如果您發(fā)現(xiàn) GPT 模型在一個(gè)任務(wù)上失敗,并且有一個(gè)更強(qiáng)大的模型可用,重新嘗試使用更強(qiáng)大的模型往往是值得的。
獲得更好結(jié)果的六個(gè)策略Six strategies for getting better results
1、編寫清晰的指示W(wǎng)rite clear instructions
GPTs can’t read your mind. If outputs are too long, ask for brief replies. If outputs are too simple, ask for expert-level writing. If you dislike the format, demonstrate the format you’d like to see. The less GPTs have to guess at what you want, the more likely you’ll get it.
GPT 無法讀取您的思想。如果輸出過長,請(qǐng)要求簡短回復(fù)。如果輸出過于簡單,請(qǐng)要求專家級(jí)的寫作。如果您不喜歡的格式,請(qǐng)演示您想要看到的格式。GPT 越少猜測(cè)您的意圖,您得到所需結(jié)果的可能性就越大。
Tactics:
Include details in your query to get more relevant answers
Ask the model to adopt a persona
Use delimiters to clearly indicate distinct parts of the input
Specify the steps required to complete a task
Provide examples
Specify the desired length of the output
戰(zhàn)術(shù):
在查詢中包含詳細(xì)信息以獲得更相關(guān)的答案
要求模型采用一種人物角色
使用分隔符清晰地表示輸入的不同部分
指定完成任務(wù)所需的步驟
提供示例
指定所需的輸出長度
2、提供參考文本Provide reference text
GPTs can confidently invent fake answers, especially when asked about esoteric topics or for citations and URLs. In the same way that a sheet of notes can help a student do better on a test, providing reference text to GPTs can help in answering with fewer fabrications.
GPT 可以自信地虛構(gòu)答案,尤其是在問及奇特的主題、引用和 URL 的情況下。就像一張筆記紙可以幫助學(xué)生在考試中表現(xiàn)更好一樣,向 GPT 提供參考文本可以幫助其以更少的虛構(gòu)回答。
Tactics:
Instruct the model to answer using a reference text
Instruct the model to answer with citations from a reference text
戰(zhàn)術(shù):
指示模型使用參考文本回答問題
指示模型使用參考文本中的引用回答問題
3、將復(fù)雜任務(wù)拆分為簡單子任務(wù)Split complex tasks into simpler subtasks
Just as it is good practice in software engineering to decompose a complex system into a set of modular components, the same is true of tasks submitted to GPTs. Complex tasks tend to have higher error rates than simpler tasks. Furthermore, complex tasks can often be re-defined as a workflow of simpler tasks in which the outputs of earlier tasks are used to construct the inputs to later tasks.
就像軟件工程中將復(fù)雜系統(tǒng)分解為一組模塊化組件一樣,提交給 GPT 的任務(wù)也是如此。復(fù)雜任務(wù)的錯(cuò)誤率往往比簡單任務(wù)高。此外,復(fù)雜任務(wù)通??梢灾匦露x為由較簡單任務(wù)的輸出構(gòu)成后續(xù)任務(wù)的輸入的工作流。
Tactics:
Use intent classification to identify the most relevant instructions for a user query
For dialogue applications that require very long conversations, summarize or filter previous dialogue
Summarize long documents piecewise and construct a full summary recursively
戰(zhàn)術(shù):
使用意圖分類來識(shí)別用戶查詢的最相關(guān)指令
對(duì)于需要非常長對(duì)話的對(duì)話應(yīng)用程序,對(duì)先前對(duì)話進(jìn)行總結(jié)或篩選
將長文檔逐段摘要并遞歸地構(gòu)建完整摘要
4、給予 GPT "思考" 的時(shí)間Give GPTs time to "think"
If asked to multiply 17 by 28, you might not know it instantly, but can still work it out with time. Similarly, GPTs make more reasoning errors when trying to answer right away, rather than taking time to work out an answer. Asking for a chain of reasoning before an answer can help GPTs reason their way toward correct answers more reliably.
如果被要求計(jì)算 17 乘以 28,您可能不會(huì)立即知道答案,但可以通過時(shí)間來計(jì)算出來。同樣,GPT 在試圖立即回答問題時(shí)會(huì)出現(xiàn)更多的推理錯(cuò)誤,而不是花時(shí)間思考答案。在回答之前要求一系列推理過程可以幫助 GPT 更可靠地推理出正確答案。
Tactics:
Instruct the model to work out its own solution before rushing to a conclusion
Use inner monologue or a sequence of queries to hide the model's reasoning process
Ask the model if it missed anything on previous passes
戰(zhàn)術(shù):
指示模型在匆忙得出結(jié)論之前解決問題
使用內(nèi)心獨(dú)白或一系列查詢隱藏模型的推理過程
詢問模型是否在之前的處理中遺漏了任何內(nèi)容
5、使用外部工具Use external tools
Compensate for the weaknesses of GPTs by feeding them the outputs of other tools. For example, a text retrieval system can tell GPTs about relevant documents. A code execution engine can help GPTs do math and run code. If a task can be done more reliably or efficiently by a tool rather than by a GPT, offload it to get the best of both.
通過將其他工具的輸出提供給 GPT,可以彌補(bǔ) GPT 的不足之處。例如,文本檢索系統(tǒng)可以告訴 GPT 相關(guān)文檔的信息。代碼執(zhí)行引擎可以幫助 GPT 進(jìn)行數(shù)學(xué)計(jì)算和運(yùn)行代碼。如果一個(gè)任務(wù)可以通過工具而不是 GPT 更可靠或更高效地完成,可以將其卸載以獲得更好的效果。
Tactics:
Use embeddings-based search to implement efficient knowledge retrieval
Use code execution to perform more accurate calculations or call external APIs
戰(zhàn)術(shù):
使用基于嵌入的搜索實(shí)現(xiàn)高效的知識(shí)檢索
使用代碼執(zhí)行來進(jìn)行更準(zhǔn)確的計(jì)算或調(diào)用外部 API
6、系統(tǒng)化地測(cè)試變化Test changes systematically
Improving performance is easier if you can measure it. In some cases a modification to a prompt will achieve better performance on a few isolated examples but lead to worse overall performance on a more representative set of examples. Therefore to be sure that a change is net positive to performance it may be necessary to define a comprehensive test suite (also known an as an "eval").
如果能夠?qū)π阅苓M(jìn)行測(cè)量,改進(jìn)性能就會(huì)更容易。在某些情況下,對(duì)提示進(jìn)行修改可能會(huì)在一些孤立的示例上取得更好的性能,但在更具代表性的一組示例上導(dǎo)致整體性能變差。因此,為了確保改變對(duì)性能的總體影響是積極的,可能需要定義一個(gè)全面的測(cè)試套件(也稱為“評(píng)估”)。
Tactic:
Evaluate model outputs with reference to gold-standard answers
戰(zhàn)術(shù):
根據(jù)與黃金標(biāo)準(zhǔn)答案的對(duì)比評(píng)估模型輸出
Tactics
Each of the strategies listed above can be instantiated with specific tactics. These tactics are meant to provide ideas for things to try. They are by no means fully comprehensive, and you should feel free to try creative ideas not represented here.
戰(zhàn)術(shù)
上述列出的每個(gè)策略都可以用具體的戰(zhàn)術(shù)來實(shí)施。這些戰(zhàn)術(shù)旨在提供一些可以嘗試的思路。它們絕不是全面詳盡的,您可以隨意嘗試不在此處表示的創(chuàng)造性思路。
1、策略:編寫清晰的指示Strategy: Write clear instructions
1.1、戰(zhàn)術(shù):在查詢中包含詳細(xì)信息以獲得更相關(guān)的答案Tactic: Include details in your query to get more relevant answers
In order to get a highly relevant response, make sure that requests provide any important details or context. Otherwise you are leaving it up to the model to guess what you mean.
為了獲得高度相關(guān)的回答,請(qǐng)確保請(qǐng)求提供任何重要的細(xì)節(jié)或背景。否則,您將讓模型去猜測(cè)您的意思。
Worse
Better
How do I add numbers in Excel?
如何在 Excel 中相加數(shù)字?
How do I add up a row of dollar amounts in Excel? I want to do this automatically for a whole sheet of rows with all the totals ending up on the right in a column called "Total".
如何在 Excel 中將一行美元金額相加?我希望為整個(gè)工作表的每一行自動(dòng)執(zhí)行此操作,所有的總計(jì)都出現(xiàn)在右側(cè)的名為“Total”的列中。
Who’s president?
誰是總統(tǒng)?
Who was the president of Mexico in 2021, and how frequently are elections held?
2021年墨西哥的總統(tǒng)是誰,選舉多久舉行一次?
Write code to calculate the Fibonacci sequence.
編寫計(jì)算斐波那契數(shù)列的代碼。
Write a TypeScript function to efficiently calculate the Fibonacci sequence. Comment the code liberally to explain what each piece does and why it's written that way.
編寫一個(gè)高效計(jì)算斐波那契數(shù)列的 TypeScript 函數(shù)。對(duì)代碼進(jìn)行詳細(xì)注釋,解釋每個(gè)部分的作用及其編寫方式的原因。
Summarize the meeting notes.
總結(jié)會(huì)議記錄。
Summarize the meeting notes in a single paragraph. Then write a markdown list of the speakers and each of their key points. Finally, list the next steps or action items suggested by the speakers, if any.
用一段文字總結(jié)會(huì)議記錄。然后以 Markdown 列表的形式列出發(fā)言人及其關(guān)鍵觀點(diǎn)。最后,列出發(fā)言人提出的下一步行動(dòng)或建議的行動(dòng)項(xiàng)(如果有)。
1.2、戰(zhàn)術(shù):要求模型采用一種人物角色Tactic: Ask the model to adopt a persona
The system message can be used to specify the persona used by the model in its replies.
SYSTEM
When I ask for help to write something, you will reply with a document that contains at least one joke or playful comment in every paragraph.
USER
Write a thank you note to my steel bolt vendor for getting the delivery in on time and in short notice. This made it possible for us to deliver an important order.
可以使用系統(tǒng)消息來指定模型在回復(fù)中所使用的角色。
系統(tǒng)
當(dāng)我請(qǐng)求你幫忙寫東西時(shí),你會(huì)回復(fù)我一份文件,其中每一段至少包含一個(gè)笑話或有趣的評(píng)論。
用戶
給我的螺栓供應(yīng)商寫一封感謝信,感謝他們及時(shí)交貨并在短時(shí)間內(nèi)完成。這使我們能夠交付一份重要訂單。
1.3、戰(zhàn)術(shù):使用分隔符清晰標(biāo)示輸入的不同部分Tactic: Use delimiters to clearly indicate distinct parts of the input
Delimiters like triple quotation marks, XML tags, section titles, etc. can help demarcate sections of text to be treated differently.
像三重引號(hào)、XML 標(biāo)簽、節(jié)標(biāo)題等分隔符可以幫助劃分文本的部分,以便以不同的方式處理。
USER
Summarize the text delimited by triple quotes with a haiku.
"""insert text here"""
用戶
用一個(gè)俳句總結(jié)由三重引號(hào)包圍的文本。
"""在此處插入文本"""
SYSTEM
You will be provided with a pair of articles (delimited with XML tags) about the same topic. First summarize the arguments of each article. Then indicate which of them makes a better argument and explain why.
USER
<article> insert first article here </article>
<article> insert second article here </article>
系統(tǒng)
將為您提供一對(duì)關(guān)于同一主題的文章(由 XML 標(biāo)簽分隔)。首先總結(jié)每篇文章的論點(diǎn)。然后指出哪篇文章提出了更好的論點(diǎn),并解釋原因。
用戶
<article>在此處插入第一篇文章</article>
<article>在此處插入第二篇文章</article>
SYSTEM
You will be provided with a thesis abstract and a suggested title for it. The thesis title should give the reader a good idea of the topic of the thesis but should also be eye-catching. If the title does not meet these criteria, suggest 5 alternatives.
USER
Abstract: insert abstract here
Title: insert title here
系統(tǒng)
將為您提供一篇論文摘要和一個(gè)建議的標(biāo)題。論文標(biāo)題應(yīng)該給讀者一個(gè)關(guān)于論文主題的好主意,同時(shí)也應(yīng)該引人注目。如果標(biāo)題不符合這些標(biāo)準(zhǔn),請(qǐng)?zhí)峁?個(gè)備選項(xiàng)。
用戶
摘要:在此處插入摘要
標(biāo)題:在此處插入標(biāo)題
For straightforward tasks such as these, using delimiters might not make a difference in the output quality. However, the more complex a task is the more important it is to disambiguate task details. Don’t make GPTs work to understand exactly what you are asking of them.
對(duì)于像這樣的簡單任務(wù),使用分隔符可能不會(huì)對(duì)輸出質(zhì)量產(chǎn)生影響。然而,任務(wù)越復(fù)雜,將任務(wù)細(xì)節(jié)明確化就越重要。不要讓 GPT 難以理解您確切的要求。
1.4、戰(zhàn)術(shù):指定完成任務(wù)所需的步驟Tactic: Specify the steps required to complete a task
Some tasks are best specified as a sequence of steps. Writing the steps out explicitly can make it easier for the model to follow them.
有些任務(wù)最好被指定為一系列步驟。明確寫出這些步驟可以使模型更容易地遵循它們。
SYSTEM
Use the following step-by-step instructions to respond to user inputs.
Step 1 - The user will provide you with text in triple quotes. Summarize this text in one sentence with a prefix that says "Summary: ".
Step 2 - Translate the summary from Step 1 into Spanish, with a prefix that says "Translation: ".
USER
"""insert text here"""
系統(tǒng)
使用以下逐步說明來回應(yīng)用戶輸入。
第一步 - 用戶將向您提供由三重引號(hào)括起來的文本。用一個(gè)前綴為"Summary: "的句子總結(jié)這段文本。
第二步 - 將第一步中的摘要翻譯成西班牙語,并用一個(gè)前綴為"Translation: "的句子進(jìn)行說明。
用戶
"""在此處插入文本"""
1.5、戰(zhàn)術(shù):提供示例Tactic: Provide examples
Providing general instructions that apply to all examples is generally more efficient than demonstrating all permutations of a task by example, but in some cases providing examples may be easier. For example, if you intend for the model to copy a particular style of responding to user queries which is difficult to describe explicitly. This is known as "few-shot" prompting.
提供適用于所有示例的通用指令通常比通過示例演示任務(wù)的所有排列組合更有效,但在某些情況下,提供示例可能更容易。例如,如果您打算讓模型復(fù)制一種難以明確描述的特定樣式來回應(yīng)用戶查詢,這被稱為"few-shot"提示。
SYSTEM
Answer in a consistent style.
USER
Teach me about patience.
ASSISTANT
The river that carves the deepest valley flows from a modest spring; the grandest symphony originates from a single note; the most intricate tapestry begins with a solitary thread.
USER
Teach me about the ocean.
系統(tǒng)
以一種一致的風(fēng)格回答。
用戶
教我耐心。
助手
雕刻最深谷的河流源于一處謙遜的泉眼;最壯麗的交響樂由一聲孤獨(dú)的音符奏響;最精妙的壁毯始于一根孤寂的線頭。
用戶
教我關(guān)于海洋。
1.6、戰(zhàn)術(shù):指定輸出的期望長度Tactic: Specify the desired length of the output
You can ask the model to produce outputs that are of a given target length. The targeted output length can be specified in terms of the count of words, sentences, paragraphs, bullet points, etc. Note however that instructing the model to generate a specific number of words does not work with high precision. The model can more reliably generate outputs with a specific number of paragraphs or bullet points.
您可以要求模型生成具有給定目標(biāo)長度的輸出。目標(biāo)輸出長度可以以詞數(shù)、句子數(shù)、段落數(shù)、項(xiàng)目符號(hào)等方式指定。然而,請(qǐng)注意,指示模型生成特定數(shù)量的單詞不具有高精確性。模型可以更可靠地生成具有特定段落數(shù)或項(xiàng)目符號(hào)數(shù)的輸出。
USER
Summarize the text delimited by triple quotes in about 50 words.
"""insert text here"""
用戶
用大約50個(gè)詞總結(jié)由三重引號(hào)分隔的文本。
"""在此處插入文本"""
USER
Summarize the text delimited by triple quotes in 2 paragraphs.
"""insert text here"""
用戶
用2個(gè)段落總結(jié)由三重引號(hào)分隔的文本。
"""在此處插入文本"""
USER
Summarize the text delimited by triple quotes in 3 bullet points.
"""insert text here"""
用戶
用3個(gè)項(xiàng)目符號(hào)總結(jié)由三重引號(hào)分隔的文本。
"""在此處插入文本"""
2、戰(zhàn)略:提供參考文本Strategy: Provide reference text
2.1、戰(zhàn)術(shù):指示模型使用參考文本回答問題Tactic: Instruct the model to answer using a reference text
If we can provide a model with trusted information that is relevant to the current query, then we can instruct the model to use the provided information to compose its answer.
如果我們能為模型提供與當(dāng)前查詢相關(guān)的可信信息,那么我們可以指示模型使用提供的信息來組織回答。
SYSTEM
Use the provided articles delimited by triple quotes to answer questions. If the answer cannot be found in the articles, write "I could not find an answer."
USER
<insert articles, each delimited by triple quotes>
Question: <insert question here>
系統(tǒng)
使用由三重引號(hào)分隔的提供的文章來回答問題。如果在文章中找不到答案,則寫下"I could not find an answer."。
用戶
<插入文章,每篇文章用三重引號(hào)分隔>
問題:<在此處插入問題>
Given that GPTs have limited context windows, in order to apply this tactic we need some way to dynamically lookup information that is relevant to the question being asked. Embeddings can be used to implement efficient knowledge retrieval. See the tactic "Use embeddings-based search to implement efficient knowledge retrieval" for more details on how to implement this.
考慮到 GPTs 的有限上下文窗口,在應(yīng)用這一戰(zhàn)術(shù)時(shí),我們需要一種動(dòng)態(tài)查找與所提問題相關(guān)的信息的方法。嵌入可以用于實(shí)現(xiàn)高效的知識(shí)檢索。有關(guān)如何實(shí)施這一點(diǎn)的詳細(xì)信息,請(qǐng)參閱戰(zhàn)術(shù)"使用基于嵌入的搜索實(shí)現(xiàn)高效的知識(shí)檢索"。
2.2、戰(zhàn)術(shù):指示模型使用參考文本中的引用來回答問題Tactic: Instruct the model to answer with citations from a reference text
If the input has been supplemented with relevant knowledge, it's straightforward to request that the model add citations to its answers by referencing passages from provided documents. Note that citations in the output can then be verified programmatically by string matching within the provided documents.
如果輸入已經(jīng)補(bǔ)充了相關(guān)知識(shí),那么可以直接要求模型通過引用所提供文檔中的段落來為其答案添加引用。請(qǐng)注意,輸出中的引用可以通過在提供的文檔中進(jìn)行字符串匹配來進(jìn)行程序驗(yàn)證。
SYSTEM
You will be provided with a document delimited by triple quotes and a question. Your task is to answer the question using only the provided document and to cite the passage(s) of the document used to answer the question. If the document does not contain the information needed to answer this question then simply write: "Insufficient information." If an answer to the question is provided, it must be annotated with a citation. Use the following format for to cite relevant passages ({"citation": …}).
USER
"""<insert document here>"""
Question: <insert question here>
系統(tǒng)
您將獲得一個(gè)由三重引號(hào)分隔的文檔和一個(gè)問題。您的任務(wù)是僅使用所提供的文檔回答問題,并引用回答問題所使用的文檔段落。如果文檔中不包含回答該問題所需的信息,則簡單寫下:"Insufficient information." 如果提供了問題的答案,則必須注明引用。請(qǐng)使用以下格式來引用相關(guān)段落({"citation": …})。
用戶
"""<在此處插入文檔>"""
問題:<在此處插入問題>
3、戰(zhàn)略:將復(fù)雜任務(wù)拆分為更簡單的子任務(wù)Strategy: Split complex tasks into simpler subtasks
3.1、戰(zhàn)術(shù):使用意圖分類來識(shí)別用戶查詢的最相關(guān)指令Tactic: Use intent classification to identify the most relevant instructions for a user query
For tasks in which lots of independent sets of instructions are needed to handle different cases, it can be beneficial to first classify the type of query and to use that classification to determine which instructions are needed. This can be achieved by defining fixed categories and hardcoding instructions that are relevant for handling tasks in a given category. This process can also be applied recursively to decompose a task into a sequence of stages. The advantage of this approach is that each query will contain only those instructions that are required to perform the next stage of a task which can result in lower error rates compared to using a single query to perform the whole task. This can also result in lower costs since larger prompts cost more to run (see pricing information).
對(duì)于需要處理不同情況下的許多獨(dú)立指令集的任務(wù),首先對(duì)查詢的類型進(jìn)行分類并使用該分類來確定所需的指令可能是有益的。這可以通過定義固定的類別并硬編碼與處理給定類別任務(wù)相關(guān)的指令來實(shí)現(xiàn)。該過程也可以遞歸地應(yīng)用于將任務(wù)分解為一系列階段。這種方法的優(yōu)點(diǎn)是,每個(gè)查詢將僅包含執(zhí)行任務(wù)的下一個(gè)階段所需的指令,這可能導(dǎo)致錯(cuò)誤率較低,與使用單個(gè)查詢執(zhí)行整個(gè)任務(wù)相比。這也可能導(dǎo)致更低的成本,因?yàn)檩^大的提示會(huì)增加運(yùn)行成本(請(qǐng)參閱定價(jià)信息)。
Suppose for example that for a customer service application, queries could be usefully classified as follows:
SYSTEM
You will be provided with customer service queries. Classify each query into a primary category and a secondary category. Provide your output in json format with the keys: primary and secondary.
Primary categories: Billing, Technical Support, Account Management, or General Inquiry.
Billing secondary categories:
- Unsubscribe or upgrade
- Add a payment method
- Explanation for charge
- Dispute a charge
Technical Support secondary categories:
- Troubleshooting
- Device compatibility
- Software updates
Account Management secondary categories:
- Password reset
- Update personal information
- Close account
- Account security
General Inquiry secondary categories:
- Product information
- Pricing
- Feedback
- Speak to a human
USER
I need to get my internet working again.
例如,假設(shè)對(duì)于客戶服務(wù)應(yīng)用程序,查詢可以有以下有用的分類:
系統(tǒng)
您將獲得客戶服務(wù)查詢。將每個(gè)查詢分類為主要類別和次要類別。以json格式提供輸出,使用鍵:primary 和 secondary。
主要類別:計(jì)費(fèi)(Billing)、技術(shù)支持(Technical Support)、賬戶管理(Account Management)或一般查詢(General Inquiry)。
計(jì)費(fèi)的次要類別:
退訂或升級(jí)
添加付款方式
解釋費(fèi)用
糾紛費(fèi)用
技術(shù)支持的次要類別:
故障排除
設(shè)備兼容性
軟件更新
賬戶管理的次要類別:
密碼重置
更新個(gè)人信息
關(guān)閉賬戶
賬戶安全
一般查詢的次要類別:
產(chǎn)品信息
價(jià)格
反饋
聯(lián)系客服人員
Based on the classification of the customer query, a set of more specific instructions can be provided to a GPT model to handle next steps. For example, suppose the customer requires help with "troubleshooting".
用戶
我需要讓我的互聯(lián)網(wǎng)再次工作。
根據(jù)客戶查詢的分類,可以向 GPT 模型提供一組更具體的指令來處理下一步。例如,假設(shè)客戶需要幫助進(jìn)行"故障排除"。
SYSTEM
You will be provided with customer service inquiries that require troubleshooting in a technical support context. Help the user by:
- Ask them to check that all cables to/from the router are connected. Note that it is common for cables to come loose over time.
- If all cables are connected and the issue persists, ask them which router model they are using
- Now you will advise them how to restart their device:
-- If the model number is MTD-327J, advise them to push the red button and hold it for 5 seconds, then wait 5 minutes before testing the connection.
-- If the model number is MTD-327S, advise them to unplug and replug it, then wait 5 minutes before testing the connection.
- If the customer's issue persists after restarting the device and waiting 5 minutes, connect them to IT support by outputting {"IT support requested"}.
- If the user starts asking questions that are unrelated to this topic then confirm if they would like to end the current chat about troubleshooting and classify their request according to the following scheme:
<insert primary/secondary classification scheme from above here>
USER
I need to get my internet working again.
系統(tǒng)
您將獲得需要在技術(shù)支持環(huán)境中進(jìn)行故障排除的客戶服務(wù)查詢。通過以下方式幫助用戶:
要求他們檢查路由器與所有連接的電纜是否連接正常。請(qǐng)注意,隨著時(shí)間的推移,電纜松動(dòng)是常見的情況。
如果所有電纜連接正常且問題仍然存在,請(qǐng)?jiān)儐査麄冋谑褂媚膫€(gè)路由器型號(hào)。
現(xiàn)在,您將告訴他們?nèi)绾沃匦聠?dòng)設(shè)備:
-- 如果型號(hào)為 MTD-327J,請(qǐng)建議他們按下紅色按鈕并保持5秒鐘,然后在測(cè)試連接之前等待5分鐘。
-- 如果型號(hào)為 MTD-327S,請(qǐng)建議他們拔下電源并重新插入,然后在測(cè)試連接之前等待5分鐘。
如果客戶在重啟設(shè)備并等待5分鐘后問題仍然存在,請(qǐng)通過輸出{"IT support requested"}將其連接到 IT 支持。
如果用戶開始提出與此主題無關(guān)的問題,請(qǐng)確認(rèn)他們是否希望結(jié)束當(dāng)前有關(guān)故障排除的聊天,并根據(jù)上述方案對(duì)其請(qǐng)求進(jìn)行分類。
<從上面插入主/次分類方案>
用戶
我需要讓我的互聯(lián)網(wǎng)再次工作。
Notice that the model has been instructed to emit special strings to indicate when the state of the conversation changes. This enables us to turn our system into a state machine where the state determines which instructions are injected. By keeping track of state, what instructions are relevant at that state, and also optionally what state transitions are allowed from that state, we can put guardrails around the user experience that would be hard to achieve with a less structured approach.
請(qǐng)注意,已指示模型發(fā)出特殊字符串以指示對(duì)話狀態(tài)何時(shí)改變。這使我們能夠?qū)⑾到y(tǒng)轉(zhuǎn)變?yōu)闋顟B(tài)機(jī),其中狀態(tài)確定要注入的指令。通過跟蹤狀態(tài)、該狀態(tài)下相關(guān)的指令以及可選的從該狀態(tài)允許的狀態(tài)轉(zhuǎn)換,我們可以在用戶體驗(yàn)周圍設(shè)置防護(hù)欄,這在非結(jié)構(gòu)化方法中很難實(shí)現(xiàn)。
3.2、戰(zhàn)術(shù):對(duì)于需要非常長對(duì)話的對(duì)話應(yīng)用,對(duì)先前對(duì)話進(jìn)行摘要或過濾Tactic: For dialogue applications that require very long conversations, summarize or filter previous dialogue
Since GPTs have a fixed context length, dialogue between a user and an assistant in which the entire conversation is included in the context window cannot continue indefinitely.
There are various workarounds to this problem, one of which is to summarize previous turns in the conversation. Once the size of the input reaches a predetermined threshold length, this could trigger a query that summarizes part of the conversation and the summary of the prior conversation could be included as part of the system message. Alternatively, prior conversation could be summarized asynchronously in the background throughout the entire conversation.
由于 GPT 模型有固定的上下文長度,用戶和助手之間的對(duì)話如果整個(gè)對(duì)話都包含在上下文窗口中,就無法無限進(jìn)行下去。
有多種解決這個(gè)問題的方法,其中之一是對(duì)先前的對(duì)話進(jìn)行摘要。一旦輸入的大小達(dá)到預(yù)定的閾值長度,就可以觸發(fā)一個(gè)查詢來摘要對(duì)話的一部分,并將先前對(duì)話的摘要作為系統(tǒng)消息的一部分包含進(jìn)來?;蛘?#xff0c;可以在整個(gè)對(duì)話過程中異步地對(duì)先前的對(duì)話進(jìn)行摘要。
An alternative solution is to dynamically select previous parts of the conversation that are most relevant to the current query. See the tactic "Use embeddings-based search to implement efficient knowledge retrieval".
另一種解決方案是動(dòng)態(tài)選擇與當(dāng)前查詢最相關(guān)的先前對(duì)話部分。參見戰(zhàn)術(shù)"使用基于嵌入的搜索來實(shí)現(xiàn)高效的知識(shí)檢索"。
3.3、戰(zhàn)術(shù):將長文檔逐部分進(jìn)行摘要,并遞歸構(gòu)建完整摘要Tactic: Summarize long documents piecewise and construct a full summary recursively
Since GPTs have a fixed context length, they cannot be used to summarize a text longer than the context length minus the length of the generated summary in a single query.
由于 GPT 模型有固定的上下文長度,無法在單個(gè)查詢中用于摘要長度超過上下文長度減去生成摘要長度的文本。
To summarize a very long document such as a book we can use a sequence of queries to summarize each section of the document. Section summaries can be concatenated and summarized producing summaries of summaries. This process can proceed recursively until an entire document is summarized. If it’s necessary to use information about earlier sections in order to make sense of later sections, then a further trick that can be useful is to include a running summary of the text that precedes any given point in the book while summarizing content at that point. The effectiveness of this procedure for summarizing books has been studied in previous research by OpenAI using variants of GPT-3.
要摘要非常長的文檔(如書籍),可以使用一系列查詢來摘要文檔的每個(gè)部分。將部分摘要連接并進(jìn)行摘要,從而產(chǎn)生摘要的摘要。此過程可以遞歸進(jìn)行,直到對(duì)整個(gè)文檔進(jìn)行摘要。如果需要使用早期部分的信息來理解后續(xù)部分,那么在摘要后續(xù)內(nèi)容時(shí),包含書中給定點(diǎn)之前文本的運(yùn)行摘要可以是一個(gè)有用的技巧。OpenAI 以前使用 GPT-3 的變體進(jìn)行研究,研究了使用此方法摘要書籍的有效性。
4、戰(zhàn)略:給予 GPT 模型"思考"的時(shí)間Strategy: Give GPTs time to "think"
4.1、戰(zhàn)術(shù):指示模型在得出結(jié)論之前從基本原理開始推理Tactic: Instruct the model to work out its own solution before rushing to a conclusion
Sometimes we get better results when we explicitly instruct the model to reason from first principles before coming to a conclusion. Suppose for example we want a model to evaluate a student’s solution to a math problem. The most obvious way to approach this is to simply ask the model if the student's solution is correct or not.
有時(shí),我們明確指示模型在得出結(jié)論之前從基本原理開始推理,會(huì)獲得更好的結(jié)果。例如,假設(shè)我們希望模型評(píng)估學(xué)生對(duì)數(shù)學(xué)問題的解答。最明顯的方法是簡單地問模型學(xué)生的解答是否正確。
SYSTEM
Determine if the student's solution is correct or not.
USER
Problem Statement: I'm building a solar power installation and I need help working out the financials.
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost me a flat $100k per year, and an additional $10 / square foot
What is the total cost for the first year of operations as a function of the number of square feet.
Student's Solution: Let x be the size of the installation in square feet.
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
ASSISTANT
The student's solution is correct.
系統(tǒng)
判斷學(xué)生的解答是否正確。
用戶
問題描述:我正在建設(shè)一個(gè)太陽能發(fā)電裝置,需要幫助計(jì)算財(cái)務(wù)情況。
土地每平方英尺 100 美元
我可以以每平方英尺 250 美元的價(jià)格購買太陽能電池板
我談判了一份維護(hù)合同,每年固定費(fèi)用為 10 萬美元,再加上每平方英尺 10 美元
第一年的運(yùn)營總成本是作為平方英尺數(shù)的函數(shù)是多少?
學(xué)生的解答:設(shè) x 為裝置的面積(平方英尺)。
土地成本:100x
太陽能電池板成本:250x
維護(hù)成本:100,000 + 100x
總成本:100x + 250x + 100,000 + 100x = 450x + 100,000
助手
學(xué)生的解答是正確的。
But the student's solution is actually not correct! We can get the model to successfully notice this by prompting the model to generate its own solution first.
但實(shí)際上,學(xué)生的解答是不正確的!我們可以讓模型通過提示模型先生成自己的解答來成功注意到這一點(diǎn)。
SYSTEM
First work out your own solution to the problem. Then compare your solution to the student's solution and evaluate if the student's solution is correct or not. Don't decide if the student's solution is correct until you have done the problem yourself.
USER
Problem Statement: I'm building a solar power installation and I need help working out the financials.
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost me a flat $100k per year, and an additional $10 / square foot
What is the total cost for the first year of operations as a function of the number of square feet.
Student's Solution: Let x be the size of the installation in square feet.
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
ASSISTANT
Let x be the size of the installation in square feet.
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 10x
Total cost: 100x + 250x + 100,000 + 10x =360x + 100,000
The student's solution is incorrect. They made an error in the maintenance cost calculation, using 100x instead of 10x. The correct total cost for the first year of operations is360x + 100,000.
系統(tǒng)
首先自己解答這個(gè)問題。然后將你的解答與學(xué)生的解答進(jìn)行比較,并評(píng)估學(xué)生的解答是否正確。在自己完成問題之前,請(qǐng)不要確定學(xué)生的解答是否正確。
用戶
問題描述:我正在建設(shè)一個(gè)太陽能發(fā)電裝置,需要幫助計(jì)算財(cái)務(wù)情況。
土地每平方英尺 100 美元
我可以以每平方英尺 250 美元的價(jià)格購買太陽能電池板
我談判了一份維護(hù)合同,每年固定費(fèi)用為 10 萬美元,再加上每平方英尺 10 美元
第一年的運(yùn)營總成本是作為平方英尺數(shù)的函數(shù)是多少?
學(xué)生的解答:設(shè) x 為裝置的面積(平方英尺)。
土地成本:100x
太陽能電池板成本:250x
維護(hù)成本:100,000 + 100x
總成本:100x + 250x + 100,000 + 100x = 450x + 100,000
助手
設(shè) x 為裝置的面積(平方英尺)。
土地成本:100x
太陽能電池板成本:250x
維護(hù)成本:100,000 + 10x
總成本:100x + 250x + 100,000 + 10x =360x + 100,000
學(xué)生的解答是不正確的。他們?cè)谟?jì)算維護(hù)成本時(shí)犯了錯(cuò)誤,使用了 100x 而不是 10x。第一年的運(yùn)營總成本的正確公式是360x + 100,000。
4.2、戰(zhàn)術(shù):使用內(nèi)心獨(dú)白或一系列查詢來隱藏模型的推理過程Tactic: Use inner monologue or a sequence of queries to hide the model's reasoning process
The previous tactic demonstrates that it is sometimes important for the model to reason in detail about a problem before answering a specific question. For some applications, the reasoning process that a model uses to arrive at a final answer would be inappropriate to share with the user. For example, in tutoring applications we may want to encourage students to work out their own answers, but a model’s reasoning process about the student’s solution could reveal the answer to the student.
前面的戰(zhàn)術(shù)表明,對(duì)于模型在回答特定問題之前對(duì)問題進(jìn)行詳細(xì)推理有時(shí)很重要。對(duì)于某些應(yīng)用程序來說,模型用于得出最終答案的推理過程可能不適合與用戶分享。例如,在輔導(dǎo)應(yīng)用程序中,我們可能希望鼓勵(lì)學(xué)生自己解決問題,但模型關(guān)于學(xué)生解決方案的推理過程可能會(huì)向?qū)W生透露答案。
Inner monologue is a tactic that can be used to mitigate this. The idea of inner monologue is to instruct the model to put parts of the output that are meant to be hidden from the user into a structured format that makes parsing them easy. Then before presenting the output to the user, the output is parsed and only part of the output is made visible.
內(nèi)心獨(dú)白是一種可以用來減輕這種情況的戰(zhàn)術(shù)。內(nèi)心獨(dú)白的思路是指示模型將要向用戶隱藏的部分輸出放入一個(gè)結(jié)構(gòu)化格式中,以便于解析。然后,在將輸出呈現(xiàn)給用戶之前,對(duì)輸出進(jìn)行解析,僅將部分輸出顯示給用戶。
SYSTEM
Follow these steps to answer the user queries.
Step 1 - First work out your own solution to the problem. Don't rely on the student's solution since it may be incorrect. Enclose all your work for this step within triple quotes (""").
Step 2 - Compare your solution to the student's solution and evaluate if the student's solution is correct or not. Enclose all your work for this step within triple quotes (""").
Step 3 - If the student made a mistake, determine what hint you could give the student without giving away the answer. Enclose all your work for this step within triple quotes (""").
Step 4 - If the student made a mistake, provide the hint from the previous step to the student (outside of triple quotes). Instead of writing "Step 4 - ..." write "Hint:".
USER
Problem Statement: <insert problem statement>
Student Solution: <insert student solution>
系統(tǒng)
按照以下步驟回答用戶的查詢。
步驟 1 - 首先獨(dú)立解決問題。不要依賴學(xué)生的解答,因?yàn)樗赡苁遣徽_的。在這一步中,使用三引號(hào)(""")將所有工作括起來。
步驟 2 - 將你的解答與學(xué)生的解答進(jìn)行比較,并評(píng)估學(xué)生的解答是否正確。在這一步中,使用三引號(hào)(""")將所有工作括起來。
步驟 3 - 如果學(xué)生犯了錯(cuò)誤,確定你可以給予學(xué)生什么提示,而不透露答案。在這一步中,使用三引號(hào)(""")將所有工作括起來。
步驟 4 - 如果學(xué)生犯了錯(cuò)誤,向?qū)W生提供上一步的提示(在三引號(hào)之外)。而不是寫"步驟 4 - ...",寫"提示:"。
用戶
問題描述: <插入問題描述>
學(xué)生的解答: <插入學(xué)生的解答>
Alternatively, this can be achieved with a sequence of queries in which all except the last have their output hidden from the end user.
First, we can ask the model to solve the problem on its own. Since this initial query doesn't require the student’s solution, it can be omitted. This provides the additional advantage that there is no chance that the model’s solution will be biased by the student’s attempted solution.
另外,可以通過一系列查詢來實(shí)現(xiàn),其中除最后一個(gè)之外的所有查詢的輸出對(duì)最終用戶是隱藏的。
首先,我們可以要求模型自行解決問題。由于此初始查詢不需要學(xué)生的解答,可以省略該部分。這還具有額外的優(yōu)勢(shì),即模型的解決方案不會(huì)受到學(xué)生嘗試解答的影響。
USER
<insert problem statement>
用戶
<插入問題描述>
Next, we can have the model use all available information to assess the correctness of the student’s solution.
SYSTEM
Compare your solution to the student's solution and evaluate if the student's solution is correct or not.
USER
Problem statement: """<insert problem statement>"""
Your solution: """<insert model generated solution>"""
Student’s solution: """<insert student's solution>"""
接下來,我們可以讓模型利用所有可用信息來評(píng)估學(xué)生的解答的正確性。
系統(tǒng)
將你的解答與學(xué)生的解答進(jìn)行比較,并評(píng)估學(xué)生的解答是否正確。
用戶
問題描述:"""<插入問題描述>"""
你的解答:"""<插入模型生成的解答>"""
學(xué)生的解答:"""<插入學(xué)生的解答>"""
Finally, we can let the model use its own analysis to construct a reply in the persona of a helpful tutor.
SYSTEM
You are a math tutor. If the student made an error, offer a hint to the student in a way that does not reveal the answer. If the student did not make an error, simply offer them an encouraging comment.
USER
Problem statement: """<insert problem statement>"""
Your solution: """<insert model generated solution>"""
Student’s solution: """<insert student's solution>"""
Analysis: """<insert model generated analysis from previous step>"""
最后,我們可以讓模型使用自己的分析,以友好輔導(dǎo)員的角色構(gòu)建回復(fù)。
系統(tǒng)
你是一位數(shù)學(xué)輔導(dǎo)員。如果學(xué)生犯了錯(cuò)誤,以不透露答案的方式向?qū)W生提供提示。如果學(xué)生沒有犯錯(cuò)誤,只需給他們一個(gè)鼓勵(lì)的評(píng)論。
用戶
問題描述:"""<插入問題描述>"""
你的解答:"""<插入模型生成的解答>"""
學(xué)生的解答:"""<插入學(xué)生的解答>"""
分析:"""<插入前一步模型生成的分析>"""
4.3、戰(zhàn)術(shù):詢問模型在之前的步驟中是否遺漏了任何內(nèi)容Tactic: Ask the model if it missed anything on previous passes
Suppose that we are using a model to list excerpts from a source which are relevant to a particular question. After listing each excerpt the model needs to determine if it should start writing another or if it should stop. If the source document is large, it is common for a model to stop too early and fail to list all relevant excerpts. In that case, better performance can often be obtained by prompting the model with followup queries to find any excerpts it missed on previous passes.
假設(shè)我們正在使用一個(gè)模型列出與特定問題相關(guān)的源文件摘錄。在列出每個(gè)摘錄后,模型需要確定是繼續(xù)寫另一個(gè)摘錄還是停止。如果源文件很大,模型往往會(huì)過早停止,沒有列出所有相關(guān)的摘錄。在這種情況下,通過提示模型使用后續(xù)查詢來查找之前遺漏的任何摘錄,可以獲得更好的性能。
SYSTEM
You will be provided with a document delimited by triple quotes. Your task is to select excerpts which pertain to the following question: "What significant paradigm shifts have occurred in the history of artificial intelligence."
Ensure that excerpts contain all relevant context needed to interpret them - in other words don't extract small snippets that are missing important context. Provide output in JSON format as follows:
[{"excerpt": "..."},
...
{"excerpt": "..."}]
USER
"""<insert document here>"""
ASSISTANT
[{"excerpt": "the model writes an excerpt here"},
...
{"excerpt": "the model writes another excerpt here"}]
USER
Are there more relevant excerpts? Take care not to repeat excerpts. Also ensure that excerpts contain all relevant context needed to interpret them - in other words don't extract small snippets that are missing important context.
系統(tǒng)
您將獲得一個(gè)由三引號(hào)分隔的文檔。您的任務(wù)是選擇與以下問題相關(guān)的摘錄:"人工智能的歷史中發(fā)生了哪些重大范式轉(zhuǎn)變。"
確保摘錄包含解釋它們所需的所有相關(guān)上下文 - 換句話說,不要提取缺少重要上下文的小片段。以以下 JSON 格式提供輸出:
[{"excerpt": "..."},
...
{"excerpt": "..."}]
用戶
"""<插入文檔>"""
助手
[{"excerpt": "模型在此處寫摘錄"},
...
{"excerpt": "模型在此處再寫一個(gè)摘錄"}]
用戶
是否還有更多相關(guān)的摘錄?請(qǐng)注意不要重復(fù)摘錄。還要確保摘錄包含解釋它們所需的所有相關(guān)上下文 - 換句話說,不要提取缺少重要上下文的小片段。
5、策略:使用外部工具Strategy: Use external tools
5.1、戰(zhàn)術(shù):使用基于嵌入的搜索來實(shí)現(xiàn)高效的知識(shí)檢索Tactic: Use embeddings-based search to implement efficient knowledge retrieval
A model can leverage external sources of information if provided as part of its input. This can help the model to generate more informed and up-to-date responses. For example, if a user asks a question about a specific movie, it may be useful to add high quality information about the movie (e.g. actors, director, etc…) to the model’s input. Embeddings can be used to implement efficient knowledge retrieval, so that relevant information can be added to the model input dynamically at run-time.
如果模型可以獲得外部信息作為其輸入的一部分,那么它可以利用外部信息來生成更具信息和更新的回答。例如,如果用戶詢問有關(guān)特定電影的問題,將有關(guān)該電影的高質(zhì)量信息(如演員、導(dǎo)演等)添加到模型的輸入中可能很有用。嵌入可以用于實(shí)現(xiàn)高效的知識(shí)檢索,以便在運(yùn)行時(shí)動(dòng)態(tài)地將相關(guān)信息添加到模型輸入中。
A text embedding is a vector that can measure the relatedness between text strings. Similar or relevant strings will be closer together than unrelated strings. This fact, along with the existence of fast vector search algorithms means that embeddings can be used to implement efficient knowledge retrieval. In particular, a text corpus can be split up into chunks, and each chunk can be embedded and stored. Then a given query can be embedded and vector search can be performed to find the embedded chunks of text from the corpus that are most related to the query (i.e. closest together in the embedding space).
Example implementations can be found in the OpenAI Cookbook. See the tactic “Instruct the model to use retrieved knowledge to answer queries” for an example of how to use knowledge retrieval to minimize the likelihood that a model will make up incorrect facts.
文本嵌入是一種可以衡量文本字符串之間相關(guān)性的向量。相似或相關(guān)的字符串將比不相關(guān)的字符串更接近。由此可知,結(jié)合快速向量搜索算法的存在,可以使用嵌入來實(shí)現(xiàn)高效的知識(shí)檢索。特別是,文本語料庫可以被分成多個(gè)塊,每個(gè)塊可以被嵌入和存儲(chǔ)。然后,可以將給定的查詢進(jìn)行嵌入,執(zhí)行向量搜索,以找到與查詢最相關(guān)的語料庫中嵌入的文本塊(即在嵌入空間中最接近的文本塊)。
在 OpenAI Cookbook 中可以找到示例實(shí)現(xiàn)。參見戰(zhàn)術(shù)“指示模型使用檢索到的知識(shí)來回答查詢”的示例,了解如何使用知識(shí)檢索來減少模型捏造錯(cuò)誤事實(shí)的可能性。
5.2、戰(zhàn)術(shù):使用代碼執(zhí)行進(jìn)行更準(zhǔn)確的計(jì)算或調(diào)用外部 APITactic: Use code execution to perform more accurate calculations or call external APIs
GPTs cannot be relied upon to perform arithmetic or long calculations accurately on their own. In cases where this is needed, a model can be instructed to write and run code instead of making its own calculations. In particular, a model can be instructed to put code that is meant to be run into a designated format such as triple backtics. After an output is produced, the code can be extracted and run. Finally, if necessary, the output from the code execution engine (i.e. Python interpreter) can be provided as an input to the model for the next query.
GPT 模型不能單獨(dú)依賴于進(jìn)行精確的算術(shù)運(yùn)算或長時(shí)間的計(jì)算。在需要這樣做的情況下,可以指示模型編寫和運(yùn)行代碼,而不是進(jìn)行自己的計(jì)算。特別是,可以指示模型將要運(yùn)行的代碼放入指定的格式(如三個(gè)反引號(hào))。在產(chǎn)生輸出之后,可以提取并運(yùn)行代碼。最后,如果需要,可以將代碼執(zhí)行引擎(如 Python 解釋器)的輸出作為下一個(gè)查詢的模型輸入。
SYSTEM
You can write and execute Python code by enclosing it in triple backticks, e.g. ```code goes here```. Use this to perform calculations.
USER
Find all real-valued roots of the following polynomial: 3*x**5 - 5*x**4 - 3*x**3 - 7*x - 10.
系統(tǒng)
您可以使用三個(gè)反引號(hào)括起來的方式編寫和執(zhí)行 Python 代碼,例如code goes here。使用它來進(jìn)行計(jì)算。
用戶
找到以下多項(xiàng)式的所有實(shí)根:3x**5 - 5x4 - 3*x3 - 7*x - 10。
Another good use case for code execution is calling external APIs. If a model is instructed in the proper use of an API, it can write code that makes use of it. A model can be instructed in how to use an API by providing it with documentation and/or code samples showing how to use the API.
使用代碼執(zhí)行的另一個(gè)很好的用例是調(diào)用外部 API。如果正確使用 API 的方式對(duì)模型進(jìn)行了指示,它可以編寫使用該 API 的代碼??梢酝ㄟ^提供文檔和/或展示如何使用 API 的代碼示例來指示模型如何使用 API。
SYSTEM
You can write and execute Python code by enclosing it in triple backticks. Also note that you have access to the following module to help users send messages to their friends:
```python
import message
message.write(to="John", message="Hey, want to meetup after work?")```
系統(tǒng)
您可以使用三個(gè)反引號(hào)括起來的方式編寫和執(zhí)行 Python 代碼。此外,請(qǐng)注意,您可以訪問以下模塊,以幫助用戶向朋友發(fā)送消息:
Python
import message
message.write(to="John", message="Hey, want to meetup after work?")```
WARNING: Executing code produced by a model is not inherently safe and precautions should be taken in any application that seeks to do this. In particular, a sandboxed code execution environment is needed to limit the harm that untrusted code could cause.
警告:執(zhí)行模型生成的代碼在本質(zhì)上并不安全,因此在任何試圖這樣做的應(yīng)用程序中應(yīng)采取預(yù)防措施。特別是,需要使用沙箱式代碼執(zhí)行環(huán)境來限制不受信任的代碼可能造成的損害。
6、策略:系統(tǒng)化地測(cè)試更改Strategy: Test changes systematically
Sometimes it can be hard to tell whether a change — e.g., a new instruction or a new design — makes your system better or worse. Looking at a few examples may hint at which is better, but with small sample sizes it can be hard to distinguish between a true improvement or random luck. Maybe the change helps performance on some inputs, but hurts performance on others.
有時(shí)候很難判斷一項(xiàng)改變(如新指令或新設(shè)計(jì))是否會(huì)使系統(tǒng)變得更好或更差。查看幾個(gè)示例可能會(huì)暗示哪個(gè)更好,但是對(duì)于樣本數(shù)量較小的情況下,很難區(qū)分是真正的改進(jìn)還是偶然的運(yùn)氣。也許該改變?cè)谀承┹斎肷嫌兄谛阅?#xff0c;但對(duì)其他輸入有害。
Evaluation procedures (or "evals") are useful for optimizing system designs. Good evals are:
Representative of real-world usage (or at least diverse)
Contain many test cases for greater statistical power (see table below for guidelines)
Easy to automate or repeat
評(píng)估過程(或“評(píng)估”)對(duì)于優(yōu)化系統(tǒng)設(shè)計(jì)是有用的。良好的評(píng)估應(yīng)具備以下特點(diǎn):
- 代表實(shí)際使用情況(或至少具備多樣性);
- 包含許多測(cè)試案例以提高統(tǒng)計(jì)功效(有關(guān)準(zhǔn)則的指導(dǎo),請(qǐng)參見下表);
- 易于自動(dòng)化或重復(fù)進(jìn)行。
Evaluation of outputs can be done by computers, humans, or a mix. Computers can automate evals with objective criteria (e.g., questions with single correct answers) as well as some subjective or fuzzy criteria, in which model outputs are evaluated by other model queries. OpenAI Evals is an open-source software framework that provides tools for creating automated evals.
對(duì)輸出進(jìn)行評(píng)估可以由計(jì)算機(jī)、人類或兩者混合進(jìn)行。計(jì)算機(jī)可以通過客觀標(biāo)準(zhǔn)自動(dòng)化評(píng)估,例如具有單一正確答案的問題,以及一些主觀或模糊的標(biāo)準(zhǔn),在這些標(biāo)準(zhǔn)中,模型的輸出由其他模型查詢進(jìn)行評(píng)估。OpenAI Evals 是一個(gè)開源軟件框架,提供了創(chuàng)建自動(dòng)化評(píng)估工具的工具。
Model-based evals can be useful when there exists a range of possible outputs that would be considered equally high in quality (e.g. for questions with long answers). The boundary between what can be realistically evaluated with a model-based eval and what requires a human to evaluate is fuzzy and is constantly shifting as models become more capable. We encourage experimentation to figure out how well model-based evals can work for your use case.
當(dāng)存在一系列可能被視為質(zhì)量相等的輸出時(shí)(例如對(duì)于具有長答案的問題),基于模型的評(píng)估可以非常有用。在基于模型的評(píng)估和需要人類評(píng)估之間,可以進(jìn)行模糊的界限,隨著模型的能力增強(qiáng),這個(gè)界限也在不斷變化。我們鼓勵(lì)進(jìn)行實(shí)驗(yàn),以了解基于模型的評(píng)估在您的使用情況下的可行性如何。
6.1、戰(zhàn)術(shù):根據(jù)黃金標(biāo)準(zhǔn)答案評(píng)估模型輸出Tactic: Evaluate model outputs with reference to gold-standard answers
Suppose it is known that the correct answer to a question should make reference to a specific set of known facts. Then we can use a model query to count how many of the required facts are included in the answer.
For example, using the following system message:
假設(shè)已知問題的正確答案應(yīng)涉及特定的已知事實(shí)集。然后,我們可以使用模型查詢來計(jì)算答案中包含了多少必需的事實(shí)。
例如,使用以下系統(tǒng)消息:
SYSTEM
You will be provided with text delimited by triple quotes that is supposed to be the answer to a question. Check if the following pieces of information are directly contained in the answer:
- Neil Armstrong was the first person to walk on the moon.
- The date Neil Armstrong first walked on the moon was July 21, 1969.
For each of these points perform the following steps:
1 - Restate the point.
2 - Provide a citation from the answer which is closest to this point.
3 - Consider if someone reading the citation who doesn't know the topic could directly infer the point. Explain why or why not before making up your mind.
4 - Write "yes" if the answer to 3 was yes, otherwise write "no".
Finally, provide a count of how many "yes" answers there are. Provide this count as {"count": <insert count here>}.
系統(tǒng)
您將獲得由三個(gè)引號(hào)分隔的文本,這應(yīng)該是一個(gè)問題的答案。檢查以下信息是否直接包含在答案中:
尼爾·阿姆斯特朗是第一個(gè)登上月球的人。
尼爾·阿姆斯特朗首次登上月球的日期是1969年7月21日。
針對(duì)每個(gè)觀點(diǎn)執(zhí)行以下步驟:
1 - 重述觀點(diǎn)。
2 - 提供與此觀點(diǎn)最接近的答案中的引文。
3 - 考慮如果一個(gè)不了解該主題的人閱讀引文是否能夠直接推斷出觀點(diǎn)。在下定決心之前解釋為什么或?yàn)槭裁床弧?div style="height:15px;">
"""Neil Armstrong is famous for being the first human to set foot on the Moon. This historic event took place on July 21, 1969, during the Apollo 11 mission."""
"""Neil Armstrong made history when he stepped off the lunar module, becoming the first person to walk on the moon."""
Apollo 11, bold as legend's hand.
There are many possible variants on this type of model-based eval. Consider the following variation which tracks the kind of overlap between the candidate answer and the gold-standard answer, and also tracks whether the candidate answer contradicts any part of the gold-standard answer.
這種基于模型的評(píng)估類型有很多可能的變體。考慮以下變體,它跟蹤候選答案與黃金標(biāo)準(zhǔn)答案之間的重疊類型,并跟蹤候選答案是否與黃金標(biāo)準(zhǔn)答案的任何部分相矛盾。
Use the following steps to respond to user inputs. Fully restate each step before proceeding. i.e. "Step 1: Reason...".
Step 1: Reason step-by-step about whether the information in the submitted answer compared to the expert answer is either: disjoint, equal, a subset, a superset, or overlapping (i.e. some intersection but not subset/superset).
Step 2: Reason step-by-step about whether the submitted answer contradicts any aspect of the expert answer.
Step 3: Output a JSON object structured like: {"type_of_overlap": "disjoint" or "equal" or "subset" or "superset" or "overlapping", "contradiction": true or false}
第3步:輸出一個(gè)結(jié)構(gòu)化的JSON對(duì)象,如:{"type_of_overlap": "不交集"或"相等"或"子集"或"超集"或"重疊","contradiction": true或false}
Here's an example input with a substandard answer which nonetheless does not contradict the expert answer:
Question: """What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time."""
Expert Answer: """Neil Armstrong is most famous for being the first person to walk on the moon. This historic event occurred on July 21, 1969."""
Question: """What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time."""
Submitted Answer: """On the 21st of July 1969, Neil Armstrong became the second person to walk on the moon, following after Buzz Aldrin."""
Expert Answer: """Neil Armstrong is most famous for being the first person to walk on the moon. This historic event occurred on July 21, 1969."""
Here's an example input with a correct answer that also provides a bit more detail than is necessary:
Question: """What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time."""
Submitted Answer: """At approximately 02:56 UTC on July 21st 1969, Neil Armstrong became the first human to set foot on the lunar surface, marking a monumental achievement in human history."""
Expert Answer: """Neil Armstrong is most famous for being the first person to walk on the moon. This historic event occurred on July 21, 1969."""
提交的答案:"""大約在UTC時(shí)間02:56于1969年7月21日,尼爾·阿姆斯特朗成為第一個(gè)踏上月球表面的人,標(biāo)志著人類歷史上的重大成就。"""
For more inspiration, visit the OpenAI Cookbook, which contains example code and also links to third-party resources such as: