LLM prompt heuristics that definitely maybe work
Effective prompt writing for large language models continues to be a dark art. Having read the prompt engineering blogs from Meta, Anthropic, and OpenAI and watched some of Anthropic's prompt discussions online, it does feel more like design than engineering. Even employees from the same lab don't agree on which tricks actually work. If you have time, watch this video of one of Anthropic's prompt engineers running a prompting masterclass. While giving good heuristics to the audience, he is careful not to make any definite statements.
I don’t believe it is a good investment to try to become a world-class prompter. All heuristics are highly dependent on the model architecture, training data, and training procedure. This means that with every iteration of the models, heuristics could become obsolete or harmful to performance.
However, the current attention-based architecture doesn't seem to be going anywhere soon. Therefore, it is reasonable to expect that we can continue to use the prompt context to help the model move the embeddings of the user's task in directions that carry a lot of nuance and information about the domain of the task.
For that reason, I am collecting the advice into a few guidelines that I can use with the current models. I will caveat this by saying that in 90% of my use cases the model response is parsed by a human: me. Therefore, I am not as worried about hallucinations as someone who puts model outputs in front of their customers or inside data-parsing pipelines.
Here is what seems to work in October 2024 with Claude 3.5 Sonnet, GPT-4o, and Llama 3.2.
Prompt specificity
- Make the prompt as specific to the task as you can. This is probably the biggest return on your time. Good advice given in this discussion is to imagine printing out the prompt and handing it to a new hire at your company, then seeing whether they could solve the task. This forces you to give all the necessary context and constraints of the task.
- Components of a role prompt: the role, the organisation it is part of, its perspective, and the perspective of the person or organisation being addressed.
- No lazy role prompts. Make the role and the context clear.
Bad: You are a cab driver. Good: You are a cab driver. You have been driving people around London as your full-time job for twenty years and are knowledgeable about the city, its roads, and its sights.
- If the model is used inside a product, tell it about it:
Bad: You are an assistant writing document summaries. Good: You are an assistant used in a product for law firms that summarizes legal documents.
Order of prompt components
Arrange the components of your prompt in the following order:
- Specific role and context description (put into the system prompt if the model is used interactively)
- Input data, e.g. documents, code snippets, CSV files
- Task description
- (Optional:) Good examples and bad examples
- Task constraints
Use XML tags for prompt components
Use XML tags to separate different components of a prompt.
You are an experienced post-doc at a reputable research institute in the US. You are an expert in the research field of the following paper:
<Research_paper>
{{research_paper}}
</Research_paper>
Your task is to summarize the findings of the research paper given to you in <Research_paper> tags.
<Instructions>
- List the 1-5 most important findings of the paper. Don't list more than 5 findings.
- more instructions...
</Instructions>
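To make the component order from the previous section concrete, here is a minimal sketch in Python that assembles this prompt; the function and variable names are my own, not from any SDK:

# Assemble the prompt components in the recommended order:
# role/context, input data, task description, constraints.
def build_summary_prompt(research_paper: str) -> str:
    role = (
        "You are an experienced post-doc at a reputable research institute in the US. "
        "You are an expert in the research field of the following paper:"
    )
    # Input data wrapped in XML tags so the model can reference it by name.
    data = f"<Research_paper>\n{research_paper}\n</Research_paper>"
    task = (
        "Your task is to summarize the findings of the research paper "
        "given to you in <Research_paper> tags."
    )
    instructions = (
        "<Instructions>\n"
        "- List the 1-5 most important findings of the paper. Don't list more than 5 findings.\n"
        "</Instructions>"
    )
    return "\n\n".join([role, data, task, instructions])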
Using examples (few shot prompting)
It can help to provide both good and bad examples for a specific task, with an explanation of why they are good or bad. In the above example, we could provide an example research paper with a summary of findings we wrote ourselves as a good example.
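For instance, an examples block could be inserted between the input data and the task constraints; the tag names here are my own choice:

<Examples>
<Good_example>
{{example_paper}}
Findings: {{our_handwritten_summary}}
This is a good example because each finding is stated in a single, precise sentence.
</Good_example>
</Examples>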
Instructions
- Response length limit. Instead of writing “Be concise”, tell the model what that means for your context, e.g. “Answer in no more than 4 sentences.”
- Avoid open-ended instructions.
- Instruct for style, formatting, and restrictions.
- Ask the model to cite its sources of evidence to reduce hallucinations.
- Ask it to respond in a chain-of-thought style to increase performance, for example:
You are a logician and love to solve logic puzzles. Carefully read the following puzzle. <PUZZLE> Simon is looking at Charlie. Charlie is looking at Sarah. You know that Simon is married and Sarah is not married. Is a married person looking at an unmarried person? </PUZZLE> Let's think step by step before giving an answer.
Grammar and style
Avoid typos and incorrect punctuation, as they can degrade the quality of the response.
Parse-able output
Often LLMs add a preamble at the beginning or an epilogue at the end of a response. If you want to force the model to respond only with valid JSON, you can:
- Use the “Prefill Claude’s response” feature
- Ask the model to put the JSON into <json> tags and then extract that block from the response
- Prefill the response yourself by adding at the end of your prompt:
Here is the JSON: {
The open bracket conditions the model to start the answer with the first JSON key. You then need to prepend the “{” to the response to make it valid.
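With Anthropic's Messages API, the prefill trick amounts to appending a partial assistant turn. A minimal sketch; the model string and prompt are placeholders:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Summarize the document as JSON with keys 'title' and 'findings'."},
        # The trailing assistant turn forces the model to continue from "{".
        {"role": "assistant", "content": "{"},
    ],
)

json_text = "{" + response.content[0].text  # prepend the bracket to make it valid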
Other tricks
- If your prompt includes logic that could be handled in code, handle it in code (see the sketch after this list).
Bad: You are part of a role-playing game that is used for training customer support agents in a Fortune 500 company. You can assume any of the following roles based on user input. If the user asks for Role1, assume the role of a customer asking questions about the company's products. If the user asks for Role2, assume the role of a helpful customer support agent.
Good: Define two different prompts, one for each role, and use code to switch prompts based on the role the operator wants the model to assume.
- Give the model a way out if it doesn’t know the answer.
Good: {{prompt}} If something weird happens and you are unsure about what to do, simply print out "UNSURE".
- Use a temperature of 0 for fact-based, less creative tasks.
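Here is a minimal sketch of the prompt-switching trick from the first item above; the role names and prompt texts are made up for illustration:

# Select the prompt in code instead of asking the model to branch on user input.
ROLE_PROMPTS = {
    "customer": (
        "You are a customer of a Fortune 500 company asking questions "
        "about the company's products."
    ),
    "agent": "You are a helpful customer support agent at a Fortune 500 company.",
}

def system_prompt_for(role: str) -> str:
    # Fail loudly on unknown roles instead of letting the model improvise.
    if role not in ROLE_PROMPTS:
        raise ValueError(f"Unknown role: {role}")
    return ROLE_PROMPTS[role]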
Iterative prompt design
- You can use the LLM to generate examples for a task and select the good examples. Then use those examples in the prompt that is used “in production” (see the sketch after this list).
- If the model responds incorrectly, tell it about the mistake and ask it how you should modify the prompt.
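A rough sketch of that example-generation workflow; generate() is a hypothetical stand-in for whatever model call you use:

# `generate` is a placeholder for your model call (e.g. the Messages API above).
def generate(prompt: str) -> str:
    ...

# 1. Let the model draft candidate few-shot examples.
candidates = [generate("Write an example summary of findings for a paper about ...")
              for _ in range(5)]

# 2. Review by hand and keep the good ones.
for i, candidate in enumerate(candidates):
    print(f"--- candidate {i} ---\n{candidate}\n")

# 3. Paste the selected examples into the production prompt's examples block.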
If you have any thoughts, questions, or feedback about this post, I would love to hear it. Please reach out to me via email.