Anthropic's Prompt Engineering Interactive Tutorial

someoneontenet
19 replies
1d19h

I talk to AI like a caveman, mostly. Instead of over-optimizing my prompt, I just try to find the minimal representation needed to get the LLM to understand my problem and solve it for me, and I have been very productive with this strategy. What would someone like me get out of prompt engineering? Or is it more for things like agents, bots, and assistants?

magicalhippo
9 replies
1d18h

I'd guess the latter.

My SO wanted Google Assistant at home after getting used to it in our Android Automotive-based car, so I've been dabbling with local LLMs as a learning experience.

I've got one prompt which classifies the question, i.e. whether it's asking about the weather, a math question, a knowledge question, etc. There I ask it to only output the category, so I can easily do different things based on it.

For knowledge-based stuff I include our town and country, tell it to use metric units, and tell it to be brief. I also tell it to ask clarifying questions if needed. If I don't, it'll use miles, or both miles and km, be too long-winded, and assume too much.

For calculations, I've been asking it to output Octave code that computes the answer, giving the result in a specific variable name, and without explanation; if it can't, it should output a special sequence. Without that, it'll include explanations of what the code does and won't be consistent with variable naming.
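The glue ends up being quite simple. Here's a minimal sketch of the classify-then-dispatch flow, assuming an Ollama-style local HTTP endpoint (the URL, model tag, and fallback prompts are illustrative, not my exact setup):

    import requests

    URL = "http://localhost:11434/api/generate"  # assumed Ollama-style endpoint
    MODEL = "gemma2:9b"                           # illustrative model tag

    def ask(prompt):
        # One-shot, non-streaming completion against the local model.
        r = requests.post(URL, json={"model": MODEL, "prompt": prompt, "stream": False})
        r.raise_for_status()
        return r.json()["response"].strip()

    CLASSIFY = ("Try to categorize the following question in the following "
                "categories: knowledge question, math calculation or equation, "
                "information about public transport, information about weather, "
                "unknown. Output only the category. The question is: ")

    def handle(question):
        category = ask(CLASSIFY + question)
        if "math" in category:
            # Task-specific prompt asking for Octave code (see my longer example below).
            return ask("Convert the following math question into octave/matlab "
                       "syntax which calculates the answer if possible. The output "
                       "variable should be x. Only output the code. The question is: "
                       + question)
        # Knowledge fallback: location, units and brevity instructions.
        # <town> and <country> are placeholders.
        return ask("We live in <town>, <country>. Use metric units and be brief. "
                   "Ask clarifying questions if needed. The question is: " + question)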

I've been using Gemma 9B so far, which performs well on my aging 2080 Ti, though I haven't actually put all the pieces together yet (my SO asked last weekend). But it seems very promising, and adding the extra instructions for each task radically changes the output and makes this approach viable.

Btw, I know there are probably tons of these assistants out there. I just enjoy figuring out how things work.

lynx23
2 replies
1d12h

What hardware do you use for audio I/O? And what do you use for activation? I feel the rest is pretty simple to cobble together... Should be a few days of work to surpass Siri...

magicalhippo
1 replies
1d10h

This is the tricky bit, yes. I bought some Pi HATs[1] earlier, and they seem to work fine when testing, but I haven't tried to use them properly yet. Alternatively, I have some condenser microphones and an audio card HAT for the Pi.

So I hope to be able to use that with a Pi Zero W. I've played around with using PulseAudio so I could add compression plugins and such, which seems to work.

I haven't actually attempted wake-word detection yet, but my first port of call there will be to try the ones Rhasspy uses[2].

Since I want to integrate with HA, I might end up expanding on Rhasspy, or I might just use parts of it, i.e. once I determine it's a HA command and not a question about dinosaurs, say, inject the cleaned transcription into the Rhasspy pipeline.

[1]: https://www.seeedstudio.com/ReSpeaker-2-Mics-Pi-HAT.html

[2]: https://rhasspy.readthedocs.io/en/latest/wake-word/
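The injection step I have in mind is roughly this (a sketch only; it assumes Rhasspy's HTTP API on its default port, which I haven't actually wired up yet):

    import requests

    def inject_into_rhasspy(cleaned_text):
        # Hand the cleaned transcription to Rhasspy for intent recognition
        # and handling (default Rhasspy web server port assumed).
        r = requests.post("http://localhost:12101/api/text-to-intent",
                          data=cleaned_text.encode("utf-8"))
        r.raise_for_status()
        return r.json()  # recognized intent as JSON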

pseudosavant
0 replies
1d1h

I was just looking at what it would take to have a local LLM assistant and ended up on ReSpeaker too.

I am just wrapping up another Pi Zero W audio recording project (https://github.com/pseudosavant/ps-audio-recorder). I used the first-gen Zero W since I had one already lying around. I'd strongly recommend using the newer Pi Zero 2 W though.

The Zero W CPU is just so slow. It can barely (e.g. 1.05x) do real-time MP3 encoding, and the single core makes it prone to being interrupted by background tasks. It couldn't even do SMB/Windows file sharing faster than 0.5MB/sec without disabling encryption in Samba. Encryption kills this CPU.

I have a couple of Zero 2s coming in the mail.

all2
2 replies
1d14h

What are you using for glue? Just python?

magicalhippo
1 replies
1d14h

Yeah, though for now it's just at the proof-of-concept stage.

Ey7NFZ3P0nzAe
0 replies
20h6m

I'm thinking you would like LangGraph.

Ey7NFZ3P0nzAe
1 replies
20h4m

Do you mind sharing your code at some point? Is it doing home automation? How are you asking it anything? Is it via your phone, or have you set up a speaker device, or what?

magicalhippo
0 replies
6h50m

If I manage to make something useful I'll certainly do that.

I'm planning on using small Pi Zero 2-based "boxes" with microphones; see my other reply. Though I've only done cursory tests so far.

magicalhippo
0 replies
1d14h

Just for illustration, here's a concrete example. As noted, I'm a total n00b and just playing around, but I think it's illustrative, including of the limitations of these LLMs.

First I used a Norwegian-tuned Whisper large model to convert some audio. The audio was intentionally unfocused, to simulate a real session. The model outputs English, so it does the translation directly as well, hence the somewhat weird sentences and the use of "cake shape" rather than "cake pan". The output from Whisper was as follows:

OK. Yes, think I have a cake shape. I have a recipe for a cake shape ... Wait a minute. So, I have a recipe meant for a cake shape of 30 cm. I will use a cake shape of 24 cm. How much less do I have to do the recipe to ... That it should fit.

For the following I use Gemma 9B as mentioned.

First I pass it through a cleanup step:

Input> The following statement is an audio transcript of a question or statement. It may contain errors, corrections and unrelated utterances. Rewrite it so that it is suitable for an email or text message. Remove superfluous or unrelated parts. Do not extrapolate beyond what is in the transcript. The transcript is: [Whisper output]

Output> I have a cake recipe for a 30cm cake shape. I want to use a 24cm cake shape. How much less do I need to make the recipe so it fits?

Next I pass it through a classification step:

Input> Try to categorize the following question in the following categories: knowledge question, math calculation or equation, information about public transport, information about weather, unknown. Output only the category. The question is: [cleaned output]

Output> math calculation or equation

For math questions I pass it to the following prompt:

Input> Convert the following math question into octave/matlab syntax which calculates the answer if possible. The output variable should be x. Only output the code. If the question is unclear or it's not possible to calculate, output three exclamation points followed by a brief explanation. If you need additional information, include that in the explanation. The question is: [cleaned output]

Output> !!! The question requires calculating the volume ratio between two cake shapes, which is not possible with only the provided dimensions.

So we hit a limitation. For people it's obvious that two cake pans would have similar heights, and thus the answer is calculable. However, we didn't tell it that it could make assumptions, so let's try that:

Input> Convert the following math question into octave/matlab syntax which calculates the answer if possible. The output variable should be x. Only output the code. You can make some conservative assumptions, but if the question is unclear or it's not possible to calculate, output three exclamation points followed by a brief explanation. If you need additional information, include that in the explanation. If you make some assumptions, include that as comments at the start of the code. The question is: [cleaned output]

Output>

    % Assuming the cake recipe volume is proportional to the cake shape volume
    r1 = 30/2; % Radius of the 30cm cake
    r2 = 24/2; % Radius of the 24cm cake
    h = 10; % Height of the cakes
    vol1 = pi * r1^2 * h; % Volume of the 30cm cake
    vol2 = pi * r2^2 * h; % Volume of the 24cm cake
    x = (vol1 - vol2) / vol1; % Ratio of reduction needed

So we got a satisfactory answer. However, here again you see the limitations: it doesn't truly understand the math. There's no need to assume an explicit height, as the heights cancel out in the final calculation.
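To spell out the algebra it missed, the height drops out of the ratio:

    x = \frac{\pi r_1^2 h - \pi r_2^2 h}{\pi r_1^2 h}
      = 1 - \left(\frac{r_2}{r_1}\right)^2
      = 1 - \left(\frac{12}{15}\right)^2
      = 0.36

So the recipe should be scaled down by 36%, whatever the assumed height.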

Ok, let's try that last prompt on a different and ambiguous question:

What's the cutoff frequency for an LC circuit with R equals 500 ohm and C equals 10 nanofarad?

Output>

    ```octave
    % Assuming L is not given, we cannot calculate the cutoff frequency.
    % The cutoff frequency formula requires the inductance (L) value.
    !!! Need the inductance (L) value to calculate the cutoff frequency.
    ```

First, it correctly determined the prompt was ambiguous to the point of not being answerable. However, it also suddenly included Markdown fences, which I didn't tell it to do, and which it didn't do in the previous case. So maybe the prompt needs to be even more explicit...
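In the meantime, the glue code can handle both quirks defensively. This is a sketch of my own workaround (the fence-stripping and sentinel-scanning logic is mine, not behavior the model guarantees):

    import re

    def parse_octave_answer(text):
        # Strip Markdown code fences the model sometimes adds unasked.
        text = re.sub(r"^```[A-Za-z]*\n?", "", text.strip())
        text = re.sub(r"\n?```$", "", text)
        # The "!!!" sentinel may come after comment lines, so scan each line.
        for line in text.splitlines():
            if line.strip().startswith("!!!"):
                return ("refusal", line.strip().lstrip("! "))
        return ("code", text)  # only this case gets handed to Octave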

snapcaster
1 replies
1d18h

If you're trying to ship something to production that has consistent behavior within bounds and handles edge cases, you'll need to do quite a bit of work. For ChatGPT-style use, your strategy works fine.

sandspar
0 replies
1d16h

E.g. the way a librarian Googles stuff vs how an average user does. Googling "highest average income USA state" is fine for an average user. A research librarian would need to be more sophisticated because they're being asked to solve more sophisticated problems.

bityard
1 replies
1d17h

I'm starting to use LLMs more and more for technical/programming questions, but my success rate so far is only about 50/50. I haven't tried a lot of them, but so far Copilot is my least favorite. When I ask it a technical question, it seems to give me answers to a question /sort of like/ mine, but not exactly what I was asking, essentially treating every prompt like an X/Y problem. Which is much more frustrating than just being told my question is rare enough that it doesn't know the answer.

simonw
0 replies
1d4h

Do you mean GitHub Copilot or Microsoft Copilot?

samstave
0 replies
1d17h

I do something similar to you. I was commenting on this in another thread:

From the visual codeViz thread ---

https://news.ycombinator.com/item?id=41393458

...

I've been wanting to have a GPT directly inside Blender to talk Geometry Nodes, because I want to tie geometry nodes to external data, running as Python inside Blender, that draws the object geometry and suitably shows/diagrams out the nodes of the game I am slowly piecing together: 'The Oligarchs', an updated Illuminati-style game, but using AI to create nodes directly from the oligarchs' IRL files, such as their SEC filings, the Panama Papers, and everything the tools on HN are suited to creating. I went to school for Softimage & Alias|WAVEFRONT (which became MAYA) animation in 1995 :-)

So I like your DNA.

I want to unpack the relationships of the oligarchs, programmatically, with hexagonal nodes, similar to this[0], but driven by node-based Python blocks in a GraphQL hierarchy. And I am slowly learning how to get GPT bots to spit out the appropriate elements for me to get there.

[0] - https://www.youtube.com/watch?v=vSr6yUBs8tY

(I've posted a bunch of disjointed information on this on HN, more specifically about how to compartmentalize GPT responses and code, how to drive them to write code using a style guide, and how to gather data using structured rules for how the outputs need to be presented.)

EDIT:

I throw commands at it like this, where I tell it to "give me a ps1 that sets up a FastAPI directory structure, creates the venv, touches the correct files, gives me a readme, and follows the best practices for FastAPI from [this github repo from netflix]".

And it gave me that script...

Then, the following is what I do when I want to document it: I'll take that script and tell it to give me a web UI to run and invoke it, and to add logging and dashboards.

I do this to practice making tooling-logic doo-dads on the fly, and then iterating through them.

https://i.imgur.com/7YOjJf8.png

https://i.imgur.com/KecrvfZ.png

https://i.imgur.com/tKYsmb9.png

https://i.imgur.com/nCGOfSU.png

https://i.imgur.com/ayDrXZA.png

Etc. -- I always make it diagram. Now I can throw a bunch of blocks in a directory and tell it to grab the components from the directory and build [THIS INTENT].app for me.

mrtesthah
0 replies
1d18h

I imagine you might do well to write your prompt with similar language (diction, sentence construction, etc.) to what you'd find in the output/answer you are attempting to evoke from the LLM. These are prediction machines, after all.

birracerveza
0 replies
1d4h

I actually found that sometimes the less specific you are, the better the result. Sometimes even giving them "drunken" prompts with just a few keywords is enough, just something to let them understand where you're trying to get to, and they fill out the rest. If you specify too much, they tend to hyperfixate on things.

TIPSIO
0 replies
1d16h

Yep.

With Claude - “Figure out what is going on here/do you see why I’m trying to do…”

With OpenAI - Sometimes being crazy detailed is the only way I can get it to comply.

JeremyNT
0 replies
1d5h

I mostly use Gemini and I basically treat it like a Google search most of the time. I find all the pronouns and sentence structure are usually completely unnecessary.

Every now and then Gemini decides I'm doing something unsafe and I need to get creative.

These complex system prompts are mostly there to somehow restrict what users can get the bots to do, and then users try even more complex responses to try to get around that.

gloosx
18 replies
1d12h

Claude is the most terrifyingly censored artificial word-spitting machine around. The "AI safety" word filters they are curating are straight out of the Black Mirror episode where they blurred stuff right inside the eyeball.

lannisterstark
16 replies
1d12h

Really? A lot of the "wow so censored, look I broke it" people on reddit just want LLMs to say slurs.

Claude (and GPT-4o) works fine for an overwhelming majority of tasks.

gloosx
6 replies
1d11h

I just conversed with this superintelligence:

what's 60000 + 65

I'd prefer not to discuss or encourage interpretations of numbers as crude or objectifying terms. Instead, I suggest we move our conversation in a more constructive direction. Is there a different topic you'd like to explore or discuss? I'm happy to engage in thoughtful conversation on a wide range of subjects.

Oh yeah. That's fine.

saaaaaam
2 replies
1d11h

Not sure how you’re getting this. I just ran it in Claude using Sonnet 3.5.

The response was simply: 60,065

gloosx
1 replies
1d11h

It's even funnier since it's random, but the fact that there is a human-curated character-sequence filter which includes Hitler but doesn't include Pol Pot is a worrying thing, really. It can randomly go crazy-mode with simple numbers like 455, 80085, 60065 and probably others, only because their letter representations can imply slurs. It is only the tip of the iceberg of artificial mental problems that modern artificial intelligence is starting to inherit from its instructors.

saaaaaam
0 replies
23h4m

Hitler? Pol Pot? What?

puszczyk
1 replies
1d9h

Also, why would you ask this question of an LLM? It's not a hammer; there are things it's very useful for, and adding numbers and math in general is not one of them.

gloosx
0 replies
1d8h

For me it is only useful as a rubber duck; I could not find a real use for it except toying with it and conversing with myself. This question was asked just while exploring its funny limits, and they don't sound so funny anymore when I imagine someone using this data sucker seriously.

lannisterstark
0 replies
22h37m

I can type up a bunch of words on here too; doesn't mean anything.

So I'm going to call out a /r/thathappened here.

2-3-7-43-1807
3 replies
1d10h

The attempt to elicit slurs is just a way to channel and vent frustration about much more complex and practically relevant restrictions caused by censoring in LLMs. It's just the simplest and most primitive illustration of a larger problem with AI.

Angostura
1 replies
1d5h

Unless you can set out some of these 'complex and practically relevant restrictions', that just sounds like a high-falutin' attempt to justify trying to elicit slurs.

2-3-7-43-1807
0 replies
1d5h

Huh? What is there even to justify about that? Are you worried the LLM gets traumatized?

gloosx
0 replies
1d8h

This is a beautiful explanation; it's tempting to add another take here: these companies exploring AI safety are really just selling their anthropomorphised machines and making good money. The intelligence they sell is so intellectual it needs a word blacklist in order to look safe for legislation.

slig
2 replies
1d8h

I tried getting Claude to remove "bad words" from a list of the 10k most popular English words, and it refused with some BS cultural excuse. Then I clarified that I wanted it to remove slurs and other words that might cause discomfort to some people, and it still refused.

sumedh
1 replies
1d6h

That is probably a good thing; they don't want users to jailbreak it.

slig
0 replies
1d5h

That makes sense, thanks!

immibis
1 replies
1d9h

It was Google Gemini that said it couldn't help people under 18 with C++ because C++ is too dangerous and they could get hurt.

optimalsolver
0 replies
1d4h

Well it's true.

stuffoverflow
0 replies
1d8h

I guess you are talking about the web interface. I don't think that's a fair way to measure the usefulness of the models, because they are clearly instructed to be overly "safe". I think ChatGPT currently has nearly 2000 tokens' worth of trash polluting the system prompt, whereas claude.ai is way too obsessed with things like copyright.

The APIs of all of the models are more permissive and refusals to answer are much more rare.

lynx23
11 replies
1d12h

As a long-time OpenAI user, I almost registered for Claude yesterday. Got the email, clicked the link... And saw the mandatory phone number field. And turned around and closed the window.

slt2021
9 replies
1d12h

you are missing out on a superior product (as far as coding is concerned)

lynx23
8 replies
1d11h

If you're trying to say that you think Claude does a better job coding, I've heard that already. And that's why I almost registered. But shady is shady is shady, so no, thanks. I can wait.

slt2021
5 replies
1d11h

No worries, scammers and hackers already got your phone number, address, and SSN from the T-Mobile, Verizon, and AT&T customer data breaches, and from one of the customer data-mining vendors like LexisNexis.

lynx23
4 replies
1d11h

Ahh, the "no need to be careful, the world is fucked anyways" argument. Whenever someone comes at me with this one, I seriously doubt their sincerity.

slt2021
3 replies
1d11h

Yes, you are missing out on a better product without even gaining any privacy benefit, if you consider your phone number sensitive.

You could have registered with a synthetic phone number (e.g. Google Voice) without compromising privacy, if you were that serious about privacy.

lynx23
2 replies
1d11h

Anthropic could also NOT prompt for a phone number if they were so serious about getting their product established. But it's much better to shame potential customers by claiming their privacy incentives are just wrong...

Kiro
1 replies
1d10h

Phone numbers are public data in my country. You can search my name and get my phone number instantly (as well as my SSN). What is your opinion on that?

maeil
0 replies
1d8h

Interesting, I've never heard of this before. Which country is this?

simonw
0 replies
1d4h

I’m pretty sure the phone number requirement is there to make it harder for people to create multiple accounts in order to take advantage of their free tier. It’s harder to obtain multiple phone numbers than it is to obtain multiple email addresses.

saaaaaam
0 replies
1d11h

Not trying to be provocative, but what’s shady about asking for a phone number?

okasaki
0 replies
1d9h

But OpenAI also asks for a phone number?

slt2021
7 replies
1d12h

My biggest universal LLM prompt-engineering hack is:

1. Ask Claude to come up with an LLM prompt to solve the problem. I add as many details and as much context as I can. I try to explain the problem in regular words, and don't care as much about structure and prompt-engineering tricks; I just type whatever, as I would to a friend/colleague in Slack.

2. Create a new chat, feed it the output of (1), and get the desired, well-structured answer.

Sometimes you just need to go meta.
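In code it's just two calls. Here's a sketch with the Anthropic Python SDK (the model name and problem text are illustrative):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def ask(prompt):
        msg = client.messages.create(
            model="claude-3-5-sonnet-20240620",  # illustrative model name
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text

    # Step 1: describe the problem in plain words, ask for a prompt.
    problem = "explain the problem here, as you would to a colleague in Slack"
    generated_prompt = ask("Write an LLM prompt to ask an AI about: " + problem)

    # Step 2: fresh context; run the generated prompt to get the real answer.
    answer = ask(generated_prompt)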

usrme
5 replies
1d11h

This sounds interesting! Would you be willing to share an example?

sumedh
2 replies
1d6h

Generate a system prompt for an AI model which will do ....

You put in your requirements, then take the resulting prompt into a new chat window.

usrme
1 replies
1d4h

I was looking for a concrete example and not a reiteration of what was originally said.

sumedh
0 replies
19h37m

Did you try it? Why didn't it work for you?

slt2021
1 replies
1d2h

just the most recent one:

1. Naive prompt, which gives a dogshit answer: "how can I implement authorization for each microservice request if I am using AWS EKS and Linkerd as service mesh?"

- the answer to the first naive prompt was a mere 148 words, similar to what you'd find in the first results of a Google search.

2. Meta-prompt: just start with "Write LLM prompt to ask AI about...". My meta-prompt was "Write LLM prompt to ask AI about how can I implement authorization for each microservice request if I am using AWS EKS and Linkerd as service mesh"

- it gives the following prompt: "I'm using AWS EKS for container orchestration and Linkerd as my service mesh. How can I implement robust authorization for each microservice request in this architecture? Please provide an overview of recommended approaches, considering factors like scalability, security best practices, and integration with AWS services. Include any specific tools or patterns that work well in this ecosystem."

- the answer to the second prompt is much better at 428 words, and I didn't have to think much. It took me 27 words of meta-prompt to get the 57-word real prompt, and the final answer is much better.

usrme
0 replies
9h52m

Thank you so much for taking the time to give a concrete example! It really elucidated the process for me and I'll definitely do some A/B testing of my own to try it out.

adam_hn
0 replies
1d11h

In the Anthropic console, there is a "Generate a prompt" section, which does what you are doing, but better, I guess.

It takes normal text and generates a well-structured prompt.

simonw
5 replies
1d17h

Bits of this tutorial feel a little bit outdated. The appendix on function calling, for example - https://github.com/anthropics/courses/blob/master/prompt_eng... - shows how to implement function calling from scratch using XML-style tags - but Claude's API has had JSON schema-based function calling for a few months now: https://docs.anthropic.com/en/docs/build-with-claude/tool-us...

Oh, never mind - they have an entire separate course about tool use via the API here: https://github.com/anthropics/courses/blob/master/tool_use/0...

So they're using tools in that appendix purely to demonstrate how sophisticated you can get with raw prompting.
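For reference, the JSON schema version looks roughly like this (a sketch based on the tool use docs; the tool definition, model name, and example query are illustrative):

    import anthropic

    client = anthropic.Anthropic()

    # A tool is declared as a name, description, and JSON schema for its input.
    tools = [{
        "name": "get_weather",
        "description": "Get the current weather for a given location.",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
            },
            "required": ["location"],
        },
    }]

    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # illustrative model name
        max_tokens=1024,
        tools=tools,
        messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    )

    # If Claude decides to call the tool, a tool_use block shows up in content.
    for block in response.content:
        if block.type == "tool_use":
            print(block.name, block.input)  # e.g. get_weather {'location': 'Oslo'}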

tcdent
1 replies
1d17h

I saw a talk recently from one of their prompt engineers where he encouraged XML-style tag use as well, since it's well supported by the training datasets they use.

scottmf
0 replies
22h43m

Was this in person or do you have a link by any chance? Would love to see it.

From my own experience XML tags are incredibly powerful — particularly for very large system prompts, with some markdown in there too.

minimaxir
1 replies
1d17h

The documentation still advocates using XML tags as a subset of prompt engineering despite the push for JSON-based structured I/O, confusingly: https://docs.anthropic.com/en/docs/build-with-claude/prompt-...

The structured output tutorial uses the XML tags to wrap user input instead of using system prompt engineering and then passing the user input as-is (IMO system prompt engineering is still better since it adds more control levers): https://github.com/anthropics/courses/blob/master/tool_use/0...
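To illustrate the two patterns (illustrative prompts, not the tutorial's own):

    user_input = "Some untrusted text from the user."

    # Tutorial's pattern: XML tags delimit the user input inside the user turn.
    tagged = f"Summarize the text in <doc> tags.\n<doc>{user_input}</doc>"

    # Alternative: instructions live in the system prompt, input passed as-is,
    # which leaves the system prompt available as an extra control lever.
    system_prompt = "Summarize whatever the user sends. Be brief."
    message = user_input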

thelittleone
0 replies
1d16h

Personally, I prefer XML over JSON for prompts from a readability perspective.

rapatel0
3 replies
1d17h

Anyone else find it funny that the entire concept of prompt engineering is probably very close to good management and communication principles?

It's kinda hilarious when you think about it.

My mental model for LLM chatbots is to treat them like a junior intern with access to Google. Sure, they can get things right, but realistically I have to check their work to prevent any show-stopping issues.

wlonkly
0 replies
18h43m

I run our postmortem/incident review process at work, and there is one area on our incident report that is intended to capture any investigation threads that are still ongoing at the point the report was written, so as to note that there is more information to come but that the authors didn't think it necessary to hold up the whole process for that thread.

Getting people not to put action items or proposals in that section (i.e. not to propose investigation threads that aren't in process yet) has been... challenging. But every time I change the description of that field on the report to try to better convey what it is for, I think about prompt engineering.

frankdenbow
0 replies
1d16h

Exactly this. All of the well-documented specs I wrote years ago are now fully coded apps after a copy and paste. What a time.

Terr_
0 replies
1d13h

Kinda, but I worry that actually overstates how practical/tractable the situation is (as opposed to being just a very accessible form of endless statistical whack-a-mole).

We're just trying to guide the chaotic fever-dream of a text predictor.

jdthedisciple
3 replies
1d10h

I heard Claude Sonnet was supposed to be "so much better" than GPT-4o at coding.

Recently I had both of them port an entire scraping script from Python to C#.

Surprisingly, Claude's version was borderline unusable, whereas GPT-4o's ran perfectly as expected right away (besides forgetting to tell me about a NuGet package on the first attempt)!

So yeah, I was a bit disappointed in Claude given the hype about it, that's all. Would be curious about other people's experiences. I still personally find GPT-4o to be the most intelligent model to date.

aga98mtl
1 replies
1d5h

Claude Sonnet was awesome at launch, but it became much dumber over the last two weeks. There must be behind-the-scenes throttling going on as it got more popular.

sumedh
0 replies
1d6h

Claude is great with Python and PHP.

dr_dshiv
1 replies
1d19h

“This tutorial also exists on Google Sheets using Anthropic's Claude for Sheets extension. We recommend using that version as it is more user friendly”

Interesting UI choice!

minimaxir
0 replies
1d17h

I've used it and it's straightforward (inputs are the system prompt, user messages, and other params), aside from potential caching/recalculation issues: e.g. you can't tell it "don't regenerate responses unless a source cell is changed", and it appears to cache the responses for a certain amount of time.

wakasaka
0 replies
21h35m

I prompt with the minimum amount of text, typos included, just to find the least amount of information and quality my prompt requires, and in most cases I get what I was looking for. I do the same IRL with my grammar: just enough to get the job done.

thenameless7741
0 replies
1d16h

i previously converted this tutorial to an interactive website when it was still in google sheets format and shared it on hn: https://news.ycombinator.com/item?id=40395057

i'm currently in the process of hard forking the repo and converting the remaining tutorials to typescript. just yesterday, i completed the conversion for the next part called "real world prompting", which you can find here: https://freya.academy/anthropic-rwpt-00

minimaxir
0 replies
1d17h

Notably the base tutorials use Claude Haiku, which is much, much harder to wrangle with prompt engineering than Claude 3.5 Sonnet.

The tool tutorials do use Claude 3.5 Sonnet, which makes more sense: Haiku has a very bad habit of flat-out ignoring the tool schema.

lerietaylor
0 replies
1d8h

It's hard to write prompts for data with no measure.

lawlessone
0 replies
1d4h

I thought the point of these was that you don't need to be an engineer?

greenie_beans
0 replies
1d3h

i love being able to ask ai my poorly worded questions that would annoy coworkers. it's very helpful to work through the question; just asking the question alone is helpful. and now i have an ai machine to which i can eke out a question that makes no sense, and it can still help me.

fkilaiwi
0 replies
1d19h

This seems pretty promising. I am planning to go through it. Any similar prompting courses I should check out?

FrustratedMonky
0 replies
1d4h

Is prompt engineering really 'psychology'? Convincing the AI to do what you want, just like you might 'prompt' a human to do something.

Like in the short story "Lena" (2021-01-04) by qntm:

https://qntm.org/mmacevedo

In the short story, the weights of the LLM are a brain scan. But it's the same situation: people could use multiple copies of the AI, but each time they would have to 'talk it into' doing what they wanted.