Negotiating with a strange peer ..

An under-appreciated facet of LLMs is just how *weird* they are.

Claude, ChatGPT, and pretty much every other application built on top of an LLM have a system prompt. This is a set of instructions that drives the application’s behavior. The good folks at Anthropic recently released the system prompts used for the Claude application (see link below).

Anyone building applications on top of LLMs should examine Claude’s system prompts to understand how “prompt engineering” is done in production.

Take this example:

“Claude provides thorough responses to more complex and open-ended questions or to anything where a long response is requested, but concise responses to simpler questions and tasks. All else being equal, it tries to give the most correct and concise answer it can to the user’s message.”

This is how “programming” in an LLM-powered world works. As a recovering Java programmer, this blows my mind 🤯.

Here is the thing—we are going to see wild new software experiences built on top of LLMs in the coming years.

But this will only happen once software engineers shed decades of iterative or declarative approaches to “programming” and learn how to work with LLMs.

But this will only happen once software engineers shed decades of iterative or declarative approaches to “programming” and learn how to work with LLMs.

A paradigm shift will be required to move us beyond the idea that LLMs are just another fancy API that we can integrate into existing applications.

We call working with LLMs “prompt engineering,” but there isn’t much engineering here. This art or skill should probably be called “LLM Whispering” or “LLM Negotiation.” Because what we will be doing isn’t engineering so much as negotiating or working with a very strange peer.

Melanie Mitchell on the Turing Test

From “The Turing test and our shifting conceptions of intelligence” by Melanie Mitchell.

In her insightful piece, “The Turing Test and our shifting conceptions of intelligence,” Melanie Mitchell challenges the traditional view of the Turing Test as a valid measure of intelligence. She argues that while the test may indicate a machine’s ability to mimic human conversation, it fails to assess deeper cognitive abilities, as demonstrated by the limitations of large language models (LLMs) in reasoning tasks. This prompts us to reconsider what it truly means for a machine to think, moving beyond mere mimicry to a more nuanced understanding of intelligence.

Our understanding of intelligence may be shifting beyond what Turing initially imagined.

From the article:

On why Turing initially proposed the Turing Test

Turing’s point was that if a computer seems indistinguishable from a human (aside from its appearance and other physical characteristics), why shouldn’t we consider it to be a thinking entity? Why should we restrict “thinking” status only to humans (or more generally, entities made of biological cells)? As the computer scientist Scott Aaronson described it, Turing’s proposal is “a plea against meat chauvinism.”

A common criticism of the Turing Test as a measure of AI capability

Because its focus is on fooling humans rather than on more directly testing intelligence, many AI researchers have long dismissed the Turing Test as a distraction, a test “not for AI to pass, but for humans to fail.”