Dispatches from Mediocristan

LLMs are powerful tools – but credulous users risk being stuck in a dangerous place: Mediocristan, the land of the average.

Mediocristan appears in Nassim Nicholas Taleb’s Incerto series. It’s a domain where outcomes are predictable, smooth, and derived from averaging all inputs.

Sound familiar?

LLMs predict the most likely next token based on massive training data (yes, yes – I know about RLHF, etc.). They are statistical engines of mediocrity by design.

And like it or not, LLM use pushes us deeper into Mediocristan daily.

A recent viral piece in NY Magazine exposed just how heavily university students rely on ChatGPT. But it’s hardly limited to academia—I’ve encountered memos, emails, and pitch decks that bear the unmistakable hallmarks of AI slop.

We’re outsourcing our thinking to Mediocristan with great enthusiasm.

On the other side lies Extremistan—the domain of consequential outliers, where a single event can outweigh everything that came before it. Mathematically, it’s the land of fat-tailed distributions where Black Swans lurk.

Extremistan is where interesting and unexpected things happen—where growth and destruction co-exist. The very release of ChatGPT in 2022 was itself an event straight from Extremistan!

I’m as enthusiastic an LLM user as any, but comparing my writing from 2020 to today, I’m clearly on the express train to Mediocristan.

This realization is jarring. So what now?
Should we embrace the slop and relocate to Mediocristan?
Or angrily denounce AI and revert to writing screeds on clay tablets?

The critical skill for navigating our new knowledge economy will be deciding where and how to use AI.

Meanwhile, Mediocristan steadily expands, assimilating new domains and making them ripe for disruption from—you guessed it—Extremistan.

The AI Leadership Paradox: When Slowing Down Becomes a Competitive Advantage

AI tools are supercharging individual productivity—but are they also undermining team cohesion?

As a technology executive straddling engineering leadership and client advisory roles, I’ve been an early and enthusiastic adopter of generative AI. Tools like Claude and ChatGPT have transformed my workflow. I can go from idea to prototype in hours, not days. Strategy memos, design documents, and new product concepts come together faster than ever before.

This feels like progress—and in many ways, it is. But there’s a growing paradox I can’t ignore: the more productive I become with AI, the more I risk overwhelming the very teams I lead.


From Brainstorm to Broadcast

I’m all about writing things down. Multi-page emails, long JIRA comments, multi-message Slack threads: I am THAT guy. This was already a challenge. Now, with generative AI in the mix, it’s even easier for me to turn ideas into fully fledged messages or documents.

It feels productive. But I know that every new AI-assisted memo I send can also create confusion—or even dread—on the receiving end. And it’s not just messages: it’s also code, designs, presentations, and more.

What used to be a collaborative back-and-forth now feels like a broadcast. Instead of whiteboarding ideas together, I’m unintentionally showing up with something that already feels “decided.” Even when it’s not.


Fomenting Context Collapse

Teams don’t just need to know what to do—they need to understand why. That context often emerges organically: a passing comment, a shared concern raised in a meeting, a collective moment of clarity. But when AI tools let leaders bypass that messy, human process and jump straight to the output, something critical gets lost.

We’re seeing a form of context collapse: the shift from shared understanding to unilateral information delivery. It might be efficient, but it chips away at clarity, trust, and momentum.


Losing the Plot (Together)

Teams don’t just execute plans—they co-create the narrative that gives those plans meaning. That narrative helps people understand how their work fits into the bigger picture and why it matters, which reduces confusion and leads to cleaner execution.

When leaders lean too heavily on AI to shortcut the narrative-building process, teams are left with tasks but no story. This can be especially damaging in cross-cultural or distributed environments, where communication already carries more friction. The result? Misalignment, low engagement, and missed opportunities for innovation.


The Risk to Innovation and Ownership

Harvard Business School’s Amy Edmondson talks about psychological safety as the bedrock of high-performing teams.

When people feel like decisions are made without them—or worse, that their input doesn’t matter—they stop contributing. They play it safe. They wait to be told what to do.

AI acceleration makes it dangerously easy for leaders to skip past the slow, participatory parts of leadership. But those are the very moments that create buy-in, spark creativity, and foster innovation.


Developing Restraint

Here’s the paradox: to lead effectively in an AI-accelerated world, we may need to slow down.

What I’ve come to see as an essential leadership skill is what I call AI restraint—knowing when not to use the tools at your disposal.

That means:

  • Creating space for co-creation: Holding regular “no-AI” brainstorms where ideas emerge collaboratively
  • Thinking out loud: Sharing early thoughts, not just polished AI-assisted conclusions
  • Rebuilding narrative: Giving teams time to shape the story around the work—not just deliver on tasks
  • Signaling your intent: When sharing early ideas, explicitly say you’re thinking out loud. Make it clear that these aren’t directives—they’re starting points. This invites dialogue instead of quiet compliance.

Winning Together By Slowing Down

It is easy to generate what looks like a polished strategy doc in five minutes. But in a world already overrun with AI slop, the real differentiator isn’t speed. It’s discernment.

It’s learning how to balance velocity with clarity, and productivity with participation.

The future of leadership isn’t about issuing more brilliant ideas.

It’s about knowing which ideas matter, and creating the space for teams to make them real – together.

It turns out that in this exponential age, judgment, self-discipline, and the wisdom to slow down may be our most valuable leadership capabilities.

On DeepSeek

Is it really doomsday for U.S. AI companies? The harbinger of the apocalypse appears to be a blue whale.

Nvidia’s stock is down 12.5%. There’s a broad tech sell-off, and Big Tech seems a little uneasy.

The reason? A Chinese hedge fund built and trained a state-of-the-art LLM to give their spare GPUs something to do.

DeepSeek’s R1 model reportedly performs on par with OpenAI’s cutting-edge o1 models. The twist? They claim to have trained it for a fraction of the cost of models like GPT-4 or Claude Sonnet—and did so using GPUs that are 3-4 years old. To top it off, the DeepSeek API is priced significantly lower than the OpenAI API.

Why did this trigger a sell-off of Nvidia (NVDA)?

  • It shows that building cutting-edge models doesn’t require tens of thousands of the latest Nvidia GPUs anymore.
  • DeepSeek’s models run at a fraction of the cost of comparable frontier models, which could shift demand away from Nvidia’s high-end hardware.

For U.S. companies, this is a wake-up call. The Biden-era export restrictions didn’t have the intended impact. But for anyone building on AI, there’s a silver lining:

  • Building LLMs and reasoning models is no longer limited to companies throwing billions at compute.
  • This will likely kick off an arms race, with U.S. companies scrambling to optimize costs and stay competitive with DeepSeek.
  • Data sovereignty will still matter—most companies won’t want their data processed by a Chinese-hosted model. If DeepSeek’s approach proves viable, expect U.S. providers to replicate it.
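
For anyone who wants to kick the tires, here is a minimal sketch of calling DeepSeek from Python. It assumes DeepSeek’s documented OpenAI-compatible endpoint and the deepseek-reasoner model name; treat both as assumptions and check the current docs before relying on them.

```python
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible API, so the standard OpenAI client
# works with a different base URL and API key (assumed values below).
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # R1; "deepseek-chat" targets the V3 chat model
    messages=[{"role": "user", "content": "Summarize why R1 rattled the markets."}],
)
print(response.choices[0].message.content)
```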


Negotiating with a strange peer ..

An under-appreciated facet of LLMs is just how *weird* they are.

Claude, ChatGPT, and pretty much every other application built on top of an LLM have a system prompt: a set of instructions that drives the application’s behavior. The good folks at Anthropic recently released the system prompts used for the Claude application.

Anyone building applications on top of LLMs should examine Claude’s system prompts to understand how “prompt engineering” is done in production.

Take this example:

“Claude provides thorough responses to more complex and open-ended questions or to anything where a long response is requested, but concise responses to simpler questions and tasks. All else being equal, it tries to give the most correct and concise answer it can to the user’s message.”

This is how “programming” works in an LLM-powered world. As a recovering Java programmer, I find this mind-blowing 🤯.
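
To make this concrete, here is a minimal sketch of what that kind of “programming” looks like when building on an LLM API, using the Anthropic Python SDK. The system prompt below is my paraphrase of the quoted passage, not Claude’s actual production prompt, and the model name is only an example.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The "program": behavior is specified in plain English via the system prompt.
SYSTEM_PROMPT = (
    "Give thorough responses to complex, open-ended questions and concise "
    "responses to simpler questions and tasks. All else being equal, give "
    "the most correct and concise answer you can."
)

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # example model name
    max_tokens=500,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": "What is a system prompt?"}],
)
print(response.content[0].text)
```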

Here is the thing—we are going to see wild new software experiences built on top of LLMs in the coming years.

But this will only happen once software engineers shed decades of imperative and declarative approaches to “programming” and learn how to work with LLMs.

A paradigm shift will be required to move us beyond the idea that LLMs are just another fancy API that we can integrate into existing applications.

We call working with LLMs “prompt engineering,” but there isn’t much engineering here. This art or skill should probably be called “LLM Whispering” or “LLM Negotiation.” Because what we will be doing isn’t engineering so much as negotiating or working with a very strange peer.

Melanie Mitchell on the Turing Test

From “The Turing test and our shifting conceptions of intelligence” by Melanie Mitchell.

In this insightful piece, Melanie Mitchell challenges the traditional view of the Turing Test as a valid measure of intelligence. She argues that while the test may indicate a machine’s ability to mimic human conversation, it fails to assess deeper cognitive abilities, as demonstrated by the limitations of large language models (LLMs) on reasoning tasks. It prompts us to reconsider what it truly means for a machine to think, moving beyond mere mimicry to a more nuanced understanding of intelligence.

Our understanding of intelligence may be shifting beyond what Turing initially imagined.

From the article:

On why Turing initially proposed the Turing Test

Turing’s point was that if a computer seems indistinguishable from a human (aside from its appearance and other physical characteristics), why shouldn’t we consider it to be a thinking entity? Why should we restrict “thinking” status only to humans (or more generally, entities made of biological cells)? As the computer scientist Scott Aaronson described it, Turing’s proposal is “a plea against meat chauvinism.”

A common criticism of the Turing Test as a measure of AI capability

Because its focus is on fooling humans rather than on more directly testing intelligence, many AI researchers have long dismissed the Turing Test as a distraction, a test “not for AI to pass, but for humans to fail.”

Smallville, Agent Based Modeling, and Capital Markets

Google and Stanford cooked up something intriguing—a virtual village called Smallville, populated by agents running on the ChatGPT API.

The researchers witnessed interesting emergent behavior, from coordination and communication to downright adorable interactions among the village’s wholesome residents.

Smallville even comes with cute graphics. But beyond the little sprites organizing Valentine’s Day parties (yes, that’s what happens in Smallville), this experiment made me think of my time, long ago and in a city far away, in Capital Markets.

Smallville (courtesy Ars Technica)

Sidebar

Derivatives are a vast market. And derivatives, like options, are priced using a somewhat arcane mathematical field called Stochastic Calculus – the Black-Scholes equation being a famous example.

The underlying assumption is that markets behave randomly, and Stochastic Calculus provides a way of modeling that randomness. But this approach can have problems: even Myron Scholes and Robert Merton, of Black-Scholes fame, were partners at LTCM when the fund spectacularly blew up.
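
For reference, here is a minimal sketch of the Black-Scholes closed-form price for a European call option; the function and variable names are mine, and a real pricing library would be the sensible choice in practice.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def black_scholes_call(S, K, T, r, sigma):
    """European call price: spot S, strike K, time to expiry T in years,
    risk-free rate r, and annualized volatility sigma."""
    N = NormalDist().cdf
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * N(d1) - K * exp(-r * T) * N(d2)

# Example: spot 100, strike 105, one year to expiry, 3% rate, 20% vol
print(round(black_scholes_call(100, 105, 1.0, 0.03, 0.20), 2))
```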


Enter Agent Based Modeling (ABM): a nifty but niche approach that simulates the behavior of individual market participants via agents. The idea is that these simulations provide better insight into how the market may evolve under different conditions.
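
To make ABM concrete, here is a toy sketch; everything in it is illustrative, including the agent rules and parameters. A few noise traders and trend followers submit orders each round, and the price moves with the net order imbalance.

```python
import random

random.seed(42)

def noise_trader(price, history):
    """Trades at random, ignoring the price path."""
    return random.choice([-1, 0, 1])

def trend_follower(price, history):
    """Buys if the price is above where it was a few rounds ago, else sells."""
    if len(history) < 3:
        return 0
    return 1 if price > history[-3] else -1

agents = [noise_trader] * 10 + [trend_follower] * 5
price, history = 100.0, []

for step in range(50):
    history.append(price)
    # Net demand across all agents nudges the price up or down.
    demand = sum(agent(price, history) for agent in agents)
    price *= 1 + 0.001 * demand

print(f"Final price after 50 rounds: {price:.2f}")
```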

Smallville shows us that LLM-driven agents are a possibility. Is it a stretch to envision specialized LLMs, trained on financial data, being used in ABM to predict how a particularly temperamental market might behave?

If you are a quantitative analyst at a sell-side firm looking to market-make a particularly exotic derivative, an LLM-powered approach may be viable. Or at least less boring than reaching for the Stochastic Calculus textbook.

The future might find traders armed with their own simulated worlds to forecast the price of, oh, let’s say, a derivative on the price of an exotic tulip of a non-fungible JPEG of a smoking Ape.. who knows?

PS – The painting is called “The Copenhagen Stock Exchange” by P.S. Krøyer. You can see why an agent-based approach to simulating capital markets is a .. possibility..

The Future is Here..

It’s just not very evenly distributed ..

This thought-provoking quote by William Gibson has been on my mind recently. The frantic pace of AI development contrasts sharply with the casual indifference of friends and family who do not care about cutting-edge technology.

Most people outside the tech community may have heard about ChatGPT, LLMs, or other “autonomous” technology in passing.

However, we will increasingly see these worlds intersect. Take, for example, this amusing video of a San Francisco police officer attempting to reason with a wayward Waymo car.

The cop steps in front of the slow-moving vehicle, commanding it to stop and stay like an errant puppy. He then lights a flare in front of the car, hoping the smoke will make it stop.

The video is funny, but it’s also a cautionary tale about the kinds of issues we will face when introducing autonomous agents to the broader public.

Just like the bewildered cop, we will have to deal with users who do not understand the capabilities and limitations of new technology.

Designing effective User Interfaces and Experiences for these complex new technologies will be critical to broad and safe adoption.

Generative Models and the “Grey Goo Problem”

Generative AI models may be causing a “Grey Goo” problem with art, publishing, and user-generated content. 

Thomas Jane encounters the Protomolecule in The Expanse

The Grey Goo Problem is a thought experiment in which self-replicating nano-robots consume all available resources, with catastrophic results. It’s a popular science fiction trope (see comments).

Several publishers and user-generated content sites like StackOverflow have been impacted by a flood of AI-generated content in the last few months. Clarkesworld, a science fiction magazine, stopped accepting submissions last week. Even LinkedIn is overrun by ChatGPT-generated “thought leadership.” 

Tools like ChatGPT need high-quality training data to generate good results. They collect training data by scraping the Internet. You can see the issue here, can’t you? 

In science fiction, the Grey Goo scenario is managed through containment and quarantine. In The Expanse series (see image), for example, containing the Protomolecule is a crucial plot element.

The need to contain and quarantine Generative AI will result in more paywalls, subscriptions, and gated content. Crypto may even find its calling in guaranteeing the authenticity of online content. 

I fear that the Open Internet that made ChatGPT possible will be crippled by the actions of ChatGPT and its cousins.

Google, Microsoft and the Search Wars

A demo cost Google’s shareholders $100bn last week. Why?

Google’s Share Price after the Bard event

Google has dominated search and online advertising for the last twenty years. And yet, it seems badly shaken by Microsoft’s moves to include a ChatGPT-like model in Bing search results. 

Why is this a threat to Google?

1️⃣ Advertising: Google’s revenues are driven by the advertisements it displays next to search results. The integration of language models allows users to get answers – removing the need to navigate to websites or view ads for a significant subset of queries.

2️⃣ Capital Expenditure: Search queries on Google cost around $0.01 to serve (see link in the comments for some analysis). Integrating an LLM like ChatGPT *could* add roughly four-tenths of a cent ($0.004) per query, since training and inference are expensive. Even with optimization, integrating LLMs into Google search will increase the cost of running queries; by some estimates, the hit to the bottom line approaches $40bn (a rough back-of-envelope sketch follows this list).

3️⃣ Microsoft’s Position: Bing (and, more broadly, search) represents a small portion of Microsoft’s total revenues. Microsoft can afford to make search expensive and disrupt Google’s near-monopoly. Indeed Satya Nadella, in his interviews last week, said as much (see comments). 

4️⃣ Google’s Cautious AI Strategy: Google remains a pioneer in AI research. After all, the “T” in GPT stands for Transformer – a type of ML model created at Google! Google’s strategy has been to sprinkle AI into products such as Assistant, Gmail, Google Docs, etc. While they probably have sophisticated LLMs (see LaMDA, for example) on hand, Google seems to have held off on releasing an AI-first product to avoid disrupting their search monopoly.

5️⃣ Curse of the demo: Google’s AI presentation seemed rushed and a clear reaction to Microsoft’s moves. LLMs are known to generate inaccurate results, yet Google didn’t catch a seemingly obvious error made by its Bard LLM in a recorded video. This further reinforced the market sentiment that Google seems to have lost its way.
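
Here is the rough back-of-envelope sketch promised in point 2️⃣. The query volume is my own assumption (figures around 8 to 9 billion Google searches per day are commonly cited); the per-query cost comes from the estimate above. The gap between this result and the ~$40bn figure reflects different assumptions about capital expenditure and how many queries would actually invoke the LLM.

```python
# Back-of-envelope: incremental inference cost of LLM-augmented search.
queries_per_day = 8.5e9        # assumed; commonly cited figure for Google
extra_cost_per_query = 0.004   # ~4/10ths of a cent per query, per the estimate above

annual_extra_cost = queries_per_day * 365 * extra_cost_per_query
print(f"~${annual_extra_cost / 1e9:.0f}bn per year in additional inference cost")
```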


Explaining Reinforcement Learning from Human Feedback with Star Trek

Microsoft announced today that it will include results from a Large Language Model based on GPT-3 in Bing search results. They will also release a new version of the Edge browser that includes a ChatGPT-like bot.

GPT-3 has been around for over two years. What has caused this sudden leap forward in the capabilities of Large Language Models 🤔?

The answer is – *Reinforcement Learning From Human Feedback* or RLHF. 

By combining the capabilities of a large language model with those of another model trained on end users’ preferences, we end up with the uncannily accurate results that ChatGPT seems to produce.

Ok – but how does RLHF work? Let me try and explain with a (ridiculous) analogy. 

In the Star Trek series, the Replicator is a device that can produce pretty much anything on demand. 

When Captain Picard says, “Tea, Earl Grey, Hot!” it produces the perfect cup of tea. But how might you train a Replicator? With RLHF, of course!

Explaining RLHF

Let’s see how:

1. Feed the Replicator with all the beverage recipes in the known universe.

2. Train it to predict what a recipe should be when given a prompt. That is, when a user says “Tea, Earl Grey, Hot!”, it should be able to predict what goes into the beverage.

3. Train *another* model – let’s call it the “Tea Master 2000” – on Captain Picard’s preferences.

4. When the Replicator generates a beverage, the Tea Master responds with a score. +10 for a perfect cup of tea, -10 for mediocre swill. 

5. We now use Reinforcement Learning (RL) to optimize the Replicator to get a perfect ten score. 

6. After much optimization, the Replicator can generate the perfect cup of tea – tuned to Captain Picard’s preferences.

If you substitute the Replicator with an LLM like GPT-3, and the Tea Master with another ML model called the *preference* (or reward) model, then you have seen RLHF in action!

It is a lot more complicated, but I will take any opportunity to generate Star Trek TNG-themed content 🖖.
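
Still, for the curious, here is a toy sketch of the core loop under heavy simplification: the Replicator is a tiny categorical policy over four recipes, the Tea Master is a hard-coded reward function standing in for the learned preference model, and a REINFORCE-style update plays the role of the full RL machinery (real RLHF uses a learned preference model and PPO over an LLM’s token distribution).

```python
import math
import random

random.seed(0)

# The "Replicator": a policy over candidate recipes, parameterized by logits.
recipes = ["earl grey, hot", "earl grey, lukewarm", "instant coffee", "plain hot water"]
logits = [0.0, 0.0, 0.0, 0.0]

def sample_recipe(logits):
    """Sample a recipe index from the softmax distribution over the logits."""
    exps = [math.exp(l) for l in logits]
    probs = [e / sum(exps) for e in exps]
    r, cumulative = random.random(), 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r <= cumulative:
            return i, probs
    return len(probs) - 1, probs

def tea_master(recipe):
    """The preference (reward) model: Captain Picard's tastes, hard-coded."""
    return 10.0 if recipe == "earl grey, hot" else -10.0

# REINFORCE: nudge the policy toward recipes the preference model scores highly.
learning_rate = 0.05
for step in range(500):
    idx, probs = sample_recipe(logits)
    reward = tea_master(recipes[idx])
    # Gradient of log-prob for a categorical policy: (1 - p) for the chosen
    # recipe and (-p) for the rest, each scaled by the reward.
    for i in range(len(logits)):
        logits[i] += learning_rate * reward * ((1.0 if i == idx else 0.0) - probs[i])

best = max(range(len(recipes)), key=lambda i: logits[i])
print("Replicator's favourite after training:", recipes[best])
```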

Further Reading

Hugging Face has a fantastic blog post explaining RLHF in detail: https://huggingface.co/blog/rlhf

For those more visually inclined, Hugging Face also has a YouTube video about RLHF: https://www.youtube.com/live/2MBJOuVq380?feature=share

Anthropic AI has a paper that goes into a lot of detail on how they use RLHF to train their AI Assistant: https://arxiv.org/abs/2204.05862