Generative Models and the “Grey Goo Problem”

Generative AI models may be causing a “Grey Goo” problem with art, publishing, and user-generated content. 

Thomas Jane encounters the Protomolecule in The Expanse

The Grey Goo Problem is a thought experiment in which self-replicating nano-robots consume all available resources, leading to catastrophe. The scenario is a popular science fiction trope (see comments).

Several publishers and user-generated content sites like StackOverflow have been impacted by a flood of AI-generated content in the last few months. Clarkesworld, a science fiction magazine, stopped accepting submissions last week. Even LinkedIn is overrun by ChatGPT-generated “thought leadership.” 

Tools like ChatGPT need high-quality training data to generate good results. They collect training data by scraping the Internet. You can see the issue here, can’t you? 

In science fiction, the Grey Goo scenario is managed through containment and quarantine. In The Expanse series (see image), for example, containing the “Protomolecule” is a crucial plot element. 

The need to contain and quarantine Generative AI will result in more paywalls, subscriptions, and gated content. Crypto may even find its calling in guaranteeing the authenticity of online content. 

I fear that the Open Internet that made ChatGPT possible will be crippled by the actions of ChatGPT and its cousins.

Google, Microsoft and the Search Wars

A demo cost Google’s shareholders $100bn last week. Why?

Google’s Share Price after the Bard event

Google has dominated search and online advertising for the last twenty years. And yet, it seems badly shaken by Microsoft’s moves to include a ChatGPT-like model in Bing search results. 

Why is this a threat to Google?

1️⃣ Advertising: Google’s revenues are driven by the advertisements it displays next to search results. The integration of language models allows users to get answers – removing the need to navigate to websites or view ads for a significant subset of queries.

2️⃣ Capital Expenditure: A Google search query costs around $0.01 to serve (see link in the comments for some analysis). Integrating an LLM like ChatGPT *could* add another 0.4 cents per query, since the costs of training and inference are high. Even with optimization, integrating LLMs into Google search will increase the cost of running search queries. According to some estimates, this could reduce the bottom line by almost $40bn (see the back-of-the-envelope sketch after this list). 

3️⃣ Microsoft’s Position: Bing (and, more broadly, search) represents a small portion of Microsoft’s total revenues. Microsoft can afford to make search expensive and disrupt Google’s near-monopoly. Indeed, Satya Nadella said as much in his interviews last week (see comments). 

4️⃣ Google’s Cautious AI Strategy: Google remains a pioneer in AI research. After all, the “T” in GPT stands for Transformer – a type of ML model created at Google! Google’s strategy has been to sprinkle AI into products such as Assistant, Gmail, Google Docs, etc. While they probably have sophisticated LLMs (see LaMDA, for example) on hand, Google seems to have held off releasing an AI-first product to avoid disrupting their search monopoly. 

5️⃣ Curse of the demo: Google’s AI presentation seemed rushed and a clear reaction to Microsoft’s moves. LLMs are known to generate inaccurate results, yet Google did not catch a seemingly obvious error made by its Bard LLM in a recorded video. This further reinforced the market sentiment that Google has lost its way.
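
To put the numbers in item 2️⃣ in perspective, here is a rough back-of-the-envelope sketch in Python. The query volume and per-query costs are assumptions drawn from commonly cited public estimates, not official figures:

```python
# Back-of-the-envelope: annual cost of adding LLM inference to search.
# ASSUMPTIONS (illustrative, not official figures):
#   ~9bn Google searches per day, ~$0.01 to serve a traditional query,
#   and ~$0.004 of added LLM inference cost per query.
queries_per_year = 9e9 * 365   # ~3.3 trillion queries per year
base_cost = 0.01               # dollars per traditional search query
llm_surcharge = 0.004          # added dollars per LLM-assisted query

added_cost_bn = queries_per_year * llm_surcharge / 1e9
print(f"Added annual cost: ~${added_cost_bn:.0f}bn")                # ~$13bn
print(f"Per-query cost increase: {llm_surcharge / base_cost:.0%}")  # 40%
```

Analyses that arrive at figures approaching $40bn make heavier assumptions about query volume, answer length, and the share of queries that invoke the model; either way, the per-query economics of search change materially.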

Explaining Reinforcement Learning from Human Feedback with Star Trek

Microsoft announced today that it will include output from a Large Language Model based on GPT-3 in Bing search results. It will also release a new version of the Edge browser that includes a ChatGPT-like bot. 

GPT-3 has been around for almost two years. What has caused this sudden leap forward in the capabilities of Large Language Models 🤔?

The answer is – *Reinforcement Learning From Human Feedback* or RLHF. 

By combining the capabilities of a large language model with those of another model trained on the end-users’ preferences, we end up with the uncannily accurate results that ChatGPT seems to produce.

Ok – but how does RLHF work? Let me try and explain with a (ridiculous) analogy. 

In the Star Trek series, the Replicator is a device that can produce pretty much anything on demand. 

When Captain Picard says, “Tea, Earl Grey, Hot!” it produces the perfect cup of tea. But how might you train a Replicator? With RLHF, of course!

Explaining RLHF

Let’s see how:

1. Feed the Replicator with all the beverage recipes in the known universe.

2. Train it to predict what a recipe would be when given a prompt. I.e., when a user says “Tea, Earl Grey, Hot!” – it should be able to predict what goes into the beverage.

3. Train *another* model – let’s call it the “Tea Master 2000” with Captain Picard’s preferences. 

4. When the Replicator generates a beverage, the Tea Master responds with a score. +10 for a perfect cup of tea, -10 for mediocre swill. 

5. We now use Reinforcement Learning (RL) to optimize the Replicator to get a perfect ten score. 

6. After much optimization, the Replicator can generate the perfect cup of tea – tuned to Captain Picard’s preferences.

If you substitute the Replicator with an LLM like GPT-3, and substitute the Tea Master with another ML model called the *Preference* model, then you have seen RLHF in action! 
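
Here is a toy sketch of that loop in Python. Everything in it is illustrative: real RLHF uses an LLM as the generator, a learned neural reward model in place of the hard-coded `tea_master` below, and an RL algorithm such as PPO rather than this naive weight update.

```python
import random

# Step 1: the menu of recipes our toy "Replicator" knows about.
RECIPES = ["hot earl grey", "lukewarm earl grey", "cold coffee"]

# Step 2: the pretrained generator, reduced to sampling weights over recipes.
weights = {recipe: 1.0 for recipe in RECIPES}

def replicator():
    """Sample a beverage according to the current weights."""
    return random.choices(RECIPES, weights=list(weights.values()))[0]

# Step 3: the "Tea Master 2000" preference model, hard-coded for this sketch.
def tea_master(recipe):
    """Step 4: +10 for the perfect cup of tea, -10 for mediocre swill."""
    return 10 if recipe == "hot earl grey" else -10

# Step 5: reinforce the outputs the preference model scores highly.
for _ in range(1000):
    recipe = replicator()
    reward = tea_master(recipe)
    weights[recipe] = max(0.01, weights[recipe] + 0.01 * reward)

# Step 6: the generator now overwhelmingly produces Picard's perfect cup.
print(max(weights, key=weights.get))  # -> hot earl grey
```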

It is a lot more complicated, but I will take any opportunity to generate Star Trek TNG-themed content 🖖.

Further Reading

Hugging Face has a fantastic blog post explaining RLHF in detail: https://huggingface.co/blog/rlhf

For those more visually inclined, Hugging Face also has a YouTube video about RLHF: https://www.youtube.com/live/2MBJOuVq380?feature=share

Anthropic has a paper that goes into a lot of detail on how they use RLHF to train their AI Assistant: https://arxiv.org/abs/2204.05862

Ben Thompson’s “4 Horsemen of the Tech Recession”

In the last month, we have had huge layoffs across technology, yet the “real economy” seems robust. What is going on?

Meta is making 2023 ‘a year of efficiency’. Microsoft, Alphabet, and many other companies have cited economic headwinds as the reason for letting thousands of people go. 

However, last week, the US posted the lowest unemployment numbers in 50 years(!) while adding half a million jobs. 

Ben Thompson discusses this in this week’s excellent Stratechery article.

He points to 4 factors that are causing this disconnect:

1️⃣ 😷 The COVID Hangover -> Companies assumed COVID meant a permanent acceleration of eCommerce spending. Customer behavior has reverted (to a certain extent) to pre-pandemic patterns.

2️⃣ 💻 The Hardware Cycle -> Hardware spending is cyclical. After bringing forward spending due to the pandemic, customers are unlikely to buy new hardware for a while.

3️⃣ 📈 Rising interest rates -> The era of free money is over. Investing in loss-making technology companies in anticipation of a future payout is no longer attractive.

4️⃣ 🛑 Apple’s App Tracking Transparency (ATT) -> ATT has made it difficult to track the effectiveness of advertising spending. This has caused enormous problems for companies like Meta, Snap, etc. that rely on advertising.

Book Review: “Artificial Intelligence – A Guide for Thinking Humans” by Melanie Mitchell

Artificial Intelligence – A Guide For Thinking Humans

Introduction

Melanie Mitchell’s book “Artificial Intelligence – A Guide for Thinking Humans” is a primer on AI, its history, its applications, and where the author sees it going. 

Ms. Mitchell is a scientist and AI researcher who takes a refreshingly skeptical view of the capabilities of today’s machine learning systems. “Artificial Intelligence” has a few technical sections but is written for a general audience. I recommend it for those looking to put the recent advances in AI in the context of the field’s history.

Key Points

“Artificial Intelligence” takes us on a tour of AI – from the mid-20th century, when AI research started in earnest, to the present day. Ms. Mitchell explains, in straightforward prose, how the different approaches to AI work, including Deep Learning and Machine Learning-based approaches to Natural Language Processing. 

Much of the book covers how modern ML-based approaches to image recognition and natural language processing work “under the hood.” The chapters on AlphaZero and the approaches to game-playing AI are also well-written. I enjoyed these more technical sections, but readers wanting only a broad overview of these systems can skim them. 

This book puts advances in neural networks and Deep Learning in the context of historical approaches to AI. The author argues that while machine learning systems are progressing rapidly, their success is still limited to narrow domains. Moreover, AI systems lack common sense and can be easily fooled by adversarial examples. 

Ms. Mitchell’s thesis is that despite advances in machine learning algorithms, the availability of huge amounts of data, and ever-increasing computing power, we remain quite far away from “general purpose Artificial Intelligence.” 

She explains the role that metaphor, analogy, and abstraction play in helping us make sense of the world, and how what seems trivial to us can be impossible for AI models to figure out. She also describes the importance of learning by observing and being present in an environment. While AI systems can be trained via games and simulation, their lack of embodiment may be a significant hurdle to building a general-purpose intelligence.

The book explores the ethical and societal implications of AI and its impact on the workforce and economy.

What Is Missing?

“Artificial Intelligence” was published in 2019 – a couple of years before the explosion in interest in Deep Learning triggered by ChatGPT and other Large Language Models (LLMs). So, this book does not cover the Transformer models and Attention mechanisms that make LLMs so effective. However, these models also suffer from the same brittleness and sensitivity to adversarial training data that Ms. Mitchell describes in her book. 

Ms. Mitchell has written a recent paper covering large language models, which can be viewed as an extension of “Artificial Intelligence.”

Conclusion

AI will significantly impact my career and those of my peers. Software Engineering, Product Management, and People Management are all “Knowledge Work,” and this kind of work will see significant disruption as ML- and AI-based approaches take hold. 

It is easy to get carried away with the hype and excitement. Ms. Mitchell, in her book, proves to be a friendly and rational guide to this massive field. While this book may not cover the most recent advances in the field, it still is a great introduction and primer to Artificial Intelligence. Some parts of the book will make you work, but I still strongly recommend it to those looking for a broader understanding of the field.

Big Tech’s Layoffs, AI, and the Closing of the Productivity Gap

Big Tech has let go of thousands of workers in the last couple of months. In addition to the end of the era of cheap money and a broader economic slowdown, this story may have another angle.

That angle is the impact of AI and the possible closing of the “Productivity Gap.” 

The Productivity Gap is a phenomenon in which workers’ output, especially in developed economies, has been growing more slowly than expected. The shift to cloud computing and SaaS business models in the mid-2010s led to an explosion in both the valuations of technology companies and the productivity of individual engineers and teams. A small startup could spin up and scale a business faster than ever. 

Fast forward to the early 2020s, and suddenly cloud computing is a commodity. Once-innovative frameworks from the last decade, like React, Spring, and others, have grown bloated and complex. 

For the last few years, companies like Meta, Alphabet, and Microsoft could hedge their bets and grow their teams because they were less likely to be disrupted by a small startup. Hoarding talent and doing “acqui-hires” was a feasible strategy.

Explaining the Tech Layoffs

Now there is once more a disruptive technology on the horizon. Generative AI models are making giant leaps – a small team of ML-native programmers could build something that blows incumbent services out of the water. 

Alphabet’s panic over OpenAI’s ChatGPT is a case in point. Suddenly it doesn’t make sense to hoard talent to work on a platform that is about to be irrelevant. 

AI-enabled software and infrastructure could close the productivity gap and fuel the rise of disruptive startups. 

The incumbents are therefore cutting costs and preparing themselves for the next round of disruption by making significant investments in AI. 

It no longer makes sense to hoard programmers when the entire industry could undergo a paradigm shift similar to that brought about by Cloud Computing 15 years ago.

The brutal layoffs we have seen in the last three months could be the result.

The Limits of Generative AI

AI is having a moment. The emergence of Generative AI models showcased by ChatGPT, DALL-E, and others has caused much excitement and angst. 

Will the children of ChatGPT take our jobs? 

Will code generation tools like GitHub Copilot, built on top of Large Language Models, make software engineers as redundant as telegraph operators? 

As we navigate this brave new world of AI, prompt engineering, and breathless hype, it is worth looking at these AI models’ capabilities and how they function. 

Models like the ones ChatGPT uses are trained on massive amounts of data to act as prediction machines. 

I.e., they can predict that “Apple” is more likely than “Astronaut” to occur in a sentence starting with: “I ate an…”
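
To make the “prediction machine” idea concrete, here is a minimal sketch that derives next-token probabilities from raw bigram counts over a made-up corpus. The corpus and counts are purely illustrative; a real LLM learns these statistics with a neural network over vast amounts of text, but the underlying task of scoring likely continuations is the same.

```python
# Toy next-token "prediction machine" built from bigram counts.
corpus = "i ate an apple . i ate an apple . i saw an astronaut .".split()

def next_token_probs(context):
    """Probability of each token that follows `context` in the corpus."""
    counts = {}
    for prev, nxt in zip(corpus, corpus[1:]):
        if prev == context:
            counts[nxt] = counts.get(nxt, 0) + 1
    total = sum(counts.values())
    return {token: count / total for token, count in counts.items()}

print(next_token_probs("an"))  # {'apple': 0.67, 'astronaut': 0.33} (approx.)
```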

The only thing these models know is what is in their training data. 

For example, GitHub Copilot will generate better Python or Java code than Haskell. 

Why? Because there is far less open-source code available in Haskell than in Python. 

If you ask ChatGPT to create the plot of a science fiction film involving AI, it defaults to the most predictable template. 

“Rogue AI is bent on world domination until a group of plucky misfit scientists and tough soldiers stops it.” 

Not quite HAL9000 or Marvin the Paranoid Android. 

Why? Because this is the most common science fiction film plot.

Cats and Hats

Generative AI may generate infinite variations of a cat wearing a hat, but it has yet to be Dr. Seuss. 

AI is not going to make knowledge work obsolete. But the focus will shift from Knowledge to Creativity and Problem-Solving. 

Stability.AI – Democratizing Access to Machine Learning

Stability.AI, a UK-based startup famous (or notorious?) for releasing the Stable Diffusion image generation model, just raised $100m at a $1bn valuation.

Their goal is to “Democratize AI.” They have done so by open-sourcing the Stable Diffusion text-to-image model and are working on releasing other models, including large language models. 

This approach is in stark contrast to the one taken by OpenAI, Facebook, Google, etc. These companies have gated access to ML models like GPT-3 via APIs or invite-only programs. The reasoning is that these models could be used to generate hateful text and images and are generally too dangerous to be released to the ignorant masses.

In a recent interview, Emad Mostaque, the CEO of Stability.AI and a fascinating thinker, talks about the inevitability of generative and large language models leaking out into the wild. He wants to focus on giving people a framework for the ethical use of AI while giving them the tools to build and train models for their specific uses. 

Stability.AI has struck a deal with Eros Interactive to get access to its massive library of Indian content. Can you imagine what could be trained using that data?

Congratulations to Stability.AI. I am curious about what this more open (or perhaps reckless?) approach to ML will bring us.

Generated image of a Robot having a celebratory drink.
Image generated by Stable Diffusion – Prompt: “A happy robot drinking champagne at a cocktail party at night, oil painting, muted, candid, high resolution, trending on artstation”

AlphaTensor – Speeding up number crunching with Machine Learning

For some, matrix multiplication may trigger memories of tedious high school algebra exercises. Last week, this humble mathematical operation was also the topic of a significant breakthrough in machine learning. 

Art generated by Stable Diffusion

Background – Matrix Multiplication

Matrix multiplication is the foundation on which many core computational operations are built. Graphics processing, machine learning, computer gaming, etc. – all rely on matrix multiplication. At any given moment, millions of computers are (probably) doing billions of matrix multiplication operations. Making this humble operation faster would result in significant computational and efficiency gains.

Why do we want faster matrix multiplication?

Multiplying two matrices involves a large number of scalar multiplication and addition operations. For example, multiplying a 4×5 matrix by a 5×5 matrix takes 100 multiplication operations using the traditional schoolbook method, which has been around since the early nineteenth century. In 1969, the mathematician Volker Strassen came up with an ingenious method that reduced the number of multiplications required by about 10%. It was hailed as a groundbreaking discovery in the world of mathematics.
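
To make the operation count concrete, here is a minimal sketch of the schoolbook method that tallies scalar multiplications – the 4×5 by 5×5 example above gives exactly 100:

```python
import numpy as np

def naive_matmul(A, B):
    """Schoolbook matrix multiplication that counts scalar multiplications."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    mults = 0
    for i in range(n):          # one pass per output row
        for j in range(m):      # one pass per output column
            for p in range(k):  # k multiply-adds per output entry
                C[i, j] += A[i, p] * B[p, j]
                mults += 1
    return C, mults

A, B = np.random.rand(4, 5), np.random.rand(5, 5)
_, mults = naive_matmul(A, B)
print(mults)  # 100 = 4 * 5 * 5; Strassen-style methods reduce this count
```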

DeepMind Enters the Arena

This brings us to DeepMind’s paper last week, where they used the AlphaTensor deep learning model to discover new algorithms for matrix multiplication that are about 10–20% faster than Strassen’s method. 
This is a *colossal deal*!
We are seeing a machine learning model find new algorithms that solve material, real-world problems. We have already seen DeepMind make groundbreaking discoveries in computational biology with AlphaFold. Now we are seeing its game-playing Deep Learning models applied to foundational aspects of the modern world. 
Exciting times are ahead!

TikTok – Succeeding with ML (and lots of cash)

TikTok has caused political controversies, made Meta change its Instagram platform to mimic it, and caused many a moral panic. All signs of success.

TikTok’s use of machine learning to present a never-ending stream of engaging content is an example of successful ML deployment at a gargantuan scale. 

But, as the linked WSJ article shows, TikTok’s growth is driven by massive investments in technology and advertising. 

  • ByteDance, which owns TikTok, lost more than $7bn from its operations in 2021 on $61.4bn in revenues
  • The company spent $27.4bn on user acquisition and $14.6bn on R&D

I believe that the value of applied machine learning technologies will accrue to those companies that can deploy vast resources to acquire data (in TikTok’s case – users who generate the data) and build massive data and ML infrastructure. I am sure we will see similar revenue and spending trends if we analyze Meta and Google’s results.

While Data Science and Machine Learning careers grab the limelight, making ML platforms more efficient and data processing much cheaper will be more lucrative in the long term. 

If a company spends significant cash on ML and data infrastructure, it will always look for people to make things more efficient. Possible careers for the future:

  • Data Engineering
  • Data center operation and efficiency engineering
  • The broad “ML Operations” category