TikTok – Succeeding with ML (and lots of cash)

TikTok* has caused political controversies, made Meta change its Instagram platform to mimic it, and caused many a moral panic. All signs of success.

TikTok’s never-ending stream of engaging content is an example of machine learning applied successfully at a gargantuan scale.

But, as the linked WSJ article shows, TikTok’s growth is driven by massive investments in technology and advertising. 

  • ByteDance, which owns TikTok, lost more than $7b from its operations in 2021 on $61.4b in revenue
  • The company spent $27.4b on user acquisition and $14.6b on R&D
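
To put those figures in proportion, here is a quick back-of-the-envelope calculation in Python. It uses only the numbers quoted above; the variable and function names are my own, and the loss is treated as a lower bound because the figure is "more than $7 billion".

```python
# Figures quoted above, in billions of US dollars (2021).
revenue = 61.4
user_acquisition = 27.4
research_and_development = 14.6
operating_loss_floor = 7.0  # "more than $7 billion", so this is a lower bound


def share_of_revenue(amount: float, total: float = revenue) -> float:
    """Return `amount` as a percentage of `total`, rounded to one decimal."""
    return round(100 * amount / total, 1)


shares = {
    "user acquisition": share_of_revenue(user_acquisition),
    "R&D": share_of_revenue(research_and_development),
    "user acquisition + R&D": share_of_revenue(
        user_acquisition + research_and_development
    ),
    "operating loss (at least)": share_of_revenue(operating_loss_floor),
}

for label, pct in shares.items():
    print(f"{label}: {pct}% of revenue")
```

On these numbers, user acquisition and R&D together come to roughly two-thirds of revenue.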

I believe that the value of applied machine learning technologies will accrue to those companies that can deploy vast resources to acquire data (in TikTok’s case – users who generate the data) and build massive data and ML infrastructure. I am sure we will see similar revenue and spending trends if we analyze Meta’s and Google’s results.

While Data Science and Machine Learning careers grab the limelight, making ML platforms more efficient and data processing much cheaper will be more lucrative in the long term.

If a company spends significant cash on ML and data infrastructure, it will always look for people to make things more efficient. Possible careers for the future:

  • Data Engineering
  • Data center operation and efficiency engineering
  • The broad “ML Operations” category

Natural Language Processing Made Easy with GPT-3

Natural Language Processing or NLP is a catch-all term for making sense of unstructured text-like data. Google search recommendations, chatbots, and grammar checkers are all forms of NLP.
This is a field with many years of research. But, for the last 5-7 years, machine learning has reigned supreme. 

Five years ago, machine learning approaches to NLP were labor intensive. Success meant having access to large amounts of clean, labeled data to train ML models, and each task needed its own model: a text summarization model would be pretty different from one that did sentiment analysis.

The development of large language models or LLMs has revolutionized this field. Models like GPT-3 are general-purpose tools that can be used to do several different tasks with very little training.
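
As a rough illustration of "one model, several tasks", the sketch below builds plain-text instruction prompts for two different NLP tasks from the same helper. Everything in it is hypothetical: the template wording, `build_prompt`, `run_task`, and `fake_complete` are names I made up for this example, and it is not the code behind the bot described in this post. The model call is passed in as a plain function so the sketch runs offline; in practice you would swap `fake_complete` for a call to whichever LLM client you use.

```python
from typing import Callable

# Hypothetical instruction templates. The same general-purpose model is
# steered toward different tasks only by changing the prompt text.
PROMPT_TEMPLATES = {
    "summarize": (
        "Summarize the following text in one sentence.\n\n"
        "Text: {text}\n\n"
        "Summary:"
    ),
    "sentiment": (
        "Classify the sentiment of the following text as "
        "Positive, Negative, or Neutral.\n\n"
        "Text: {text}\n\n"
        "Sentiment:"
    ),
}


def build_prompt(task: str, text: str) -> str:
    """Fill the template for `task` with the user's text."""
    if task not in PROMPT_TEMPLATES:
        raise ValueError(f"Unknown task: {task!r}")
    return PROMPT_TEMPLATES[task].format(text=text.strip())


def run_task(task: str, text: str, complete: Callable[[str], str]) -> str:
    """Build the prompt, send it to the model, and tidy up the reply."""
    return complete(build_prompt(task, text)).strip()


def fake_complete(prompt: str) -> str:
    """Stand-in for a real model call so this example runs without an API."""
    if prompt.endswith("Sentiment:"):
        return " Positive"
    return " (model-written summary would appear here)"


print(build_prompt("sentiment", "The new release fixed all my issues!"))
print(run_task("sentiment", "The new release fixed all my issues!", fake_complete))
```

The point of the design is that adding a new task means adding a new template, not collecting labeled data and training a new model.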

To show GPT-3 in action, I built a tiny Slack bot that asks some questions and uses GPT-3 to generate actions. The video below is a demo of the bot and also an explanation of how to prompt GPT-3 to do NLP tasks.

Machine Learning and its consequences

Machine Learning has brought huge benefits in many domains and generated hundreds of billions of dollars in revenue. However, the second-order consequences of machine learning-based approaches can lead to potentially devastating outcomes. 

This article by Kashmir Hill in the New York Times is exceptional reporting on a very sensitive topic – the identification of child sexual abuse material (CSAM).

As the parent of two young children in the COVID age, I rely on telehealth services and friends who are medical professionals to help with anxiety-provoking (yet often trivial) medical situations. I often send photos of weird rashes or bug bites to determine if it is something to worry about.  

In the article, a parent took a photo of their child to send to a medical professional. This photo was uploaded to Google Photos, where it was flagged as being potentially abusive material by a machine learning algorithm. 

Google ended up suspending and then permanently deleting his Gmail account and his Google Fi phone service, and flagging his account to law enforcement.

Just imagine how you might deal with losing your primary email account, your phone number, and your authenticator app all at once.

Finding and reporting abuse is critical. But, as the article illustrates, ML-based approaches often lack context. A photo shared with a medical professional may have many of the same visual features as one showing abuse.

Before we start devolving more and more of our day-to-day lives and decisions to machine learning-based algorithms, we may want to consider the consequences of removing humans from the loop.