draw a(i)nything 🤖
AI landscape 🗺
We’ve looked at artificial intelligence (AI) through many lenses on UC, from its use in the healthcare space in vol. 4 & 20, to TikTok’s dominant AI algorithms in vol. 43 & 53 - to name a few. Stability AI and Jasper recently reaching unicorn status at the seed and Series A stage suggests the sector is pushing against the grain of the current macroeconomic climate, and that warrants a deeper look.
As seen in the State of AI Report 2022, the global economic downturn has affected the startup industry as a whole, with an expected 24% decrease in YoY funding. The AI space is no exception: private AI companies are expected to raise 36% less in 2022 than in 2021, though still more than in 2020.
As shown above, AI startup funding is expected to hit $71b this year. The space has been growing rapidly - Sequoia recently broke the landscape down into a few key segments that sparked my interest, primarily:
🔠 Text & Code 💻
Think AI-generated blog posts, product descriptions, thesis papers, Excel, and a whole lot more - basically all written for (with) you.
These are created off the back of large language models like OpenAI’s GPT-3, a 175 billion-parameter model that can generate text and even code given a short prompt containing instructions.
E.g., Copy.ai helps with any kind of writing, as does Jasper - the Series A unicorn that just raised $125m - and Debuild helps you build your website with little to no code.
📸 Image & Video 📽
Think text-to-image or text-to-video: AI generating images that match your description, or editing images by changing the style or adding elements (be it a new car or girlfriend), all from a text prompt.
Let’s take it further: think about asking the AI to ‘complete’ a partial image based on what its training set of over 650 million images would consider the best match, as Trevor Noah did with the CTO of OpenAI a fortnight ago.
E.g., Meta’s Make-A-Scene for both image and video content, OpenAI’s DALL-E 2 elaborated on below, Midjourney, and Google’s Imagen, among many others.
Text-to-image… but how? 🤔
Accepting that we’ll likely get lost in the weeds, this video breaks down how DALL-E 2 works pretty well. Essentially: text goes in, the AI does its work, and an amazing image (or video) comes out. Job done.
Diving deeper, there are a few factors at play here:
Contrastive Language-Image Pre-training - CLIP ♲
A neural network that efficiently learns visual concepts from natural language supervision by training two models simultaneously:
1️⃣ A text model - reads a text description and maps it into a shared embedding space.
2️⃣ An image model - maps an image into that same space, so that matching text and image pairs end up close together.
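For the curious, that two-model idea can be sketched in a few lines of numpy. This is a toy illustration, not the real CLIP: the “encoders” here are fixed random projections standing in for trained networks, the dimensions are made up, and the loss shown is just the text-side half of CLIP’s symmetric contrastive objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(v):
    # unit-normalise rows so dot products become cosine similarities
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# toy "encoders": in real CLIP these are trained text and image networks;
# here they are fixed random projections into a shared 8-dim space
D_TEXT, D_IMG, D_EMB = 16, 32, 8
W_text = rng.normal(size=(D_TEXT, D_EMB))
W_img = rng.normal(size=(D_IMG, D_EMB))

def encode_text(x):
    return normalize(x @ W_text)

def encode_image(x):
    return normalize(x @ W_img)

# a batch of N matched (text, image) pairs, as random feature vectors
N = 4
texts = rng.normal(size=(N, D_TEXT))
images = rng.normal(size=(N, D_IMG))

t = encode_text(texts)
i = encode_image(images)

# N x N similarity matrix: entry [a, b] scores text a against image b
logits = t @ i.T

# contrastive loss: each text should "pick" its own image, i.e. the
# diagonal entries should win the row-wise softmax
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
labels = np.arange(N)
loss = -log_probs[labels, labels].mean()
```

Training pushes that loss down, which is exactly what pulls matching text and image embeddings together and pushes mismatched ones apart.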
Diffusion 📺
Diffusion models train by adding noise (think static on a TV) to images until the image is unrecognisable. From there, the model works backward, removing the noise step by step to regenerate a similar, realistic image. By doing this, the models learn how to construct images.
GLIDE: a diffusion model that conditions the denoising process above on a text prompt, steering the output toward images that match the description.
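The noise-adding half of that process is simple enough to sketch. A minimal numpy version, assuming a DDPM-style linear noise schedule and a made-up 8x8 “image”:

```python
import numpy as np

rng = np.random.default_rng(42)

# a tiny 8x8 "image": a bright square on a dark background
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

# linear noise schedule: beta_t is the variance of the noise added at step t
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)  # fraction of the original signal surviving to step t

def q_sample(x0, t):
    # closed form for jumping straight to step t of the forward process:
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

slightly_noisy = q_sample(image, 10)   # square still clearly visible
pure_static = q_sample(image, T - 1)   # essentially unrecognisable static
```

The trained model then runs this in reverse: given a noisy image and the step number, it predicts the noise that was added, and that denoising step is what a text prompt can steer.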
Aesthetic quality evaluations 😍
Basically, mimicking human preferences - and another place where biases enter the model. A model is trained to predict human aesthetic judgement based on ratings gathered from different artistic datasets.
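A rough sketch of what such a predictor might look like - hypothetical throughout: ridge regression stands in for whatever model is actually used, random vectors stand in for image embeddings, and synthetic “ratings” generated from a hidden preference stand in for the human-labelled datasets. The bias point falls out naturally: the model can only learn the tastes of whoever produced the ratings.

```python
import numpy as np

rng = np.random.default_rng(7)

# stand-in image embeddings (in practice these might come from an image encoder)
N, D = 200, 16
embeddings = rng.normal(size=(N, D))

# synthetic human aesthetic ratings: a hidden linear preference plus noise.
# this is where bias enters - the model inherits the raters' tastes
true_pref = rng.normal(size=D)
ratings = embeddings @ true_pref + 0.1 * rng.normal(size=N)

# ridge regression: w = (X^T X + lam*I)^-1 X^T y
lam = 1.0
w = np.linalg.solve(embeddings.T @ embeddings + lam * np.eye(D),
                    embeddings.T @ ratings)

def aesthetic_score(e):
    # predicted "how much would the raters like this" score
    return e @ w

# held-out check: predictions should track the hidden preference
test_emb = rng.normal(size=(50, D))
pred = aesthetic_score(test_emb)
actual = test_emb @ true_pref
corr = np.corrcoef(pred, actual)[0, 1]
```

Such a score can then be used to rank or filter candidate generations before they are shown to the user.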
All these factors work together to produce the most relevant match to your text input. Let’s pretend to understand it. 🤷🏽♂️
Progress 📈
The two rows of images below were processed from the exact same text input a year apart, one on DALL-E (from early 2021) and the other on DALL-E 2 / unCLIP (from mid 2022). The difference in detail is mind-blowing, and shows the impact of CLIP and the diffusion model in the bottom row. For consistent updates in the space see bleedingedge.ai.
Implications 🚩
Imagine talking to someone who learned the wrong word for something and never had anyone correct them. This is how existing biases are built into these models and reinforced. For that to change, datasets need to become more inclusive. Issues are often seen with gender- and race-biased representations of professions, as mentioned in vol. 4 where I looked at how AI has tackled the HealthTech space.
Beyond this, additional guardrails have been put in place to avoid certain outcomes, attempting to prevent disinformation and the spread of harmful content with ethical ramifications. Furthermore, copyright liabilities and usage rights concerns have been a huge topic with artists feeling their jobs and art forms are in jeopardy. Claude discussed this in vol. 52.
The future 🔮
We’ve been here before: people criticised T-Pain when he popularised auto-tune, claiming it was the end of music as we knew it. It wasn’t. Rather, it ushered in a new era, with new musicians arising, empowered by the new technology. AI is largely the same, just with a much larger impact and a few more billions behind it.
With the rapid pace of growth and its far-reaching impact, the onus is on us as a society to shape how we use AI. One thing is certain though: it’s here to stay.
karl
karl thinks this video is a good primer for today’s piece
matt just finished reading a classic - the ride of a lifetime by bob iger
sash was surprised to learn that twitter is hiring - yes, you read that correctly