The Making of an Industry: The Rise of AI Engineering, with Swyx

Shawn Wang (Swyx) (00:00):
What you want is a stack. A shared set of problems that everybody has, that defines the industry. It's not one tool. I'm not defined as an engineer by "I use Stripe." But if everybody in billing uses Stripe plus Zuora plus Lago plus Amberflo, and everyone in this industry always talks about these four things, that becomes its own community and its own industry. What industry really is, is community with some money flowing around.

Alex Booker (00:31):
That was Shawn Wang, better known as Swyx. I wanted to speak with Shawn because he wrote a famous post called The Rise of the AI Engineer, which describes a new type of front end specialization: using AI foundation models to enable features and applications that weren't really possible for most companies or developers to build before. His hypothesis, if you like, is that the demand for front end developers knowledgeable about foundation models is greater than the supply, and that this trend will continue. It's super relevant to the Scrimba Podcast and anyone looking to land a job in tech, because that could be an opportunity to niche down and stand out in your job search. I have to say, Shawn is extremely knowledgeable about these subjects, and listening back to this episode with headphones on, I almost felt like I was wired into Shawn's brain, downloading a part of his encyclopedia of knowledge on AI and where the industry is going. I'm your host, Alex Booker, and without any further ado, here's Shawn.

Shawn Wang (Swyx) (01:32):
The AI engineer, in my definition, is a software engineer who is building with AI, not necessarily being a researcher or an ML engineer, but knowing just enough AI concepts and limitations to put things into production applications. And there's a longer thesis behind this about why this new role is rising, but I've seen patterns of this play out time and time again with the front end engineer, with the DevOps engineer, with the data engineer. And because of the rise of foundation models, because of the rise of foundation model labs that only serve black box APIs, which can still do so much, and because of the relative demand and supply of ML engineering talent versus software engineering talent and the products that people want to build with AI, the thesis is that more and more software engineers will be embracing and building with AI. That stack is very thin right now. A lot of people are building simple ChatGPT wrappers, but over time that stack will increase, and that will give rise to the need for specialists who understand that stack and know how to wield it well.

Alex Booker (02:32):
You wrote a post entitled The Rise of the AI Engineer. I remember that post really took off trending on Hacker News and social media. Did the reception surprise you at all?

Shawn Wang (Swyx) (02:45):
You never really anticipate something to be a big hit. I think I knew that this one had a chance of doing very well, so obviously I put more effort into this one than my typical writing and typical work. But I didn't expect, for example, that Andrej Karpathy would quote-tweet it with his endorsement, saying that he did expect there would be more AI engineers than ML engineers. And that was a big surprise, because I had no idea that he was reading or cared about anything like this. And anything on Hacker News is a nice bonus.

(03:14):
But really, I was just writing to express a trend that I was already seeing. I wasn't trying to coin a new phrase, as they put it. I was already seeing that people are hiring for these roles, people want to become more proficient in these roles, and oftentimes a job title is just a Schelling point, a shorthand for employers and the talent market to coalesce on, with a baseline expectation that everyone knows the same things.

Alex Booker (03:44):
In that post, you are attributing or connecting a title to those characteristics, which we'll get into. But I think the other thing the post is doing, which possibly is one reason it was so successful, is that everybody, even your mom or dad potentially, was learning about and seeing the power of UI-based tools like ChatGPT. And for developers, we were looking at Copilot, and we were using these tools to help us generate code or sometimes debug code. I think that was my association when I thought about coding and I thought about AI. What your post shone a light on, I think, is taking that one step further: leveraging these foundation models in our applications, using APIs like those provided by OpenAI, to build certain features within our applications.

Shawn Wang (Swyx) (04:33):
I call this maybe the three-stage progression of an AI engineer. I called this out in my opening talk at the AI Engineer Summit that I also organized. Most people start off as AI enhanced engineers: they use AI products to improve their own productivity. Then they progress towards AI products engineers, where they work on AI products and wield AI APIs to expose them to end users. And then finally you have the AI agents stage, where you effectively delegate your work to an agent to execute. That part is the most speculative piece yet, but if you're a 10x engineer as an AI enhanced engineer, and then you're a 10x engineer as an AI products engineer, and then you are also 10x-ing the amount of work that you can farm out to your agents, it's actually quite conceivable that you become a 1,000x AI engineer. That is a way of thinking about the level of productivity gain if you embrace AI well, and obviously it's aspirational right now, but it's not impossible. And that is the very, very disruptive nature of what intelligence on demand can bring.

Alex Booker (05:36):
I have to bite my tongue a little bit here, because I have so many questions and things to say on that subject, but I need to make sure we cover the fundamentals first. It was only a couple of weeks ago that I was speaking to Bob Ziroll from Scrimba about agents, and he gave the example of a travel agent agent, and how the agent can create a plan, and they can reach out to the outside world for information to take actions based on that plan. And we got pondering a little bit, okay, for a support agent that is conceivable. For a travel agent, we get the idea, you can book flights for example. But what if it was a coding agent, or even a chief technology officer agent, reasoning and making plans based on technology decisions?

Shawn Wang (Swyx) (06:21):
It's more likely, based on the existing research, that you're going to have systems of agents, multi-agent systems talking with each other, collaborating on where to go, than you are going to have single agents. Or you can sort of abstract the single agent, and that agent can spawn other agents inside of it, and it may not be transparent to you. But that is the future that everyone kind of wants, especially if you're not too concerned about the safety implications of letting a bunch of AI bots run loose on the internet.

(06:48):
That's not actually the concern right now. The real concern is that they just don't work very well, and a lot of agent companies have been started in San Francisco, and most of them will fail completely. And that's just the nature of these things, that expectations are extremely high. I try to remind people that there's a lot of hype right now, and there will come a time when AI winter will return. AI is prone to these very extreme cycles of summers and winters, and [inaudible 00:07:14] cycles, and this one should probably be no different.

(07:17):
I do think that there's some baseline expectation that we have AI infused into everything in our applications right now. Microsoft is actually going extremely hard on this, putting it all over their operating system and search engine and browser, but agents are basically the sketchiest part of AI, if I could put it that way. They don't really work yet. We really want that to work. A lot of people claim that they have a demo on Twitter that works somewhere, and then you find that it's extremely cherry-picked. So, it is a hard problem, interacting with the real world, and people are working on it. And it might be tomorrow that somebody solves it, or it might be 30 years from now, and we don't actually know yet.

Alex Booker (07:50):
Let's bring this back for a second to AI engineering and what's a bit more immediate. I know for people listening, when they think about AI or leveraging AI in their applications, there is still this notion perhaps that it's something reserved for ML engineers or AI researchers, which are, in my view, very hardcore specializations. What's changed exactly that makes this capability available to average developers like myself?

Shawn Wang (Swyx) (08:20):
There are multiple things that changed simultaneously, but the biggest factor is the rise of foundation models. Previously, in traditional software 2.0 style machine learning, when you work in a large company you have a ton of user data, like Netflix for example, and then you would hire a team of ML engineers and researchers, and they would try to improve your recommendation systems by 5% or whatever. And that is very company specific, very domain specific. You would train a machine learning model to do one task, which is, let's just say, I will recommend Netflix shows 5% better than I used to. And that's very valuable, and honestly that is still the bulk of machine learning work today.

(09:03):
But incrementally onto that, what we have now is foundation models that are just generally capable of doing a lot of different things all in one model, including things that they weren't specifically trained to do. That is what people call in-context learning, few-shot learning, transfer learning, anything like that, that you may have seen. That is that domain. And what effectively you can do with that is, you do not have to train the model yourself. You can just take something off the shelf, whether it's open source or whether it's closed source, doesn't really matter, and then you add some prompts to it, you wire up some APIs, you make it generate some function-calling output or JSON output or code output, execute that code, or generate some UI in response to that code, and now you have an AI-enabled application.
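To make that concrete, here is a minimal sketch of what "add some prompts, wire up an API, get structured output back" can look like, assuming an OpenAI-style chat completions endpoint. The model name, the prompt, and the ticket schema are illustrative, not something from the conversation:

```typescript
// A minimal sketch of an AI-enabled feature: classify a support ticket by
// calling a hosted foundation model and asking it to reply with JSON.
// Assumes an OpenAI-style /v1/chat/completions endpoint and an API key in
// the environment; the model name and schema are illustrative.
type Ticket = { category: string; urgency: "low" | "medium" | "high" };

async function classifyTicket(text: string): Promise<Ticket> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // any chat model the provider serves
      response_format: { type: "json_object" }, // ask for machine-readable output
      messages: [
        {
          role: "system",
          content:
            "Classify the user's support ticket. Reply with JSON of the form " +
            '{"category": string, "urgency": "low" | "medium" | "high"}.',
        },
        { role: "user", content: text },
      ],
    }),
  });
  const data = await res.json();
  // The model's reply arrives as a JSON string in the first choice's message.
  return JSON.parse(data.choices[0].message.content) as Ticket;
}
```

Everything around the model call is ordinary software engineering, HTTP, JSON, and types; the model just sits behind the API.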

(09:44):
And that is functionally very, very different, because the machine learning researchers and ML engineers do not have any special expertise in that domain. That is the domain of building products, and that's where the software engineers come in and can run with it from there. And really, the assertion is that you have to be technical, which is relevant to the Scrimba Podcast, teaching people to learn to code and figure out their careers.

(10:06):
One of the reasons I settled on this AI engineer moniker is an assertion, or a realization, that the people who don't code are going to be left so far behind in this AI wave, because they can only consume. They cannot shape these AIs to their wishes, because they'll only be downstream of the people who are building these products, aka the AI engineer. So we have a huge amount of responsibility, but a huge amount of power, too. And I like that as a framing concept for why it's valuable to, one, learn to code, and two, learn to code AI apps. It's all possible now in a way that it was not possible even three years ago. And the realization that this is probably going to be bigger than the ML engineering industry, because of the sheer amount of demand unlocked by foundation models, is a very big change in mindset.

(10:53):
And for people who are not that familiar with foundation models, I can go into more detail, but I have to say that we're not done here in terms of the innovations of foundation models. This time last year we were all exposed to the idea of ChatGPT. This year we're exposed to the idea of GPT-4 and Gemini and Llama and all these other state-of-the-art models. It is going to increasingly proliferate, it's going to get a lot better. We have a clear line of sight to models that are at least 100 times better than they are today, and we don't yet know what to do with today's capabilities. So, there's just a lot of room, a lot of blue ocean for people to come in and explore.

Alex Booker (11:27):
I was listening to a podcast between Lex Fridman and Jeff Bezos, where Jeff Bezos said something quite interesting, which is that we didn't invent these foundation models, we discovered them. And I think the distinction there is that these foundation models are still surprising us with their capabilities, like they're doing things that we didn't exactly pre-program them to do. What do you think about that sentiment?

Shawn Wang (Swyx) (11:53):
Yeah. That gets into the realm of philosophy for me. And sorry, this is one thing I do try to avoid: a lot of AI conversations tend to devolve towards politics, philosophy, ethics, law, regulation, that kind of stuff. I try to focus on engineering. Let's talk about things we can build, and not philosophize too much about, is this thing conscious? Or, what happens to humanity when Skynet takes over? That tends to be something that engineers don't really have much domain over. It's not really falsifiable in any sense. So, you can talk all day long with no resolution, and that's not really a good use of time in my mind.

(12:25):
I will say that there is a distinction between software 1.0 and 2.0. There's a very classic essay that everyone should read called Software 2.0 by Andrej Karpathy. It was written six years ago now, and basically it explains the distinction between traditional software, which is manually hand coded. Every single line of code is hand coded by a human, with if statements and loop statements and branching statements and modules and functions and all that good stuff. That is classical, deterministic software. And then software 2.0 is more or less a machine learning architecture that you run a whole bunch of data through, and it learns to approximate the functions it needs in order to do the task that the data says to do.

(13:05):
There's a theorem called the universal approximation theorem that basically says you can approximate any function. Any function that you can manually write, I can approximate if I have a machine learning architecture that fully describes it and I run enough data through it to learn all the ways I need to model the code that you would have written manually.

(13:21):
So, that is all just true if you think about it enough. What is incrementally new is the concept of software 3.0, where instead of running machine learning for a single domain, you are running machine learning across all the domains and creating foundation models.

Alex Booker (13:36):
And this foundation model is also called a general model, for that same reason.

Shawn Wang (Swyx) (13:40):
Very few people actually say general model. It is a general model, yes, but that's not really the common parlance among people in this field; the term is foundation model. It was coined at Stanford by Percy Liang, who has a paper introducing the term, and now all of Stanford is aligned behind this concept, and it's pretty much caught on. There's another term, frontier models, which just means the best foundation models, and those are subject to some regulation, just because it's at the level where nation states start caring about the power that these individual models can have.

Alex Booker (14:14):
I like your idea to keep it quite practical and based on actionable things we as developers can do.

Shawn Wang (Swyx) (14:20):
[inaudible 00:14:21] with one thing, it's not that I don't care, it's just that it's very hard to prove. But I do want to anchor people to some data points. If you believe in evolution, you have to believe that, to some extent, we are evolving a new intelligence, in the same way that we ourselves evolved. We are perhaps doing it differently than we ourselves evolved, but it's very much the same way: trial and error, responding to environmental feedback, and then progressing at some kind of pace. There's every sign that this thing is evolving much faster than us, so we should be concerned, but we can also estimate the amount of compute this takes.

(14:55):
So, I'm just going to leave some breadcrumbs for people who are interested. There is a person called Ajeya Cotra who has done research on timelines to transformative AI, or in other words AGI, comparing how long it took human intelligence, the biological basis of AGI, to evolve against foundation model intelligence. The rough TLDR is that GPT-4 took 10 to the 25 flops of compute, and she estimates that it took something on the order of 12 to 13 more orders of magnitude for humans to evolve. So, at some point in the next century, we will get there; we will have spent about the same amount of flops evolving these things as it took for ourselves to evolve. So, I'll leave it there.

Alex Booker (15:36):
I heard that to produce a small LLM, you would take something like 10 terabytes of internet data that you scrape, you train on that by running something like 6,000 GPUs for 12 days, and that would cost something like $2 million. That's wild, especially when you think about the order of magnitude required to build a bigger model and go even further, as you describe.

(16:00):
But I think the real key point there is that... If we go back to the original question around the difference between AI researchers and ML engineers and what we're talking about today, it's really that they are the ones working within these research labs like OpenAI to produce the models. They have the expertise, they spend all this money on computing power to take us further, and then, tell me if I'm understanding well, but where the AI engineer comes into things is that those companies or teams will chuck the model over the API wall, at which point we have quite an approachable tool to work with to build AI-powered capabilities into our applications.

Shawn Wang (Swyx) (16:37):
Yeah. I think that's true. So, everyone listening should have the visual in their head of the spectrum between research engineer and traditional software engineer, and all the roles in between. But the API wall is a pretty clear line, where either you have an in-house ML team serving you an API, or you have a third party foundation model lab serving you an API. It doesn't really matter once you get that API what you do with it. That's the domain of the software engineer, and the AI engineer specializes in AI stuff.

Alex Booker (17:04):
I want to challenge you about something that I don't have a strong perspective on, but I do see in conversations about the title, or the moniker as you describe it, AI engineer. This comes up quite a lot. Because what we just spoke about now is interfacing with an API, but if I worked with Stripe or PayPal APIs, that wouldn't make me a payments engineer necessarily. That would make me a full stack engineer experienced in a particular domain, like working with payment providers, for example. What makes AI engineer a specialization, in a sense?

Shawn Wang (Swyx) (17:40):
That's a very, very good pushback. I loved it. It forces me to explain a little bit more about why I think some industries form, and why I think some are just skillsets and not full industries. So, I'll contrast this with... If you use Databricks and Snowflake, if you use Airflow, you are a data engineer. If you use Terraform, if you use Kubernetes, you're probably a DevOps person or a sysadmin.

Alex Booker (18:04):
Yeah. Yeah, that's a good point.

Shawn Wang (Swyx) (18:05):
If you use React, you're probably a front end person, so on and so forth. Why does Stripe not become synonymous with billing engineer, or payments engineer? I gave it away. Yes, there is a term for this. It's called billing engineer. We had them at my previous company. They all hate their jobs, and Stripe makes it a little bit easier. There's an emerging industry for billing engineers that people are trying to coin, and more power to them.

(18:26):
But I will say, effectively what you want is a stack emerging that everybody uses, a shared set of problems that everybody has, that defines the industry. It's not one tool. I'm not defined as an engineer by "I use Stripe"; I use Stripe to do a task. But if everybody in billing uses Stripe plus Zuora plus Lago plus Amberflo, and everyone in this industry always talks about these four things, that becomes its own community and its own industry. What industry really is, is community with some money flowing around.

(18:58):
And I think that's what happened with DevOps, that's what happens [inaudible 00:19:01] with front end, that's what happens with data engineering, and I think that's what's happening with AI. It's not just that I use OpenAI APIs, therefore I'm an AI engineer. It is: I use OpenAI APIs, and I know all the recent papers to do with chain-of-thought prompting, and I know all the trade-offs to do with vector databases, and I know all the trade-offs to do with fine-tuning open source models and hosting them and creating fallbacks, and I know LangChain versus LlamaIndex. I'm conversant in all these things.

Alex Booker (19:27):
Can I speculate something else as well, Shawn? Which is, using these systems in production as well, like doing things at scale, potentially.

Shawn Wang (Swyx) (19:34):
A lot of people are not there yet. A lot of people are just proficient, they're not experienced. And they will get there. And that's why they're going to butt heads with the ML engineers, because it's not a sharp line between AI engineer and ML engineer. ML engineers have been in this game a lot longer than AI engineers, and we should respect that. But they don't have a monopoly on knowing how to fine-tune open source models, because this is new to them too. We all started from the same exact place and time. So, it's open season as far as scaling these up and serving them in production.

(20:05):
But I will say that a lot of ML engineers effectively productize themselves. So, this will be the people like Lightning AI, this will be the people like fireworks.ai who are the ex-PyTorch team, [inaudible 00:20:16] AI. There's a bunch of folks who are basically very happy to help you go from serving a few customers, serving an MVP, towards scaling it up. That is actually an increasingly commoditized skillset, because it is just commodity infrastructure. What is still the domain of product engineering is: all right, you have an inference API that serves the table-stakes amount of tokens per second and can scale up and down serverlessly, all well and good. How do you make that useful? How do you make people pay for it? That is still the domain of the product engineer.

Alex Booker (20:48):
What is an inference API?

Shawn Wang (Swyx) (20:50):
Oh, an inference API means serving these LLMs in production. The term is inference, as opposed to training. Inference is just running the models in a forward-only fashion, where you're no longer updating the weights; you're just trying to read the outputs of the weights based on some prompt that you give it, and you do that through the API.

Alex Booker (21:09):
What are weights and the role of weights? Because I see them associated with foundation models all the time, and people talk about open sourcing weights and things like that.

Shawn Wang (Swyx) (21:19):
You can think about weights as a form of config file. Usually you would have a program and then a config file that configures that program: you have a bunch of flags that run the program one way, and if you change the flags, the program runs a different way. What weights effectively are, are the output of the training process, all the settings that can run the program in all the different ways. Imagine if every single line of code was a different config, effectively. It is the most extreme config file in the world. All the configs run from zero to one, in 4-, 8-, or 16-bit precision; it doesn't super matter that much.

(21:56):
But what you need to know, I guess, is that these are not human readable config files. These are just billions of parameters of just numbers. And when you train these things, you're basically running a whole bunch of data through the code, and that outputs a config, which we call the weights, and the more you train it, the more that config updates to try to match the data that's been fed through it, which is brilliant. It is not super obvious, but it's brilliant.
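As a toy illustration of that loop, here is the whole idea shrunk down to a single "weight". This is not how real models are trained (no framework, no neural network), just the shape of it: run data through, nudge the config to match the data, then use the frozen config to make predictions, which is the inference step mentioned earlier:

```typescript
// A "model" with exactly one weight, trained to fit data where y ≈ 3 * x.
// The weight is the entire "config file" here; real models have billions.
const data = [
  { x: 1, y: 3 },
  { x: 2, y: 6 },
  { x: 3, y: 9 },
];

let weight = 0; // starts out knowing nothing

// Training: run the data through, nudge the weight to reduce the error.
for (let epoch = 0; epoch < 200; epoch++) {
  for (const { x, y } of data) {
    const prediction = weight * x; // forward pass
    const error = prediction - y;  // how wrong we were
    weight -= 0.01 * error * x;    // update the "config" (a gradient step)
  }
}

// Inference: forward pass only, the weight is no longer updated.
console.log(weight.toFixed(2));        // ≈ 3.00, learned from the data
console.log((weight * 10).toFixed(2)); // predict y for x = 10 -> ≈ 30.00
```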

(22:20):
And that means you can also share that config file with other people, and once you have the code and the weights, people can just start running the program based on the work that you already did to train it. And yes, it takes millions of dollars to train these things. Reports are that GPT-4 took $500 million to train. Things that are open source, like Llama 2 or Mistral, probably took on the order of $10 to $20 million to train, and you get to use all of that for free on your laptop, which is pretty incredible.

Alex Booker (22:50):
Is it sometimes referred to as Llama 2 70B? Is that like 70 billion parameters?

Shawn Wang (Swyx) (22:55):
Yes.

Alex Booker (22:56):
Wow. And so I like this abstraction you're sharing about it being a config file. I suppose the perspective on that is that it's a 100-gigabyte-plus config file; it's enormous and impossible for a person to parse, but within it is the information necessary for the algorithm to do what it needs to do.

Shawn Wang (Swyx) (23:14):
Yeah. And there are actually probably a lot of completely irrelevant numbers in there. Numbers that are close to zero are probably irrelevant, because zero multiplied by anything is zero, but we just don't know how to strip them out usefully yet. There's a lot of research into model sparsity, none of which has gone anywhere, and the best that we know how to do is to train smaller models for longer. That's the most promising research direction so far.

(23:36):
I will say that it's not just that humans cannot parse these things; single machines cannot parse these things either. So, when it comes to the 70B models, here's a very nice equivalence: the amount of memory it takes to run a model is roughly two bytes times the number of parameters. So, a 7B model takes 14 gigs of memory to run, and you can think about this in terms of bits. If every parameter in the model is 16 bits, meaning this is a 16-bit model, and one byte is eight bits, then a 7B-parameter model takes 14 gigabytes just to store, just to represent in memory. Therefore, you have to have at least 14 gigs of memory. Then you compare that to how much memory your laptop has, how much memory a single GPU card has, and how much memory a network of GPU cards has. And that's why, beyond a certain scale, you actually need to start doing distributed systems, networking these GPUs together, just to run these things.
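That rule of thumb is simple enough to write down. This is a back-of-the-envelope estimate of weight storage only, ignoring activations, KV caches, and other runtime overhead:

```typescript
// Rough memory needed just to hold the weights:
// bytes ≈ parameter count * (bits per parameter / 8).
function weightMemoryGB(parameters: number, bitsPerParameter: number): number {
  const bytes = parameters * (bitsPerParameter / 8);
  return bytes / 1e9;
}

console.log(weightMemoryGB(7e9, 16));  // 7B params at 16-bit  -> ~14 GB
console.log(weightMemoryGB(70e9, 16)); // 70B params at 16-bit -> ~140 GB
console.log(weightMemoryGB(70e9, 4));  // 70B quantized to 4-bit -> ~35 GB
```

That 140 GB figure is why a 70B model at 16-bit precision doesn't fit on any single consumer GPU and has to be spread across several.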

Alex Booker (24:27):
Wow. That's wild. We're not going to have enough time to get into it today, but I suppose one thing that's very relevant and practical about what you're describing is this idea that when we use ChatGPT, we're sending our message off to a server and seeing the results streamed back to us a few tokens at a time, and that's running on quite a powerful server. It depends what model you're using, whether it's 3.5 Turbo or 4, for example, but it's happening on someone else's computer. And that means that if you want to use the underlying model in your own application, you're probably going to have to run it on a powerful server as well, maybe OpenAI's, through their APIs.

(25:03):
But there is this, I guess, exploration, and maybe some success, in producing smaller, more specialized foundation models that we can apply your rough metric to, around how much memory they take to run. There are some models we can run on our M2 MacBooks, and some which can apparently even run in the browser. They're going to have to be more specialized because they're smaller, but they require a lot less processing power to produce something useful.

Shawn Wang (Swyx) (25:30):
Yeah. There's a lot of interest in small models. I, in fact, named my company Smol AI in recognition of that fact.

Alex Booker (25:36):
I didn't realize that was the link.

Shawn Wang (Swyx) (25:37):
I haven't published our small models yet, but yes, that is absolutely a direction that things are going. And I just want to caution people against getting too excited about running models locally. I do think that people are wasting a lot of money buying their own GPUs and setting them up in their rooms and offices and stuff like that. It just feels very Neanderthal. No one would recommend you buy servers and rack them inside your office. We use the cloud.

Alex Booker (26:04):
DHH would.

Shawn Wang (Swyx) (26:05):
Yes, DHH would, but he's very special. Most people should use the cloud. This is not a foreign concept. We've been through this. So, why all of a sudden are people so interested in running models locally on a GPU that they buy? It's just vanity. It's just bragging rights. It's cheapness.

Alex Booker (26:21):
It's also cheaper, and maybe this is a great justification for the AI engineer.

Shawn Wang (Swyx) (26:26):
Is it cheaper?

Alex Booker (26:27):
Well, this is it. There are a lot of techniques required to optimize the number of tokens in order to reduce API costs, and that is an argument for the AI engineer; that in itself could be something that you develop a depth of knowledge on. But it is basically understood that using an API or a cloud provider is going to be... Yeah. We get into this build versus buy, capex versus opex debate.

Shawn Wang (Swyx) (26:52):
Exactly. It can be cheaper if you know how to use it right. I am not disputing that at all. But most people are not using it right. Most people are buying an expensive GPU, showing it off on their social media, and then using it a few times a day or a week or whatever, and massively overpaying for compute that they could have rented for a dollar an hour, for something that they don't have to own, they don't have to maintain, and that's not going to depreciate on them. You buy a state-of-the-art GPU today, it's going to last you two years. If you depreciate that cost over those two years, you're going to find that just renting is a lot cheaper, especially if you don't use it 24/7, which you're probably not going to.
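The depreciation argument is just arithmetic. With made-up but plausible numbers (a $1,600 consumer GPU, a $1/hour rental rate, a two-year useful life), the break-even depends almost entirely on utilization:

```typescript
// Hypothetical numbers for illustration only.
const purchasePrice = 1600;   // USD for a consumer GPU, assumed
const rentalPerHour = 1;      // USD per hour for a comparable cloud GPU, assumed
const lifespanDays = 2 * 365; // depreciate the purchase over two years

// Cost per hour you actually use, if you bought the card.
function ownedCostPerUsedHour(hoursUsedPerDay: number): number {
  return purchasePrice / (lifespanDays * hoursUsedPerDay);
}

console.log(ownedCostPerUsedHour(2).toFixed(2));  // light use: ~$1.10/hr, worse than renting
console.log(ownedCostPerUsedHour(24).toFixed(2)); // 24/7 use: ~$0.09/hr, owning wins
console.log(rentalPerHour.toFixed(2));            // renting: $1.00/hr, no upfront cost
```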

Alex Booker (27:26):
Let's shift gears a little bit and talk about the career implications of the AI engineer. One thing you mentioned in your post, and I heard you mention it again today, is this idea that there is, or there will be, a greater demand for AI engineers than there is supply. And if you're a developer, a full stack developer or a front end developer, and you can specialize in this way, it's very good to be in a job market where there's more demand than supply. But tell me more about where this idea comes from, that the demand is greater than the supply.

Shawn Wang (Swyx) (27:58):
Demand versus supply is definitely a theory more than a data thing. I do have a lot of anecdotal data points around people approaching me to try to hire these engineers, and then engineers approaching me trying to skill up on these skillsets, and all I have to do is just create a place for them to meet. And the minimum viable place is a phrase or a job title that everyone can agree on.

(28:18):
The demand versus supply is just purely based on having an economics background and thinking about the pipeline for an ML engineer or research engineer. Typically for research scientists, you'll probably need a PhD. The country only graduates a few thousand of these a year. Maybe 20% of them are actually good, and that's the hiring pool. What?

Alex Booker (28:39):
Just that maybe 20% of them might be good?

Shawn Wang (Swyx) (28:42):
There's a lot of terrible work out there. Let's not kid ourselves. I think part of being real is understanding that a lot of people are not very serious about their work, or don't have good judgment on what to work on, or don't work hard on it. So, being realistic about the kind of people that you want to hire, it's not a big pool. But I will say there's a reason high-quality talent congregates: there are only 600 people at OpenAI, and only a few hundred people working in all the other foundation model labs put together. The reason all this high-quality talent wants to congregate is that they get better utilized. Because if I, as a high-quality PhD or whatever, get put into Salesforce, I'm not going to be utilized to my full extent. But if I go into a foundation model lab, I work on a state-of-the-art frontier model, and [inaudible 00:29:26] serve as an API, now tens of thousands of companies can use my work.

Alex Booker (29:28):
Right. It's leverage.

Shawn Wang (Swyx) (29:30):
It's leverage. But that also means that there just will be a lot more engineers on the other side of the fence using these things. So mathematically, there should be more demand for consuming these APIs than there is supply for the APIs.

Alex Booker (29:40):
What is it that companies are trying to build? What kind of apps, what kind of features are they trying to build that they need AI engineers, that they need to hire people with this skillset in order to build? Is it like the Netflix or YouTube recommendation engine type of thing?

Shawn Wang (Swyx) (29:56):
So, I would warn AI engineers to stay very far away from that kind of traditional ML, because you don't know what you're doing. This kind of large scale recommendation systems and classical machine learning stuff is the domain of very, very specialized knowledge. So, you should have a healthy respect for this, instead of wading in with your fancy LLM and making a complete fool of yourself.

(30:16):
What is more likely to happen is that there are net new capabilities that are exposed by foundation models that the traditional ML engineers have not thought about, because it just wasn't possible five years ago. So, it would be things like Midjourney, which is making $300 million a year with a team of 15 people, because they generate images. This is not something that people tried. I mean, people had GANs before, but it's not something that was good until recently.

Alex Booker (30:39):
Do Midjourney produce their own model, or do they use someone else's?

Shawn Wang (Swyx) (30:42):
They produce their own. They're also looking for AI engineers. I've actually talked with them about this.

Alex Booker (30:47):
Okay. I guess maybe a company like Jasper AI, used for producing articles and written content and stuff, that's maybe an example of something you could build using an existing foundation model, rather than building your own, right?

Shawn Wang (Swyx) (31:00):
Jasper used to be a good example until they had to go through a round of layoffs, presumably because they were the copywriting tool before ChatGPT, and then ChatGPT took away a bunch of their business. That's just an unfortunate reality of where these things are. The modern Jasper would be Harvey, which is a legal copilot for lawyers. They're just wrapping the OpenAI API, and they've specialized in the UX and in their domain-specific retrieval-augmented generation, the RAG stack, for lawyers. And as long as they can make that useful... They've raised a huge amount of money, and they're extremely successful in that domain. And you can see a lot of people like that.

(31:34):
The reason I went to images is because that is a clear example of something that the traditional ML engineers would not touch, and so therefore you have a comparative advantage. And Midjourney is probably not a good example, but here we should probably talk about [inaudible 00:31:48] AI or Interior AI, both of which are [inaudible 00:31:51] apps. Those are making multiple millions of dollars a year, by just wrapping Stable Diffusion. And that is very much the domain of AI engineering.
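The domain-specific RAG stack Shawn mentions for Harvey reduces, at its smallest, to "retrieve the most relevant documents, then put them in the prompt." Here is a deliberately minimal sketch, assuming an OpenAI-style embeddings and chat API; in a real product the documents would be embedded ahead of time and stored in a vector database rather than re-embedded per query, and the model names are illustrative:

```typescript
// Minimal retrieval-augmented generation: embed, rank by similarity, generate.
const OPENAI = "https://api.openai.com/v1";
const headers = {
  "Content-Type": "application/json",
  Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
};

async function embed(text: string): Promise<number[]> {
  const res = await fetch(`${OPENAI}/embeddings`, {
    method: "POST",
    headers,
    body: JSON.stringify({ model: "text-embedding-3-small", input: text }),
  });
  return (await res.json()).data[0].embedding;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function answerFromDocs(question: string, docs: string[]): Promise<string> {
  // 1. Retrieve: rank the documents by similarity to the question.
  const qVec = await embed(question);
  const scored = await Promise.all(
    docs.map(async (d) => ({ d, score: cosine(qVec, await embed(d)) }))
  );
  const context = scored
    .sort((a, b) => b.score - a.score)
    .slice(0, 3)
    .map((s) => s.d)
    .join("\n---\n");

  // 2. Generate: stuff the retrieved passages into the prompt.
  const res = await fetch(`${OPENAI}/chat/completions`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [
        {
          role: "system",
          content:
            "Answer using only the provided context. If the answer is not in the context, say so.",
        },
        { role: "user", content: `Context:\n${context}\n\nQuestion: ${question}` },
      ],
    }),
  });
  return (await res.json()).choices[0].message.content;
}
```

Most of the product work then goes into what surrounds this loop: chunking, citations, evaluation, and the domain-specific UX.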

Alex Booker (31:58):
Yeah. That's really cool. And it's really good to get a sense of... Because these show us what's possible, and as these companies achieve success, they're going to inspire other companies to leverage similar patterns or build similar features in different domains. Harvey's for legal, but it's not inconceivable to imagine, let's just call it Mike for accounting firms, for example, doing something similar in a different domain.

Shawn Wang (Swyx) (32:22):
I can give you a couple of others. Cursor is a fork of VSCode that has better integrations with GPT-4. They don't train their own models, they just transparently wrap GPT-4 for you, and they're doing extremely well in terms of revenue. Just to give you an idea, it's the IDE, the VSCode flavor, that OpenAI themselves now use, which is just wild to me: that you can build something and give it back to the company that you're consuming from.

(32:49):
And then the other one that has come up recently is Perplexity, which is a new search engine that's basically trying to take on Google, which is pretty crazy to me. They have reached 10% of Bing's traffic, and Bing itself would be worth like $10 to $20 billion based on its advertising revenue alone. They have reached a million users on mobile phones, and it's a small team that doesn't train their own models from scratch. They've recently fine-tuned Llama 2 to do some online search stuff, but apart from that, they are a new search engine enabled by LLMs, a pure AI engineering effort, because they don't train their own models.

Alex Booker (33:27):
That's insane. I'll throw a couple more in the mix for the benefit of people listening. Notion AI seems pretty cool. They just released a thing, Shawn, where... I don't know how they do it, but you can ask your own Notion account questions. I don't know if it's a RAG thing, but you can ask questions about anything you've written about in Notion, and it's got an awareness of that, so it can maybe point you towards an answer, or to some particular segment in the knowledge base.

(33:52):
Descript is another tool that I love very much as a podcaster. I don't know what they use under the hood, to be honest, but because the way it works is that you're editing a video by editing the transcript, it's very compatible with an LLM. You give it an instruction like remove the ums and ahs, and it can do that for you, or remove the gaps, and it can do that too, because the interview is represented in text form.

(34:12):
And then tools like Loom, and maybe Gong and Jiminny and those sales tools where you record sessions with customers, are leveraging these models, I think, to do some degree of classification, like helping someone navigate a transcript by classifying what's going on at each point in the conversation, but also to take action, like trimming a video, for example. These are like superpowers that I reckon wouldn't necessarily be worth building from scratch, but if they can leverage a foundation model to add these superpowers to their applications, it seems like a very productive way to go about it. The end users benefit, therefore they use the product more, they upgrade their plan, and the company makes more money.

Shawn Wang (Swyx) (34:51):
Yeah. Great list of examples.

Alex Booker (34:53):
The last thing I wanted to ask you, Shawn, just in closing, because when we spoke a year or so ago, we were speaking all about career advice for new developers. If you've made it this far in the episode and you've enjoyed listening to Shawn talk, I know you're going to love that episode as well. It's one of my favorites. But since we're talking about AI now, I was going to ask you, if you were to add a chapter to the Coding Career Handbook about AI engineering, what would you write about?

Shawn Wang (Swyx) (35:18):
I'm actually going on a writing retreat next week where I will be adding a chapter, so this is actually [inaudible 00:35:22]. The main thesis the last time we talked was learning in public, and that still applies. Everything that I'm doing right now is still learning in public, and I don't think that should change. I also think that AI engineering is 90% traditional software engineering, and you should learn all the software engineering fundamentals before you tackle the AI stuff.

(35:41):
I would say probably the main thing you have to do that is net new is to be very focused, because there's just way too much for any single person to understand, and trying to keep on top of it all makes you a jack-of-all-trades and a master of none. That's not very valuable, because there are 1,000 jacks-of-all-trades out there, none of whom can do anything real when you really push them.

(36:02):
So, picking a domain, picking a subfield where you've decided to go deep on something, learning in public about that, connecting with all the experts in your field around that, building real projects, all those are timeless principles that apply no matter what domain you're pursuing.

Alex Booker (36:18):
Shawn, thank you so much.

Shawn Wang (Swyx) (36:20):
Yeah. Thanks very much for having me on, and I hope everyone can check out some of the writings that I've done, and the podcast that I now run; since the last time we talked, I've started a podcast. And we also have a small little course that we're working on, [inaudible 00:36:32] University. You can check that out too.

Alex Booker (36:34):
We'll link it high and proud in the show notes. Thanks again.

Shawn Wang (Swyx) (36:36):
Thanks, Alex.

Jan Arsenovic (36:38):
That was the Scrimba Podcast. Check out the show notes for all the ways to connect with Swyx. If you made it this far, please subscribe, and if you're enjoying our show and you want to make sure we get to make more of it, the best thing you can do is tell somebody about it. You can do it in person, on Discord, or on social media, and if you post about us on Twitter or LinkedIn, you'll get a shout-out on the show. The Scrimba Podcast is hosted by Alex Booker and produced by me, Jan Arsenovic. Keep coding, and we'll see you next time.
