Demystifying AI: What Are Foundation Models (and How to Use Them), with Tom Chant

Tom Chant (00:00):
One thing that I find really, really intriguing here is that the scientists at OpenAI don't actually know how and why this works. That cuts us some slack that we're not going to understand it either. The world of web dev has always moved pretty fast. This is definitely a new thing to learn. It is another thing to go on the list and quite a big thing, but it is definitely more revolutionary.

Alex Booker (00:24):
That was Tom Chant, developer and teacher at Scrimba. I wanted to talk with Tom because he's experienced at using AI foundation models to build features in front-end applications that weren't possible to build before, unless you were a big company with lots of resources. You're listening to part two of our Rapid Response series on how to become an AI engineer.

Last week, I interviewed Scrimba's co-founder, Per, about what an AI engineer is and he shared a roadmap on how to become one. Today, we're putting a finer point on foundation models like GPT or Llama, which you might've heard of. Tom's going to define foundation models before giving you a tour of which ones are out there, how they work, and what exactly are these features they enable that weren't previously realistic for average developers to implement.

There's a lot to look forward to in today's episode. Tom really knows his stuff and how to distill it in a simple and inspiring way. But for me, the big idea here is that unlike, for example, a new front-end framework, which gives a different developer experience to achieve more or less the same thing, I'm talking Angular versus React here, foundation models are beginning to fundamentally change the features and user experience of front-end applications.

As front-end developers, we need to be very cognizant of that because it's up to us to build those features. And companies are starting to recognize this opportunity as well, increasing the demand for developers who are familiar with these technologies. This isn't yet another technology or some shiny new thing. We're experiencing a fundamental shift here, hence the Rapid Response series and my conversation today with Tom. Tom, welcome to the show.

Tom Chant (02:14):
It is good to be here. Thank you very much for having me. I watched your episode with Per, actually, last week. So that gives us a really good place to start from. I mean he touched on a lot of topics, so we've got quite a lot to talk about.

Alex Booker (02:27):
Yeah, that's right. After speaking with Per last week, we got really excited about AI engineering. And I think what Per did, which was excellent ... And anybody listening, I would recommend you start with episode one in this series. We're creating a four-part series here to help you get up to speed with AI engineering and realize why it's so exciting and something that is very productive to focus on to improve your employability as a developer.

I suppose Per gave a high level overview, whereas with your experience creating courses and building apps with these AI technologies, it's going to be really fantastic to hear your perspectives about the specifics, like how this stuff works and what kind of features it can enable within our applications.

And then we'll segue this conversation into a discussion about foundation, AKA general models, what they are, how they work, and the different types of foundation models out there, plus how to utilize them.

Tom Chant (03:15):
Absolutely, yeah, because it is kind of crazy. I mean, we're in December now, so it was a year ago that I didn't really know anything about AI. Or what I did know, I was very skeptical of, because over the previous decade and more, we've heard loads about AI, but any time you actually saw something claiming to be AI, normally some kind of chatbot on a website, it was hopeless. It was worse than rubbish. It was actually the kind of annoying feature they should have just got rid of rather than stuck there on their website.

But a year, I guess, is a heck of a long time in AI and in that year, I've actually released three courses on AI. I'm a big convert to it to be honest. It's performing vastly better than I ever thought it would. So hopefully what we can do is drill down into some of Per's points and get into some specifics.

Alex Booker (04:06):
Absolutely. I like how you mentioned that a year ago you didn't know much or anything about this stuff. Nobody really did. We celebrated ChatGPT's first birthday not too long ago, and that was really, I think, the impetus that revealed the APIs to these foundation models to developers at large. And from there, there was this proliferation of open source projects and methods we can use as developers.

It's a very young industry, but I think what's different compared to Web3 or NFTs or crypto and those kinds of things, which were also new and exciting at one point, is that there's a really clear connection between the stuff we're talking about in this series, AKA foundation models and using APIs from AI companies like OpenAI to build features into applications. There's a very direct link between those things and actually delivering value to end users. I think that's what makes us believe in it.

What are some of the applications of AI you've seen in the last year or so that have made you believe in it as something that's ... Well, you described yourself as a convert. What converted you?

Tom Chant (05:11):
I think that's absolutely right. The fact is we're now actually seeing practical applications of it which are actually useful and that's something that we never saw before. You see it just popping up in lots and lots of places. I mean one really obvious one is in these home assistants, things like Alexa, also Siri and what you've got going on in your smartphone.

Those have changed in an evolutionary way and some people that haven't really changed how they use them might not even notice the difference so much. But the fact of the matter is, they've become more conversational. They've become more logical. They've become more context-aware. When you look at something like Alexa, it can now respond to multiple queries at once. It can be much more intuitive. It can actually connect to outside sources and bring in much more information than it could.

And also, the specifics of how it interacts with you, it can pick up on eye contact, it can pick up on the tone of your voice to decide if you're talking to Alexa or if you're talking to somebody else. So there's loads of changes just going on there, which are gradually percolating through.

Now, before I started working at Scrimba, I spent a lot of time in the field of education but nothing to do with code. I was actually an English teacher. And it's really interesting to see how AI is influencing education because there's a big need there. There's a lot you can do.

And a couple of apps have really come to my attention, and one in particular is this app called ELSA Speak, which basically does one of the most fundamental tasks of language teaching, which is that it prompts and corrects pronunciation. And that's something that's really, really hard to do in a classroom environment where you've got loads and loads of students.

But it's something that's really fantastic that a student can go home and they can use an app and actually get native speaker quality correction on their pronunciation from an app for hours on end. Things like that are quite a big game changer.

Alex Booker (07:09):
They say the best way to learn a foreign language is to take a lover in that other language because you converse naturally and intimately I suppose. And by the same token, you can travel in the country. And compared to doing Duolingo or something, it's only by putting yourself in that context that you get to really absorb the language and see what it's like to converse colloquially. But it sounds like this app basically brings that to your phone so you can get a similar experience.

Tom Chant (07:37):
Absolutely, yeah, and it's on-demand. You don't actually have to go to the trouble of getting into a relationship with somebody who speaks that language, which honestly could be quite inconvenient in life. You might be married to someone else.

So we are just seeing AI proliferate in all sorts of places. I think one of the most important, one of the most relevant ways that AI is being used is actually in commerce and specifically in promoting products to consumers.

Alex Booker (08:10):
This is huge.

Tom Chant (08:10):
It is huge. It doesn't look huge because people have been trying to do this for a long, long time. Since the beginning of online shopping, things have popped up on your screen saying, "Hey, how about this? How about that?" The difference with AI is that now that can be done extremely well. And what I mean by that is it can be targeted very specifically to the individual consumer.

What that means is that basically companies can persuade their consumers to buy more. And as the bottom line is everything for these money-driven companies, that's probably going to become one of the key uses of AI in web development in the coming years.

It's also interesting because I think we need to look at things in terms of not just what's going on in AI at the moment, but also where web developers should be focusing, and particularly, if they're just training to be a web dev at the moment, if they're trying to get their first job, what can they actually do which utilizes AI in their own applications?

When you're putting together a portfolio, obviously you want to show off your skills. Now, you can't go ahead and do something as dramatic perhaps as Amazon are doing with Alexa or making your own Siri or something like that. Maybe you can, or to some extent. But I think there are things which you can do which are very, very effective and which are well within your grasp.

So if you can see that Amazon have got a tailor-made advertising banner with products that they know that consumer is going to like, well, you can copy that concept but apply it to a much smaller field. So you could take, for example, your own blog. You could use AI so that your users can actually read some blog posts and then get very specific, tailor-made recommendations of which other posts they would enjoy. And that is going to increase your page views, keep your users on your site for longer. It might boost your advertising revenue if you've monetized your blog.
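
A quick sketch of Tom's blog-recommendation idea: represent each post as an embedding vector (from an embeddings API, for example OpenAI's text-embedding endpoint), then rank other posts by cosine similarity to the one the reader just finished. The post objects and tiny vectors below are made up purely for illustration.

```javascript
// Cosine similarity between two equal-length vectors:
// 1 means same direction (very similar content), 0 means unrelated.
function cosineSimilarity(a, b) {
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Rank every other post by similarity to the current one and
// return the top `limit` recommendations.
function recommendPosts(currentPost, allPosts, limit = 3) {
  return allPosts
    .filter((post) => post.id !== currentPost.id)
    .map((post) => ({
      ...post,
      score: cosineSimilarity(currentPost.embedding, post.embedding),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

In a real blog you would compute and store an embedding once per post, then run this ranking whenever a reader finishes an article.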

But I think the more important point is that gaining those techniques and gaining those experiences actually looks really good if you're trying to break into an industry and you're actually doing something real world, which is really useful.

Alex Booker (10:19):
I think I'm hearing two, maybe three things here actually. The first is that using these new AI technologies, you can build some really innovative businesses like ELSA Speak for example. I think that will appeal to a lot of people who want to build the future with code and that's why they like coding.

But I also like this commercial example you give, a way that these technologies are genuinely going to help every business, not just the big massive businesses like Amazon who can afford to hire ML engineers and AI researchers and spend hundreds of thousands if not millions of dollars on computing power. Every company is going to have access to these technologies.

There is a path to additional incremental revenue there and they want developers to come and help them apply these features. And maybe that's where you listening can come in by learning these features. As a new developer, that could be a nice angle into a company potentially.

I think it is great to think about that commercial angle because every job, even though we want to love our job and we want a job about which we're passionate, it is ultimately a monetary commercial transaction where you get a salary for your effort and you're expected to add revenue to the business, whether that's directly or indirectly. So I think thinking about it in these terms is very relevant to the Scrimba podcast.

And yeah, the third thing you pointed out I think, which is that it's interesting, you can give an example like the advert on Amazon, but it's the underlying idea that can be applied in a multitude of different, maybe creative ways we haven't even thought about yet. I love this idea of being able to recommend very specific blog posts. I've built a blog engine in the past. I've also maintained a few blogs using third party software.

You've captured someone's attention to come and read your blog. That's fantastic, and then you want to recommend them something else to read. A lot of the time you have to do it manually, which is tedious or you base the recommended articles based on tags or matching the title text or something, but that doesn't really speak to the meaning of the message in the post and what path they would likely want to take next according to the user's preferences and viewing habits or reading habits in this case.

I say viewing habits because this is very similar to how the YouTube recommendation engine works, how it tailors your feed based on what you've watched before and what they think you'll like. Tying this together with the commercial angle, it's because the more minutes you watch on YouTube, the more adverts you're going to see.

And that's a very personal experience that they tailor based on your viewing habits, previously using ML and things like that. But it would be possible to add a similar kind of feature to your application using these technologies that are becoming a bit more democratized today in the AI world.

Tom Chant (12:44):
Absolutely, yeah. And you say democratized there; it's actually pretty much revolutionary how much power the lone developer can have in their hands from just a couple of APIs. It really is bringing what used to be the stuff of massive corporations into the palm of your hand.

Alex Booker (13:03):
It really is revolutionary and therefore a very exciting opportunity for new developers, I think.

I want to put this question to you quite directly because I know I was thinking it and I know that people listening will likely be thinking it as well. We learn about these technologies, like the category of AI engineering, and within that, a plethora of different specific technologies, whether it's an OpenAI API, LangChain or something like that.

Are all these things yet another technology, just like TypeScript is yet another technology we have to learn, or test-driven development is another concept we have to learn? Maybe it's like Svelte in the sense that this is something that we see people using. We see it's trendy, but we know that we can't go chasing a bunch of rabbits because we won't catch any of them. In that case, we should focus our efforts on one thing.

I'm wondering, this AI engineering stuff we've been talking about with Per, I'm talking with you today about, and we're teaching in the AI path at Scrimba, is it something similar in category to TypeScript, not in technology, but in the sense that it's one more thing you have to learn?

Or do you think it represents more of a revolutionary shift similar to how if you look at technology over the last 20, 30 years, there are concepts and patterns of building software that are totally antiquated now? And maybe if you weren't paying attention you'd be left behind and today you're the person coding Pascal on a mainframe computer instead of deploying to the cloud and using a single page application? In what camp does it fall? Is it the one more technology camp or is it more of the revolutionary shift kind of camp?

Tom Chant (14:33):
I think it is much more of the revolutionary shift, much more. That said, web devs have it hard. The world of web dev has always moved pretty fast and there's always been a ton of new stuff to learn and AI is no exception to that. This is definitely a new thing to learn. It is another thing to go on the list and quite a big thing, but it is definitely more revolutionary.

What we've actually done with AI is we've shifted the concept of the capabilities that we've got, as in we've completely moved the goalposts as to what we can achieve, what products and features we can offer our users. That's just opened up an entirely new world.

Alex Booker (15:13):
When I described TypeScript or Svelte or something, that's just another way to do the same thing, right? There might be developer productivity benefits, but the point you're making is that this enables a whole new category of features and applications.

Tom Chant (15:27):
Exactly right. Most people are learning TypeScript not because they specifically had a use for it themselves, but because it makes them more employable. I think with AI it's the complete opposite. You're learning AI because it gives you just so much more power.

Alex Booker (15:41):
A big misconception I think with AI engineering is that it's somehow akin to being an AI researcher or a machine learning type engineer. These are very mathematical domains and they're very much specializations that would be difficult to thrive at while also being a front-end developer. Basically those really smart people who are very well-trained and specialized, they work within research labs like OpenAI for example, enabled with a big team and a big budget and plenty of research to produce these foundation models. These are what we interact with as front-end developers.

They, OpenAI in this case, but other companies as well, essentially chuck that foundation model over the API wall, allowing us to interface with said foundation model to build features into our applications. With it being so fundamental, I think it is quite important to define within our conversation. So maybe, Tom, you can tell us a bit more about what foundation models are and what they look like.

Tom Chant (16:36):
A foundation model is a large scale machine learning model and basically it's been trained on a massive dataset. What that means is that these machine learning engineers that you've just talked about in their laboratories have taken a lot of time, a lot of money, and a lot of processing power, and they've put together this model which can be adapted to a wide range of tasks.

What do you need to do that? Well, you need something like 10 terabytes of data. So we're talking about vast amounts of data here. You also need the processing power, which is expensive in itself. You're running this on loads and loads and loads of GPUs. We're talking about spending several million dollars over the course of a couple of days to process all of this data and to get it down into these parameters.

Now, if this is sounding complicated, it is really, really complicated and as web developers, we only really need to have the vaguest high level idea of what's going on here. But what we end up with is all of these parameters together and a program which has the ability to interpret these parameters and put information to them and take information from them. And that is basically what is a foundation model.

The foundation model is the model that you can build apps on top of. And the most obvious example of that is the ChatGPT interface that I'm sure pretty much everybody has seen. And that is an example of an application that's built on a foundation model. So ChatGPT is not a foundation model, but the GPT models are foundation models.
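
To make that concrete, here is roughly what building on a GPT foundation model looks like from JavaScript, following the shape of OpenAI's chat completions API. The model name and system message are only examples; check OpenAI's current documentation for the available models.

```javascript
// Build the JSON body for a chat completions request.
// The model name here is just an example.
function buildChatRequest(userMessage, model = "gpt-3.5-turbo") {
  return {
    model,
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: userMessage },
    ],
  };
}

// Send the request and pull out the model's reply text.
async function askModel(apiKey, userMessage) {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildChatRequest(userMessage)),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```

That small request-and-response loop is essentially the interface between an application like ChatGPT and the foundation model underneath it.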

Alex Booker (18:12):
I think you make a very fair point that we as developers don't really have to understand the intricacies of how it works, just like we don't have to understand how a compiler works or how SQL server works, we just give it input via the API in that case. But since this is quite a revolutionary technology, I am curious if we have some idea about how these foundation models are created. I wonder if it's a pattern recognition type thing or if more likely there's an advanced concept here that we could at least know by name?

Tom Chant (18:42):
One thing that I find really, really intriguing here is that the scientists at OpenAI and all of the other big AI companies don't actually know how and why this works. They themselves do not fully understand it. And as soon as we get to grips with that fact, that cuts us some slack that we're not going to understand it either.

Now, what we do know is that AI models work on neural networks. A neural network is basically a computational model which is inspired by the human brain. So we've got these nodes and they are networked together in a way which allows them to learn from data. It enables them to do tasks like pattern recognition, decision-making and making predictions. That is very much like the human brain, but obviously in a much, much more basic way.

What the machine learning engineers need to do now is figure out exactly what that process is and exactly what's happening. Now, that's a difficult task because these models have evolved over a long, long time. They've had things added onto them, things taken off them, they've been tweaked. So what they're left with at the moment is something which is pretty complicated and convoluted and which they now need to untangle so they can fully understand it. And of course, by fully understanding it, they'll be able to make it better we hope.

Alex Booker (19:59):
I hope that about my own code sometimes as well.

Tom Chant (20:01):
Yes, exactly.

Alex Booker (20:04):
So it's reassuring to hear that.

A lot of the time when we think about foundation models, I'll just name a couple like GPT-3.5 and GPT-4, which are what ChatGPT uses under the hood. And they're text-based, right, and the interface we use is that of a chatbot essentially?

Tom Chant (20:18):
That's true, but we do have to talk a little bit about multi-modality here.

Alex Booker (20:23):
What does that mean?

Tom Chant (20:23):
These foundation models have potentially more than one capability. You might be able to divide them up and say, "Okay, this one is a text generation model. This one is a text-to-speech model. This one is an image generation model." But what you actually see now if you use ChatGPT, is that the interface itself has integrated that altogether. So you can ask it a question to get a text answer, but you can also ask it to generate an image.

The famous image generation model from OpenAI is DALL·E, followed by DALL·E 2, DALL·E 3. Well, DALL·E actually was an offshoot of a GPT model, I believe GPT-3. So what you can see is actually these foundation models might have specific tasks, but they also have multi-modality, that is to say multiple capabilities.

Alex Booker (21:10):
Yeah, and it blows my mind. I know the text generation stuff is insane and I can't even begin to reason how it works, but somehow applying that to things like images just seems a step further. It's unreal. Apart from giving an input and getting an output, there are technologies you can install on your own computer, like ControlNet where you can even paint.

You can get a canvas and you have an image on that canvas. You can paint a blob essentially and then tell the AI to generate something where that blob is. And that's great for an interior designer, for example. They want a chair in that corner, they highlight where the chair's going to go and then they give the prompt. The fine-grained control you get there is incredibly powerful, I think.

Tom Chant (21:49):
Absolutely, yeah. And that does raise a question as to whether these are tools which are going to replace humans or tools which are going to help humans. And I think you've just described a really good example of where an interior designer can just make their workflow that much quicker by using AI to stick a chair in the corner. Amazingly impressive what you can do with images and also going the other way round, passing in an image and actually getting back text. Its ability to quickly describe an image in detail is amazing.

Alex Booker (22:20):
I mentioned GPT and there are a couple of versions like 3.5, 4.5, and then each of those have Turbo variants like 4.5 Turbo. What does that mean?

Tom Chant (22:31):
As they release new models, kind of like putting versions on packages, they're going for the big upgrades like from 3 to 4, but in the middle you get a .5 and a Turbo. At the moment we're up to GPT-4 Turbo. But as soon as you Google GPT-4.5, of course you get the rumors and the speculation that it'll be coming out soon or possibly straight to GPT-5. We just don't know, but we-

Alex Booker (22:59):
Oh, 4.5 isn't out yet?

Alex Booker (23:00):
It's 4 Turbo. And even 4 Turbo is harder to come by, I feel like.

Tom Chant (23:05):
Yes, you don't automatically get access to all of the OpenAI models. It does depend on if you're a pro user and sometimes you'll have to join the wait list as well.

It's also interesting that I think oftentimes you don't see too much of a difference between using GPT-4 and GPT-4 Turbo for example, but there are things you might notice depending on your use case, which is the amount of data that you can give them. GPT-4 Turbo can just handle a ton more data.

I think the biggest change is that it allows you to upload literally a novel's length of text, something like 100,000 words. That's one of the big differences you see. Whereas if you ask it to, I don't know, give you ideas of what to buy your granny for Christmas, you're probably not going to see a huge difference between 3.5 Turbo and 4 Turbo.

Alex Booker (23:55):
Ah, that's interesting because you would associate a version increase like 3.5 to 4 with the capabilities of the model, the quality of the output maybe. But just like any software, it gets new features and one feature it got was the ability to accept a much bigger input, which is awesome, by the way.

I was in Germany recently trying to go to a swimming pool and the whole website was in German and I know I can use Google Translate in Chrome, but I thought why not just paste the whole page? I pressed Command-A on Mac and pasted it into ChatGPT, using GPT-4 by the way. I didn't ask it to translate the page. I said, "Hey, tell me in English what the opening times are." And even though it had a bunch of HTML cruft in it, like text from the footer and some HTML entities that didn't really belong in there, it just gave me the answer in English and that to me was pretty sick.

And the funny thing is I actually tried pasting that into 3.5 and it wouldn't work and I think it was because the input was too long. So when I switched to 4 it worked and I got the answer I was looking for. I didn't actually realize what happened there, but now I do.

Tom Chant (24:55):
Yeah, that is likely the context length and that has been one of the big improvements. And of course one of the most important things that you can do with AI is crunch large amounts of data. So the more information you can pass it in one go, the better. Previous to that, you'd have to break everything up, pass it in, iterate over it, pass it in bit by bit, and then try and build your answer from there, but that's obviously not as good.
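
The "break everything up and pass it in bit by bit" workflow Tom mentions can be sketched with a simple splitter. Real applications count tokens rather than words, but a word-based version shows the idea:

```javascript
// Split a long text into chunks of at most `maxWords` words each,
// so every chunk fits within a model's context limit.
function chunkText(text, maxWords) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}
```

You would then send each chunk to the model in turn and stitch the partial answers together, which is exactly the awkwardness that larger context windows remove.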

Alex Booker (25:19):
True. True. OpenAI is a research lab. They produce these models which are GPT-3.5 and 4 and Turbo versions. I mentioned we mostly think about text and then we started talking about images, but there's other things, right, like speech and video? Are they specialized foundation models and what is the state of them? What can we build with them, do you think?

Tom Chant (25:39):
They're pretty far advanced for sure. Video I'm not so sure about, but the speech is very, very capable and actually very, very easy to work with.

Alex Booker (25:49):
What can you do with OpenAI's speech APIs?

Tom Chant (25:51):
So for example, you can just take some text, you pass it into the API and you get back, well, you've got a couple of choices. You can stream it back as audio or you can get your MP3 file from it.

Alex Booker (26:03):
So it's like text-to-speech, but I'm guessing instead of Microsoft Sam, it sounds like a human natural speech. Is that the idea?

Tom Chant (26:12):
That's exactly it. You've actually got I think five preset choices of voice that you can choose from. You've got a couple of different accents, a couple of different speeds, a couple of different levels of energy. But what's interesting and what is showing the power of AI is that it's actually obviously got an understanding of the text that it's reading because it's putting the sentence stress in the right place. That's not just like if it's a question it will go up at the end, but it's everything to do with the emotion in the language. And that's something that we've never had before. We've had text-to-speech for a long time, but not like that.

And actually, I've tried it quite extensively and I found it flawless. Or the only improvement that I wanted to see is to have a little bit more control over the tone and just the softness or the harshness. The five options you have are great at the moment, but I'm sure that's something that's going to change a lot. They'll probably just upgrade the API at some point, so you've got a lot more control over that. And also, of course, OpenAI isn't the only way to do that. There are other AI options out there. So quite a lot more to explore.
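
For anyone wanting to try the text-to-speech API Tom describes, a request looks roughly like this in JavaScript. The endpoint, model name ("tts-1") and voice ("alloy") follow OpenAI's docs at the time of recording, but they may change, so treat this as a sketch rather than a definitive implementation:

```javascript
// Build the JSON body for a text-to-speech request.
// "tts-1" and "alloy" are example model/voice names from OpenAI's docs.
function buildSpeechRequest(text, voice = "alloy") {
  return { model: "tts-1", input: text, voice };
}

// Send text to the speech endpoint; the response body is
// binary audio (MP3 by default), returned here as an ArrayBuffer.
async function textToSpeech(apiKey, text) {
  const res = await fetch("https://api.openai.com/v1/audio/speech", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildSpeechRequest(text)),
  });
  return res.arrayBuffer();
}
```

In the browser you could feed that buffer into the Web Audio API or save it as an MP3 file.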

Alex Booker (27:18):
And not only that, but you can input a different language, you can get a different language out. I suppose once you crack it with English, it's the same fundamental idea to make it work with other languages and therefore translate as well.

Tom Chant (27:30):
Yeah, every language I've tried with has been as far as I can tell, really, really good.

Alex Booker (27:35):
The other thing is that we think about how we interface with these models and often it's that we give it a text input or something like that and we get an output, whether it's another text response or in this case a speech and an MP3 file, or an image by the way, or a video potentially. Although I can imagine with video, it's a bit more computationally intensive and therefore that introduces challenges around how quickly we can iterate on these things because the feedback loop is so long maybe. That's my guess anyway.

But I think what's really exciting is when we break out of the confines of a chatbot or of generating a single file. There's something called ACT-1, an "action transformer". And basically the idea, I think, is that you can start to use the output from these foundation models to create actions on a webpage, for instance, and therefore interact with a webpage. There are other avenues that I've heard of; for example, Google have a foundation model that controls a robotic arm.

I guess what I'm getting at is that the part where it goes beyond a simple API request and response is maybe still emerging, but it's emerging quickly, and there are going to be all kinds of really interesting and creative ways for these foundation models to affect the applications we use and the applications we as developers build.

I'm cognizant of the fact that in this conversation and indeed many conversations, when we talk about this technology, we talk about OpenAI. Obviously they're the company behind ChatGPT, which was the most popular. Even muggles, even non-technical people, they know about ChatGPT. And at the same time, OpenAI have this really wonderful suite of APIs. They have fantastic docs, but they're not the only company doing this kind of stuff, and I think that's important to acknowledge as well.

Tom Chant (29:21):
Absolutely. OpenAI is the big dog in the game, but we've also got Anthropic's Claude, there's Bard of course. And then that brings us on to open source and Llama and Hugging Face.

Alex Booker (29:37):
Are they as good as OpenAI's GPT models, because they seem like budget versions instinctively in a biased way, I know?

Tom Chant (29:44):
Really good question. I mean, I think what we need to focus in on here is Hugging Face. Hugging Face is like GitHub for AI. It's basically a repo of loads and loads of AI models which you can use, each with its own docs. Hugging Face covers a myriad of possibilities. Now, are they good? Are they bad? Are they all home-brew things that people have just cooked up in their bedrooms? Well, there's a big, big mixture on there.

Alex Booker (30:11):
A fair point because on GitHub you have open source technologies like Kafka and then you also have a to-do list app that I made as a hobby project, so I get that.

Tom Chant (30:20):
Absolutely. Yeah, absolutely. I'm not entirely sure what you need to do to get on Hugging Face and what the minimum requirements are, but what you will find on Hugging Face is that things are open source. And that's really important, because OpenAI has got the word open in the name, but it's not actually open in the open source sense of the word at all. Hugging Face has got all of these open source models.

Now, recently, I was setting some challenges to some students and I wanted to check if they could do them with Hugging Face as well as with OpenAI. And the reason for that is because Hugging Face, or at least the Hugging Face Inference API, has got a very generous free tier. You're actually allowed some free API calls every hour, so even if you do run out of those free calls, you don't have to wait very long until your next ones come in. And that's really, really great for students.
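To make that concrete, here's a rough sketch of what a call to the Hugging Face Inference API looks like from Python. The endpoint shape and the `inputs` payload follow Hugging Face's documented pattern, but the model id and token below are placeholders, so treat this as an illustration rather than a drop-in snippet.

```python
import json
import os
import urllib.request

# Any hosted model id can go here; gpt2 is just a small, well-known example.
API_URL = "https://api-inference.huggingface.co/models/gpt2"

def build_request(prompt, token):
    """Assemble (but don't send) the HTTP request for the Inference API."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")

req = build_request("Once upon a time", token=os.environ.get("HF_TOKEN", "hf_dummy"))
print(req.full_url)  # prints the model endpoint URL

# Only actually send it if you have a real token:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The live call is left commented out so you can see the request shape without needing an account; with a real token, the hourly free-tier quota Tom mentions applies per account.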

But what I found was, when I was comparing solving challenges with Hugging Face models and solving them with OpenAI, that OpenAI was much easier to work with and much easier to get good results with. Now, the area where they were actually the most similar, interestingly enough, was image generation. Hugging Face models produce great images, just like OpenAI's.

Where they were at their worst was actually in the more straightforward text generation. OpenAI is excellent at text generation. The Hugging Face models I worked with, you needed to push them a bit harder. You needed to think a lot more about your prompt and what you're asking them to do to get the most from them.

So there is definitely more work involved and it is more like you're dealing with a product which has not necessarily quite reached maturity yet, but a lot of potential there. You certainly don't always need OpenAI and you might well be able to find something cheaper.

And I think going forward, the cost of AI, when you're scaling apps is going to be quite a big issue to deal with. So looking at open source, much cheaper models, looking at self-hosted models, these are all things which are going to be really, really important for the AI engineer going forward.

Alex Booker (32:20):
Well, let's be frank, you mentioned before that to build these foundation models, it takes a huge amount of resources, like genuinely millions of dollars in computing power. It's hard to fathom. So of course OpenAI wants to charge you a little bit of money to use their API and get access to this. That could get expensive if you're building an application at scale.

And there are other considerations, I guess. It was only a week ago at the time of this recording that OpenAI went down for quite a lot of people, maybe including the API. I don't want to make a false claim there.

Tom Chant (32:50):
No, there's actually a lot of downtime. I mean, pretty much like any other service, but it does beg the question: what is my AI-based app going to run on when OpenAI goes down? I think I saw a statistic the other day that there's been some kind of outage at OpenAI on about 23% of the last 100 days, or something like that.

Alex Booker (33:09):
Yeah, that's unacceptable. It's true, 100% uptime is not a thing, but 99.999% uptime is a thing. A five-nines SLA is available from certain infrastructure providers and certain developer tool APIs. You don't want AWS to go offline. So that's an interesting thing to reflect on, and something you can take more control of with an open source model.

Tom Chant (33:31):
Yeah. You do wonder if they'll take the route of going all out into business and doing all of that themselves, or whether they'll be more of a provider that passes that on to someone else who does it for them. I guess we'll see how it goes.

But I think the open source models do give you the option to have just a backup in place so that if OpenAI goes down, maybe you've got something which is not quite as good, but hey, it's going to work and it's going to keep you covered for a couple of hours.
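That backup idea can be sketched as a simple fallback wrapper. The two "clients" below are hypothetical stand-ins, not real SDK calls; the point is the try/except shape, where an outage at the primary provider silently routes requests to the weaker but available model.

```python
# A minimal sketch of the fallback Tom describes: try the primary provider
# first, and fall back to a second model if the call fails.

def primary_completion(prompt):
    # Imagine this wraps the OpenAI API; here it simulates an outage.
    raise ConnectionError("primary provider is down")

def backup_completion(prompt):
    # Imagine this wraps a self-hosted open source model.
    return f"[backup model] echo: {prompt}"

def complete_with_fallback(prompt):
    try:
        return primary_completion(prompt)
    except (ConnectionError, TimeoutError):
        return backup_completion(prompt)

print(complete_with_fallback("What are your opening hours?"))
# prints: [backup model] echo: What are your opening hours?
```

In a real app you'd likely also log the failure and add a timeout, but the routing logic stays this simple.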

Alex Booker (33:58):
And maybe, I don't know, I've not played with it to the same extent as you, but there's this idea of it not being so good. The way you explained it is that you have to be more considered with your input. I've put some really lazy stuff into ChatGPT and got really comprehensive answers. But then again, maybe ChatGPT's benefit in having so many consumers is that it gives them the data points to figure out, with this input, what does the person likely want? Maybe that's a thing they have that other companies don't. But it's not to say that the other models can't produce the same quality output. It's just that maybe you need to be a bit more pointed with your input.

And that isn't necessarily such a big downside because it might force you to think more carefully about what you want and therefore, get a higher quality response in general instead of the good enough for most people response you're going to get with a vague input to ChatGPT.

Tom Chant (34:48):
Absolutely, yeah. I think it's just about honing your prompt engineering skills, understanding the API you're working with, and getting the most from it. Maybe earlier I sounded a little bit negative about the Hugging Face models, and I don't mean to be. They're pretty amazing in their own right and can do an awful lot. There's a lot of potential there.

And then there's also the potential, when it's an open source model, to take it offline. You can actually run a model on your MacBook. I think you need something like a gig and a half of free space to run a basic Llama model on your laptop. So you do wonder if, in a way, we're going to move away from the cloud and see people self-hosting their models, because it would be much cheaper. If you can get the model to do what you need it to do, well, then it's all yours.

Alex Booker (35:35):
Yeah, absolutely. It's interesting because it takes an enormous amount of computing power to build these models, but the model itself is just a file. They use, I forget the number you quoted, terabytes and terabytes of data to train the model, but once you have the model, it takes much less space, and it takes much less computing power to produce an output. Still a little bit, though.

And that's why I think with ChatGPT, the way it sends you one word at a time is actually representative of how fast it's going, in a way. And probably, if you were to do it on your local computer, you'd see a similar result, but much slower, because maybe your local hardware is not as powerful.

It would depend on a few parameters, obviously like the model you're using and the query and that kind of thing. But it is interesting that it does take a bit of juice still to get to that point, just not as much, definitely much less than creating the model.

Tom Chant (36:24):
Absolutely, yeah. And the 10 terabytes I quoted earlier was actually for quite a basic model. But I think if you're running the model yourself, then obviously you're going to have computational overheads, and I've never actually tried hosting a model on a MacBook. Maybe I should.

Alex Booker (36:38):
I really want to try that because Llama is the obvious path to go, I think when doing something like that. So that'd be a really fun weekend project.

Tom Chant (36:46):
And what we want to achieve is running it in a scrim. That would be perfect.

Alex Booker (36:49):
Hell yes, absolutely. By the way, there's one thing you said in your course which I really loved: if we give something like ChatGPT, which as we know is the interface to the foundation model, the same input twice, we'll get two different outputs. It's not deterministic, just like a human being.
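One concrete knob behind that non-determinism is sampling temperature. Here's a hand-rolled sketch of how temperature reshapes the probability distribution over candidate next tokens before one is randomly sampled; the three scores are made up, and real models do this over tens of thousands of tokens.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Lower temperature sharpens the distribution (more deterministic);
    # higher temperature flattens it (more varied outputs).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.1)
hot = softmax_with_temperature(logits, 2.0)
print(cold[0] > hot[0])  # prints True: the top token dominates far more when cold
```

Because the model then samples from this distribution rather than always taking the top token, the same prompt can genuinely produce different outputs each time.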

For example, if I write a draft of a post and then I close the tab by mistake, I have to start from scratch. The second version is going to look totally different even though I'm the same person with the same brain creating it. And I just thought that was a really powerful example because it talks to what you said before about the neural network and how it mimics a brain a little bit. I think that's fascinating.

But I do realize that everybody gets the same foundation model, right? It's like when that foundation model gets chucked over the API wall, we're all interacting with the same model that was trained on the same data. And that model isn't really aware of any of your private data or maybe if you're a small company or you have a niche type of product, it might not be aware of that either, just because it didn't get scraped and trained on originally. I've heard of this idea called fine-tuning, which by name would suggest that you fine-tune the model, I guess based on your own data.

Tom Chant (38:01):
Fine-tuning isn't quite used for that purpose, actually. We need to think about this as the AI doing two things: it's giving us some content, and it's also giving us that content in a certain format or style. What fine-tuning does is allow us to take the model and tune it to produce output in a more specific style or a more specific format.

Let's say that you had your own special way of writing blogs and you've got your own little quips and your little jokes that you always use.

Alex Booker (38:32):
We use loads of emojis at Scrimba, for example.

Tom Chant (38:34):
Right, yeah, something like that. If you fine-tuned it on a bunch of those kinds of blog posts, what you could then do with that fine-tuned model is drop in your notes and have it produce blog posts in that style, with the emojis you want in the right places. So fine-tuning is not so much about the content, but about the style and the format.

You can also use it if you need output in a particularly complex data structure, for example. You could train it on those data structures and have it produce output in them. But I think what you were referring to before was RAG.
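For reference, OpenAI's chat fine-tuning expects training examples as JSON Lines, one conversation per line. The content below, emoji-heavy Scrimba blog style included, is invented purely to illustrate the shape of the file.

```python
import json

# One training example per line, in the chat-style JSONL format that
# OpenAI's fine-tuning endpoint accepts. Content here is made up.
examples = [
    {"messages": [
        {"role": "system", "content": "You write Scrimba-style blog posts."},
        {"role": "user", "content": "Notes: new course launched today."},
        {"role": "assistant",
         "content": "Big news, friends! 🎉 We just launched a brand-new course 🚀"},
    ]},
]

jsonl = "\n".join(json.dumps(example) for example in examples)
print(len(jsonl.splitlines()))  # prints 1: one conversation per line
```

You'd collect dozens or hundreds of such examples, upload the file, and the fine-tuned model would then reproduce the style (not new facts) it saw in the assistant turns.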

Alex Booker (39:12):
Okay, what's RAG?

Tom Chant (39:13):
RAG is retrieval-augmented generation. Going back to your example: you've got this company, and this company has got its specifics, for example its opening times and delivery policies. Now, in its training data, the large language model may or may not have been exposed to that company's website, but the fact is it doesn't know the specifics about that company. It knows the specifics about loads of companies.

So if you ask it what time X, Y, Z company closes, the chances are it can at best give you an answer that's a year out of date, and at worst give you an answer it has hallucinated, i.e., an answer it has just imagined as being plausible: oh, we're open from 9:00 till 5:00.

Alex Booker (39:54):
It does say I don't know sometimes, right?

Tom Chant (39:55):
As it progresses, yeah. As it progresses, it's become better and better at saying I don't know, I'm sorry, I don't have the answer to that question, please check the website, or something along those lines. And also nowadays, ChatGPT of course has got the ability to plug into the internet and actually Google it for you effectively.

But that said, what you can do with RAG is you can take your company's knowledge, you can take its data, you can take its support tickets, you can take basically any data you want, and you can give that to the model as specific knowledge. So then the model is able to answer questions specifically on your company's data.

That's really, really important, because if you want customers properly interacting with your company, you want them to be getting the correct information, not some weird hallucination or something ChatGPT picked up many years ago. RAG enables you to do that, so you can actually have a chatbot which answers questions specific to your company. It's a really, really powerful thing to do, and it's basically a set of techniques you can use to build an application which gives the AI this specific knowledge base.
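The retrieval step at the heart of RAG can be sketched in a few lines. Real systems use learned embeddings and a vector database, but toy bag-of-words vectors and cosine similarity are enough to show the idea: find the most relevant chunk of company knowledge, then hand it to the model as context. The knowledge base and questions here are invented.

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words count. Real RAG uses a learned
    # embedding model instead.
    return Counter(text.lower().replace("?", " ").replace(".", " ").split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

knowledge_base = [
    "Our store is open from 10am to 6pm Monday to Saturday.",
    "We deliver within 3 working days anywhere in the UK.",
    "Support tickets are answered within 24 hours.",
]

def retrieve(question):
    q = embed(question)
    return max(knowledge_base, key=lambda chunk: cosine(q, embed(chunk)))

question = "What time is the store open?"
context = retrieve(question)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(context)  # prints the opening-hours chunk
```

The assembled `prompt` is what you'd then send to the model, so its answer is grounded in your data rather than in whatever it memorized during training.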

Alex Booker (41:03):
Here's a question for you. We talk about companies, maybe they have a knowledge base or they have a history of support tickets they can use to build an intelligent chat interface for their customers. But I was thinking, well, what are examples that aren't companies? And I thought to myself, well, maybe I could input all of my Google Docs, all of my Notion documents. Or maybe to bring that one step closer to a feature, I could have a journal and I could upload all my journal entries, and then the model can then answer questions about me for self-reflection purposes or maybe just memory purposes, like what happened on this date.

And then I started wondering, is that significant enough to warrant RAG or could I just make it one really huge input into ChatGPT? We spoke about GPT-4 and the fact it can take a huge input. I can just copy and paste my entire diary in there.

Don't want to confuse the question too much, but another thing we know about is a system prompt where you can go into ChatGPT's interface and by extension, this will be a parameter in the API and you can give it a prompt to apply to every subsequent prompt. You can tell it to talk in a tone. So as you can see, I'm clearly a bit confused. I was hoping you could shed some light on the subject.

Tom Chant (42:13):
The short answer is it depends, for example, on how long your journal is, but the chances are now, yes, you could. Your journal is probably less than 100,000 words, or whatever the biggest context window you've got now.

Alex Booker (42:24):
Probably, I'm very introspective.

Tom Chant (42:27):
But you have got a big problem there, which is that every piece of data you upload to OpenAI is costing you money. It's all broken down into tokens, and a token is approximately 75% of a word. So every time you upload your whole diary because you want to ask just one thing, like what was I doing on this day, or what did I give him for his birthday last year, or, more to the point, what did he give me? Every time you do that, you're paying for a whole ton of tokens.

What RAG allows you to do is take that knowledge away from the model, so it's just in a database, and you can query the database for free. Where AI comes in is in the way it's worked with that data to get it into a format you can put into this database and use in this particular way. But also, you keep the conversational powers of the AI, so the answer that comes back can actually speak to you like a human being and converse with you and understand your logic and-
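Tom's point about cost is easy to put into numbers. Using his rule of thumb that a token is about three-quarters of a word, here's the rough arithmetic for re-sending a whole diary with every question. The price per 1,000 tokens below is a made-up example figure, not a real OpenAI quote.

```python
# Rough cost arithmetic for stuffing a whole diary into every prompt.
WORDS_PER_TOKEN = 0.75               # Tom's rule of thumb: 1 token ≈ 0.75 words
EXAMPLE_PRICE_PER_1K_TOKENS = 0.01   # dollars; illustrative only, not a real price

def estimate_tokens(word_count):
    return word_count / WORDS_PER_TOKEN

def estimate_cost(word_count, queries):
    tokens = estimate_tokens(word_count)
    return tokens / 1000 * EXAMPLE_PRICE_PER_1K_TOKENS * queries

diary_words = 100_000
print(round(estimate_cost(diary_words, queries=1), 2))    # prints 1.33
print(round(estimate_cost(diary_words, queries=100), 2))  # prints 133.33
```

With RAG, only the few relevant chunks go into each prompt, so the per-question input shrinks from the whole diary to a paragraph or two.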

Alex Booker (43:24):
Okay. Okay.

Tom Chant (43:25):
... all of the good stuff. I suppose there's also another option as to how much of your personal life you want to upload to OpenAI. I mean, they have got their terms and conditions and it should be safe, but AI in general-

Alex Booker (43:37):
I'm very introspective.

Tom Chant (43:38):
Yeah, you could be giving away all sorts. So who knows? But RAG allows you to take that much more into your control.

Alex Booker (43:46):
That's very interesting, fascinating in fact. The only thing I regret is that as I learn more about it, I realize that this is going to be an interview all in itself. Luckily, next week I'm joined by Guil Hernandez. He's created a course on embeddings, which are a very important part of RAG. So when I speak with him next week, we're going to go in depth on all of this.

So I can learn alongside you listening because clearly I don't know everything about this subject, far from it, and we can learn a bit more about how to bring these ideas closer to the applications we're building, the kind of applications that will hopefully really stand out in a developer portfolio if you're looking for a new or your first developer opportunity as well.

Tom, I've learned so much today. Genuinely, I've been smiling the whole interview. It's been super fun to learn about some of these things. There are so many nuanced terms, like fine-tuning versus RAG, or GPT-3.5 versus GPT-4, or how Turbo plays into that. This conversation has been full of little insights, and we've spoken about some big ideas as well. So I really appreciate you for that.

Something that's really important to me on the Scrimba podcast is that we always bring this back to actionable ideas. I think we've spoken about a lot of specific actionable ideas here today, but this is the kind of subject I think you need to sit down and study a little bit and practice, and that's what Scrimba's AI Engineer Path is all about. Maybe you can tell us a little bit more about it.

Tom Chant (45:02):
Yeah, absolutely. So at the moment, and I say at the moment, because this is such a changing field and we know we're going to be updating this pathway all the time, but at the moment, the course is broken down into four different modules really, arguably five.

So we start off looking at the OpenAI API, and we're taking things there from the very basics. Also, in the first module, we're looking a little bit at some of the open source models from Hugging Face and a little bit about the state of AI and some of the dangers and security implications of it. Moving on, in the next module, this is Guil's module, and this is where we talk about embeddings and we get more into RAG. That's a really, really interesting module because then you're really, really seeing some of the most important uses of AI in the real world.

And that carries on with Bob's module, which is all about agents. Agents are a whole other topic. I think it's going to blow your mind, but it's probably where AI is heading in the future. It's about giving AI actual power: the ability to make choices, call other functions, and bring in outside resources and tools to make decisions. Really, really interesting.

Tom Chant (46:15):
And then we finish off with a module on LangChain. LangChain is an AI-first framework that gives you a set of tools for doing things like RAG much more easily and spinning up apps much more quickly. It's a really, really big name in AI at the moment.

Alex Booker (46:34):
Tom, thank you so much. It's been an absolute pleasure.

Tom Chant (46:36):
Thank you for having me. It's always great to come on.

Jan Arsenovic (46:39):
But before we go, let's take a look at your social media posts about our show. 'Tis the season for Spotify Wrapped. Johanna Bayer tweeted, "The Scrimba podcast, I enjoyed every one of the 2,937 minutes I spent with you in 2023. Thank you." No, thank you for listening.

Aishwarya tweeted, "Spotify also agrees that the Scrimba podcast is my favorite. I have listened to 1,007 minutes."

And over on LinkedIn, Jose Carlos Rivera said, "In my Spotify Wrapped, the Scrimba podcast was the most listened to. I'm very grateful for this podcast as it has been very helpful for me to improve my productivity. I used it as a strategy. I would listen to it at the gym during my PM workout. Before, I would only go home to sleep, but when I listened to it at the gym, it motivated me to code after my workout. I think this is a strategy that can work for many people. If you're struggling to motivate yourself to code, try listening to this podcast at the gym." That sounds like a great strategy. I hope more people try this.

If you enjoy our show and you want to make sure we make more of it, the best way to support us is to post about it on social media. Word of mouth is really valuable. And as long as your posts contain the words Scrimba and podcast, we will find them, and you might get a shout-out on the show itself.

If you're feeling super supportive, you can also rate and review us in your podcast app of choice, think Apple Podcast reviews or ratings on Spotify, or basically whatever app you use. If it lets you rate and review what you're listening to, please do so, it really, really helps.

That's it for this episode. The show is hosted by Alex Booker. I've been Jan, the producer. Keep coding and we will see you in the next one. And I mean, literally see you because currently we're also posting these on YouTube. Bye.
