Rowan's Notes

Logan Kilpatrick on Google’s new Gemini Models, AI Agents, AGI, and more

The Rundown AI

Google just released two upgraded Gemini 1.5 models, 1.5-Pro-002 and 1.5-Flash-002, achieving new state-of-the-art performance across math benchmarks.

Rowan Cheung (https://x.com/rowancheung) sat down with Logan Kilpatrick (https://x.com/OfficialLoganK), product lead for Google AI Studio, to discuss what makes these models so unique, AI agents, AGI, and more.

Listen to the full interview on YouTube: https://www.youtube.com/watch?v=WQvMdmk8IkM

Try the new models on Google AI Studio: https://shortclick.link/djojib


__

Join our daily AI newsletter: https://www.therundown.ai/subscribe
Learn AI hands-on with our AI University: https://rundown.ai/ai-university/

This is such an exciting moment. Today we're rolling out two new production-ready Gemini models. I think what's really cool with the age of AI is seeing anyone, even people who are not technical, being able to build their own AI apps. I really can go and tackle more difficult problems now because I have AI as a copilot. For the person who's never coded before, they're now able to tackle any problem with code because they have this copilot in their hands. The most successful individual contributors, creators, managers, and designers are going to be the people building with AI tools. There are so many cool products to be built; as you come up with use cases for what's being unlocked by the latest iteration of Gemini models, even for me in the last couple of weeks, you realize there are a lot of really interesting companies and products that could be built. We're so early. In reality, if you're keeping up with things, reading newsletters, keeping up with YouTube videos, watching podcasts like this, you're probably already in the 1% of early adopters.

All right, thanks so much for joining me today, Logan. Let's get right into it. There's a bunch of new announcements from Google today. Give us a rundown of everything announced and why it actually matters.

Yeah, this is such an exciting moment. We've been getting developer feedback for the last five months about all the Gemini models and how to use them, and today we're rolling out two new production-ready Gemini models. We're also improving a bunch of the things that have been sources of feedback from developers: rate limits, pricing for 1.5 Pro, and some of the filter settings enabled by default. All of these are focused on enabling developers to go in and build more of the stuff they're excited about. These are the follow-up, the culmination, of all the experimental models we've released over the last few months, and I keep getting pings and DMs from people telling me how much they've been waiting for these models to actually roll out so they can really start building with them. So that is the rundown.

What I want to say is what's really cool, and I don't know if the pace has actually changed since you joined Google, but at least on X you've done an amazing job shedding light on all the innovation happening and all these announcements coming out every single day. My follow-up question regarding the announcement is: what exactly makes the new models so unique?

Yeah, it's a good question. It's maybe less about what makes them unique and more about the general trajectory we're on. One of the best parts of my job is getting to work with the folks at Google DeepMind, and I have such conviction in that team and the direction they're heading in.
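For developers who want to kick the tires on the release discussed here, below is a minimal sketch of calling one of the new production models through the Gemini API. It assumes the google-generativeai Python SDK and uses the model id string "gemini-1.5-pro-002"; treat the exact id, the prompt, and the placeholder API key as illustrative rather than official guidance.

```python
# Minimal sketch: one call to a new production Gemini model via the Python SDK.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # free-tier key available from Google AI Studio

model = genai.GenerativeModel("gemini-1.5-pro-002")  # assumed id for the new Pro release
response = model.generate_content("Summarize today's Gemini 1.5 release in two sentences.")
print(response.text)
```

The same call should work with "gemini-1.5-flash-002" swapped in for the faster, cheaper tier mentioned in the announcement.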
So I think it's more so, at least the way I've been thinking about this from a developer perspective, the sort of linear, and in some cases exponential, progress across different benchmarks that we've seen with this iteration of Gemini models, even since I/O, where I saw you last on the Google campus. It's been incredibly exciting. And again, it's a bunch of the things developers have given us feedback on: the model not punting on as many questions, for example. They want the model to just respond to the questions they asked and not try to get out of some of them. And there have been a whole bunch of improvements on things like math and the ability of the models to code, which is obviously super important for people who care about developer stuff. So it's been a lot of listening and iterating on the feedback we've been getting from the ecosystem.

You briefly mentioned math. Can we talk more about that? What are the new improvements to math, and possibly reasoning, for these models?

Yeah, this is one of the trickiest things. We were even talking internally about how to expose some of the nuance of what makes these models better at math to a general audience, and I think it's somewhat of an open problem. At least the examples I've seen are math that's far beyond my capability, as someone who's taken three calculus classes. In general, the model is just better at doing math problems, which generalizes to a bunch of domains where you have to think more deeply about the problem space you're trying to solve. But the model being better at raw math isn't, by itself, a super practical use case, because you probably want a system like we have with code interpreter that can actually write the code and give you a deterministic output, rather than having the model take a raw shot at solving a math problem you give it. But to your point, it's really the way in which math problems are solved, that step-by-step iterative process, which is the exciting takeaway from the math improvements.

Yeah, I think the key here with all the math and reasoning improvements is the progress. Can you explain to the non-technical audience why this progress actually matters and what it might mean a year from now?

Yeah. A lot of the AI applications built today generally work well when you go do that demo use case, and the time from first building or trying something to having that magic "wow, this is incredible" moment is probably the shortest of most of the recent technology trends we've seen. The challenging part is getting from that demo moment to something you'd reasonably want to put in front of your customers at very large scale; that's actually a really long process. So all of this progress is directly aligned with making it so more people can actually put this stuff into their products.
Building cool demos is awesome, and I love cool demos more than anyone. But the way this technology becomes valuable for developers, for startups, and for end users is by being robust and reliable, and I think that's the general trajectory we're on right now.

Yeah, and on trajectory, could you talk more about the importance of getting all the new versions of Gemini into the hands of developers quickly?

Yeah. To your point about whether or not we're shipping faster, I don't know if I have the perspective, because as long as I've been at Google we've been shipping stuff quickly, so I'm sure there were plenty of people moving super fast before. The developer angle of this is something that immediately stood out to me when I joined Google. When I was chatting with folks on the DeepMind side, they were asking: how do we make these models better for developers? We care so much about this; these are the ultimate end users of the folks leveraging our models. What can we take action on? So with all of these experimental models we've released, testing them out, getting them in the hands of developers, putting them in AI Studio, we've actually seen some very interesting trends about which models people like. For example, some of the experimental models are the most-used models on AI Studio. People are really showing up and intentionally seeking out those models, even though it's maybe not the easiest thing in the world to do, because they've heard about and are actually seeing the improvements. We just launched side-by-side mode in AI Studio, and people are putting the models side by side; you can genuinely see that the model is better at solving a whole host of new problems. Which, at the end of the day, is what developers like to see. That sort of progress, that continual movement in the right direction, is just really easy to get behind. It's also what gets me excited, and what justifies all the long hours, the late nights and early mornings, for a ton of people at Google making all this stuff possible.

Yeah, the pace recently has been incredible to witness. It's been amazing. Can you share an example, or some impressive use cases, of how customers or users are using these experimental Gemini models in the real world?

Yeah, that's a good question. One of the challenges, and this is why it's been so important to get these models out into the hands of more developers through this production-ready version, is that the rate limits on the experimental models are so low that you can't really use them for anything; it's a couple of requests per minute or something like that. So in some cases developers haven't actually been able to assess the full impact of how important these new models are going to be. Very directionally, you can swap the model in on a few requests and see that it's generally better, and specifically a bunch of the vision stuff is what I'm most excited about.
If you look at the core premise for when Gemini was originally created, less than a year ago, which is still crazy to think about (the first model came out in December of last year), the intent was to build a multimodal model from the ground up. It wasn't a text model that we just bolted the ability to understand images onto. And really, that's because an order of magnitude of the important use cases for the world, for developers, for people who want to build with this technology, are multimodal, and it continues to be one of the things our models are most differentiated on: the ability to understand images, do bounding boxes, and take in video. Video is just so hard to grok all the details of, and being able to go into AI Studio (I do this all the time with my own videos), drop an hour-long video in there, and ask a bunch of questions is such a mind-blowing experience to see work. And you can try it for free. I love it.

Yeah. And looking forward, what are some of the interesting unsolved real-world problems that you're most excited for these breakthroughs in math and reasoning, and the new updates to Gemini, to solve?

I still think we're so early in this agentic-workflow world. I've had so many conversations with companies that are trying to solve the agent problem, and with people who are actually trying to build agents themselves, and there are still so many rough edges. But it really does feel like we're probably a couple of cranks of the model away from a lot of those use cases just working out of the box. And directionally, this latest release is going to be better for folks doing those things, especially when you think about what a lot of the failure cases for agents are today: the model trying to look at screens and understand how to move around, click on buttons, and so on. Very directly, the models being great at vision is ultimately the unlock, because that's how you and me and everyone else interact with a lot of the world. I'm super excited to see those vision use cases continue to work, and also all the long-context improvements. Part of the challenge for long context with Gemini is that it goes against a lot of the conventional wisdom for developers, which is "don't put tokens in the context window." We're coming along and saying, hey, put a bunch of tokens in the context window, because you can actually do all these really interesting things. I think we're slowly getting around the adoption curve of people wrapping their heads around what's possible with long context, and just from talking to people who are actually using long context in production, we're still barely scratching the surface of what it's going to unlock. There was also a paper from the DeepMind team a couple of days ago about some of the long-context improvements, so there's a lot of cool stuff we're pushing on there as well.
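As an aside for developers reading along: the "drop an hour-long video in and ask questions" workflow Logan describes in AI Studio is also reachable from the API. Below is a hedged sketch assuming the google-generativeai Python SDK and its Files API; the file name and the "gemini-1.5-flash-002" model id are illustrative stand-ins.

```python
# Sketch: ask questions about a long video using the Files API and long context.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Upload the video; large files are processed asynchronously before they can be used.
video = genai.upload_file(path="my_hour_long_video.mp4")
while video.state.name == "PROCESSING":
    time.sleep(10)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-flash-002")
response = model.generate_content(
    [video, "List the main topics covered in this video, with rough timestamps."]
)
print(response.text)
```

The same pattern applies to PDFs, audio, and collections of documents; the long context window is what lets all of it sit inside a single prompt.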
Yeah, I agree. Not just for developers, but for consumers and businesses too, we're just scratching the surface of what's possible with today's models, and that's before even getting into the future updates you're making to Gemini. Like you said, we're just scratching the surface of what's truly possible, not just with context windows but with LLMs in general. Let's talk specifically about why developers should build with Gemini 1.5. In addition to the new updates, the higher rate limits, the expanded feature access, and obviously the famously large context windows we were talking about, what other capabilities or features does Gemini 1.5 offer that developers should be really excited about?

It is long context, it is multimodality, but also a bunch of feature-specific things. We were the first to ship context caching, which eliminates a lot of the financial burden for developers building in production, especially with longer-context prompts, examples, data, and so on. You pay a flat fee to store your tokens on an hour-by-hour basis, and then all the incremental tokens you pass in on top of that are billed at a significantly reduced rate. That continues to get me excited. I think fine-tuning is another piece of this. One of the big differentiators is that you can come to Gemini, come to AI Studio, fine-tune Gemini 1.5 Flash for free, and then put that model into production and pay the same extremely competitive per-million-token cost that we already have. There's no incremental cost to use a fine-tuned model, which is super differentiated in the ecosystem; no one else is providing that right now. And again, all of this is angled toward giving developers more freedom to build and less burden to build with AI technology. Part of my perspective is that the financial burden of building with AI is one of the great limiters of this technology being accessible and ultimately providing the most value it can to the world. Our strategy to combat this is to have the most generous free tier of any language model in the world, specifically because we think there's all this cool stuff that can be built that hasn't been built yet, mostly because developers have had to show up and put in a credit card or pay some amount of money just to get started. With AI Studio, you can literally just show up, sign in with your Gmail or Google account, get an API key, and start building, which I, as a developer, love to see us doing and pushing toward.

Yeah. So in sum, it's cheaper than ever to build with Gemini, and it's the cheapest platform to build on.

Yeah, and it's beyond just being the cheapest. It's also the trade-offs, with Flash as the specific example: per token, or token per unit of intelligence, or whatever the metric is that takes into account both capabilities and cost, it's the best of any model that exists today.
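Context caching, as described above, looks roughly like the following in code. This is a hedged sketch assuming the Python SDK's caching helpers (google.generativeai.caching); the file, model id, and TTL are illustrative, and minimum cached-token sizes and exact pricing are defined by the official docs rather than by this snippet.

```python
# Sketch: pay once to cache a large reused prompt prefix, then bill only incremental tokens.
import datetime
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")

# Cache a large, frequently reused prefix (reference docs, few-shot examples, transcripts)...
docs = genai.upload_file(path="reference_docs.txt")  # note: caching has a minimum token size
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-002",
    system_instruction="Answer questions using only the attached reference material.",
    contents=[docs],
    ttl=datetime.timedelta(hours=1),  # storage is billed per hour the cache lives
)

# ...then each request pays the full per-token rate only for the new tokens you add on top.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("What does the reference material say about rate limits?")
print(response.text)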
And with the price decrease on 1.5 Pro, the same is now true for that higher-intelligence class of model: on a per-token-intelligence basis, or whatever scientific term we need to come up with for this, I think Pro is just the best value for a developer to build with, all things considered, which I love. And as people ramp up that maturity curve from prototype to production, these things really start to matter. The feedback I hear from developers all the time is that it's still so cost-prohibitive. The models are generally really good at a bunch of use cases; most models are. What people are looking for is the option that ultimately isn't going to create a financial burden for their team, their company, or their startup.

Yeah. I think what's really cool in the age of AI, and with the amazing work you guys are doing, is seeing anyone, even people who are not technical, being able to build their own AI apps. And now with Gemini it's even cheaper, so it's really cool seeing all that's happening. If someone were to start from zero, a complete beginner, a non-technical person, maybe even a student, is there a tool stack, documentation, courses, videos, or tutorials from Google that you would recommend?

Yeah, that's a great question. ai.google.dev is our default landing page, and it links out to the Gemini API documentation. There's also a ton of stuff out there on GitHub. We have a quickstart repo where you can literally run about four commands and basically have a local version of AI Studio and Gemini running on your computer, play around with the models, upload images, and have that sort of experience; it's a couple hundred lines of code. The beautiful thing, to your point about the technology becoming accessible to a wider audience, is that not only are there a bunch of low-code and no-code tools making it easier for people to build, but for people who are actually coding, the models continually getting better at understanding and writing code keeps bringing the barrier down. I feel this way as someone who was formerly a software engineer: I really can go and tackle ten-times-harder problems now, because I have AI as a copilot. And the person who's never coded before is now able to tackle any problem with code, because they have this copilot in their hands. That's also why the models being good at code is critically important: even for some of those future low-code and no-code use cases, the model is probably going to be generating that code on the fly behind the scenes. The code still has to be written somewhere; it's just a question of how abstracted away it is from the user. So code quality is a North Star.

I mean, it's amazing. Okay, kind of a last question for the advice segment of this conversation.
Do you have any advice for the next generation of students or businesses, not only on how to future-proof themselves, but on how to thrive in this new age of AI?

The saying that has been repeated many times and has always resonated with me is this: people are really worried about AI taking a job in the future, or replacing a human, that type of thing. The very practical reality, and this is the trend playing out today too, is that AI is a tool to be put in the hands of humans. The most successful individual contributors, creators, managers, and future designers, all of those roles, are going to be the people building with AI tools and using them to get the most leverage possible. I also think the beautiful thing for students is that you have a little more freedom, in a lot of cases, to use these tools. When you work at a company or work for someone, there are sometimes limitations on what tools you're technically allowed to use in your organization. The cool thing for students is: go and use all the tools, really get an understanding of what's out there, because you're not limited by any of those constraints yet. Then, when you show up for your first job or your internship, you have a really great purview and can be the person who teaches the folks you work with about what the cutting-edge thing is. And to the point of why I think you do the work you do and provide so much value to the ecosystem: it's hard to keep up with everything that's happening, and if you have real experience and a refined perspective from using these tools and building with AI, that's so much value to share with other people. So go and share that value.

That's always a godsend, and I appreciate the kind words. I agree. In the age of AI, we're just so early, and there's a shocking number of people who don't really know what's happening or aren't keeping up. So again, if people are keeping up with things, they're just so much further ahead. I'm addicted to X, or Twitter, myself, and it's so easy to get caught up in that bubble where it feels like everyone knows everything. But in reality, if you're keeping up with things on X, reading newsletters, keeping up with YouTube videos, watching podcasts like this, you're probably already in the 1% of early adopters. We're just so early. So yeah, it's an amazing time, and an amazing time to learn; it's easier than ever, so just follow your curiosity and see where it takes you.

On the learning angle real quick: Google also just released, last week, and this isn't part of the Gemini API, the Audio Overviews experience, which I saw you posting about. Following this thread of learning, people were incredibly excited to be able to just throw a bunch of documents and data into NotebookLM and have a really lifelike podcast conversation generated from it, and I've seen some really cool examples from people.
The one that I put together was about potatoes, which I care zero about and which is not an interesting topic, and it brought the whole conversation to life. I spent eleven minutes, the longest of my life, listening to the origin, history, and biology of potatoes.

Yeah, Audio Overviews was very impressive when I tried it, and that's an amazing point on learning. A lot of people are auditory learners, and with Audio Overviews you can drop anything in and it turns into an engaging conversation you just listen to. For people listening who don't know the example I shared: we don't have a podcast version of our newsletter, but I took the entire newsletter for the day and dropped it in there, and it didn't just read it out. Audio Overviews literally created a two-way conversation from it, which is really cool, versus just an audio recording of my newsletter read aloud.

And the core thing powering that experience is Gemini and long context. What actually makes it interesting is that you stick a bunch of disparate data sources in, ten PDFs, a Word document (I don't know if they actually accept video or not), you dump all of that data in there, and it uses Gemini behind the scenes. I think this is probably one of the most successful long-context use cases we've seen really gain adoption. As humans we're intrinsically used to interfaces that just let us drag around big blobs of data, and we don't think twice about it. But for a system to actually abstract the nuance of the data in there and bring it to life, I'm hopeful, from a developer perspective, that it will show people what's possible when you really make use of long context and all of these new paradigms.

Yeah, again, to that point of being so early: the capabilities of these things are amazing, but it takes time to really explore them all. And to your point about examples, mine being my newsletter and yours being potatoes, what are people already doing with this? Can you share some of the use cases you're seeing that are kind of mind-blowing for Audio Overviews? And can you say where people can access it? Because I think a lot of people listening are going to want to try it.

Yeah, it's notebooklm.google.com. And I have not heard a more boring example than potatoes; it's hard to imagine anything less interesting. I've seen people putting in AI papers, and you mentioned your newsletter, which is genuinely engaging content that's much more interesting than potatoes. But potatoes were a great baseline for me: you can basically make anything interesting with this. And I saw a bunch of people talking about their kids and education, back to the thread from before. So, yeah, the future is bright. Honestly, to the point of this technology impacting people in a positive way:
I think if the only thing it does is make it easier to learn a bunch of stuff and bring not-super-engaging content to life, that alone will be a huge impact for AI, independent of all the other future stuff that's going to come. Part of the challenge is that we're always so focused on the next iteration that it's difficult to actually grok and take advantage of the things that are already here today and all the value that could be created.

I completely agree. Education is one of those spaces where AI is really making an impact right now, and it's just not being explored enough. So I love these little features like Audio Overviews, which are amazing for so many people. Now let's talk agents. With the rapid progress, and obviously with the updates to Gemini, you said we're going to get to these agentic capabilities soon, and it's clearly the next stage for AI; we know it's coming. Can you explain your definition of an agent? The term is a little unclear sometimes, so let's define it first and then talk about it.

Yeah. One of the challenges of today's age is that, unlike terms like artificial intelligence or machine learning where there's a Merriam-Webster dictionary definition, agents are having this zeitgeist, cultural-ecosystem moment where everyone defines them differently. For me, an agentic workflow is really a system that's able to take action on your behalf and actually do something for you. Today, in the world of Gemini, developers have to do a lot of the heavy lifting themselves. You might use Gemini inside an agentic workflow, or use some other framework that helps you build the plumbing for agents, but we don't have any of that in the Gemini API ourselves. So it's really: we put the tool, the model, in the hands of developers, and then they go out and build all that stuff. Part of this is intentional, because there are a thousand and one agent frameworks and a thousand and one companies building agents, and it's actually not super clear, even to me, as someone who spends too many waking moments seeing all the demos and playing around with all the products, what that paradigm is going to be; it still feels really early. So for us, historically, the highest-leverage thing we could do is put a great model in the hands of developers to power a bunch of these use cases, let the ecosystem develop a bit more, and hopefully figure out where the gaps are and where we can add value from a developer perspective in the future, rather than putting something out from our perspective too early that doesn't end up resonating with developers.

Yeah, that makes a lot of sense, and I think we're seeing that now. Like you said, all these frameworks aren't that capable yet, right? But they're starting to get there. You can start to see these things happening, and with all these updated models, it's clearly getting closer.
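For readers wondering what "putting the tool in the hands of the model" can look like in practice, function calling in the Gemini API is the usual building block developers wire agents around. A minimal sketch, assuming the google-generativeai Python SDK; the get_order_status function and the model id are hypothetical stand-ins for whatever action you want to expose.

```python
# Sketch: letting the model call a plain Python function (the simplest "take action" loop).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

def get_order_status(order_id: str) -> str:
    """Hypothetical backend lookup the model is allowed to call."""
    return f"Order {order_id} shipped yesterday and arrives Friday."

# Python functions passed as tools become function declarations the model can invoke.
model = genai.GenerativeModel("gemini-1.5-pro-002", tools=[get_order_status])
chat = model.start_chat(enable_automatic_function_calling=True)

# The SDK executes the function the model requests and returns a natural-language answer.
reply = chat.send_message("Where is my order ABC-123?")
print(reply.text)
```

With automatic function calling enabled, the SDK runs the requested function and feeds the result back to the model; anything more elaborate, like planning or multi-step tool use, is what the agent frameworks Logan mentions layer on top.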
But I guess my follow-up question is: how close do you think we really are to the ChatGPT moment for these agent systems? And do you think we'll see a massive consumer boom for them, similar to what we saw in 2022?

I think part of what has given this current moment of AI its wow factor is how easy it is to get value without putting in a lot of effort. A lot of the challenge for agentic systems and products is that, for me to get value out of an agent that reads my calendar and my email, there's some amount of incremental friction: giving it the right access, putting some guardrails in place. And I've actually seen this commentary from a bunch of people who have executive assistants, or people in that kind of supporting role. The comment from some of those folks was: anyone who thinks agents are just going to magically solve all their problems should remember that people have human agent equivalents today, and it's actually difficult to make effective use of even another human who is, in effect, fully general, can do everything you need, has full modality access, and interacts in the real world. It's hard to make really coherent use of another human supporting you. So my expectations are somewhat measured about whether there will be that massive viral moment. I think there will be a bunch of things that provide a ton of value for people, and that's already happening; there are tons of products out there that, for very domain-specific tasks, are somewhat agent-like: they go and solve a problem that usually would have required me, as a human, to go do some work. But the magic of LLMs and that chat interface is that the interface itself, despite being simple and not the best thing for a lot of tasks, is actually what brings the magic to life. So maybe there will be a similar interface moment that brings agents to life.

Yeah, that makes a lot of sense. My follow-up question is on the form factor. Do you think the final form for these agents is chat? Voice? Something embedded into our phones? Maybe glasses, or earbuds? What do you think the final form factor will be, maybe not so much for the workforce agents, but for the agents personalized to you as a user?

Yeah, this is tough. I think it's certainly not chat; it won't be chat. It's also unlikely to be software in the sense of a more SaaS-like application. I do think blending the space between hardware and software, finding some way to combine those two things, makes sense, and there are a ton of people trying to do that with the recent boom of AI companion and assistant hardware. I do think the idea of vision makes a lot of sense; folks are pushing on glasses and other things like that, and for me, the way I interact with the world is by seeing a bunch of stuff, so it would make intuitive sense for that to be possible too. But on the other hand, I don't wear glasses all day, so I'm not sure yet.
All of these things have their drawbacks, and the not-great answer, but probably the one closer to the truth, is that it's going to be a combination. There will be advantages to having a purely software personal assistant on your computer, and also advantages to having something more physical in your space helping support you; with all the progress happening in robotics, it might even be a robot. So it'll be interesting to see how it plays out. I'm not sure.

Yeah, it was a tough question, right? We don't really know; it depends on how the world takes it. What do you think is the most surprising way AI will change our daily lives in the future, maybe more at the personal level than the workforce level?

I think, and this has been an idea people have been kicking around and working on for a while now, since the initial moment with LLMs, the real limitation on usefulness today is that you, the user, have to take the first action. In real life there are people reaching out to me and providing value to me, and I'm providing value to them; there's a two-way street of communication and action. With most AI systems today it's one way: I prompt the system and it gives me a response back, or I tell it to do something and it does what I instructed. I think the most value to be unlocked in the medium term is the system actually asking me for permission or clarification on things I might want it to go do, and really solving those problems. Again, this is what humans do. Not to go too deep into the work use case, but it's not an uncommon question to ask your manager, "okay, what do I do next?" That's the person who's there to guide you and make you successful. It's very interesting to me that very few AI systems today, if any, successfully do that: asking how they can help in an actual, non-surface-level way that ends up being meaningful. So it'll be interesting to see more systems build that in. And back to the thread that the technology to do this already exists: you could stick a ton of stuff into the context window of a model and have it, behind the scenes, on a preset schedule, come up with questions and possible actions it might take. There doesn't need to be any new innovation; the models are already capable of doing that. It's just a matter of stringing the system together.

I mean, that's really cool and a great point: agents becoming proactive, actually looking at your data, understanding you as a user, and asking whether to do things for you. That's a really cool time to be in. Do you think Gemini and Google are ahead with their context windows? And do you think that's a signal of your advantage in this new agent era?

I think it is a significant advantage. I do think context caching is a huge unlock. It's harder to quantify the delta in performance for multimodal, but I think video is another example.
Very few models today actually take in video and understand the audio, the images, and the text associated with it. So I think those advantages will continue to compound, and my personal thesis is that's where a lot of the value is: the frontier use cases are where the value is going to be created. Even consumers are very used to the basic chat experience by now, and for people building with the technology, the way you get people excited and in the door is to build something new that other people haven't built or that they haven't seen before. So I'm happy that Gemini, the DeepMind team, and us are pushing on those use cases, so developers have net-new things to put in front of consumers, get them in the door, and build value for their business.

Do you think we'll see an infinite context window?

I think it's likely possible; we'll see. Infinite is a large engineering problem. The underlying technology could potentially scale toward it, but the cost associated with it gets really tough, et cetera. But I believe in the DeepMind team. Jeff Dean and the whole crew keep doing things that haven't been done before, and on the context-window front, I'm hoping we see that land.

Yeah. I think you've shown 10 million in research, is that right? Correct me if I'm wrong.

Yeah, a bunch of the papers have talked about it; they've tested it up to 10 million tokens. There are a bunch of practical challenges to putting that much context into a production environment at a reasonable end-user cost, so it's not available to developers yet. But we're continuing to push on that research, and the thing Google does best is take those really hard engineering problems that no one else can solve at massive scale and actually make them affordable for the masses, whatever the technology is, across Search, YouTube, and now Gemini and all these other things. So I think that's the unique trait that, in the world of AI, is going to keep pushing us forward.

Yeah, the work from Google has just been amazing. Like we talked about earlier, the pace is a bit crazy too.

It is. Maybe for non-technical people, can you explain the promise of what this infinite context window might mean for them?

Yeah. Right now, models essentially only reason about information that's in the context window. You ask the model a question, and if the answer isn't in the context window, or in the distillation of the information the model was trained on, it won't be able to act on it. The unique thing about humans is that we have this massive amount of context, and we keep piling on more in our own minds as we learn and go about our days, but we also have the tools to dynamically retrieve additional context and use it to take an action. With today's AI systems, you as the user basically need to do all of the retrieving and all of the putting into context.
But also, the models are historically not that great at figuring out what the relevant context is. As someone trying to solve a math problem, for example, I know what context I might need to go and get in order to solve it, and that translation layer, knowing what you don't know, is actually one of the big limitations. You can kind of skirt around the knowing-what-you-don't-know problem if you can just put everything into the context. If I can take all the text messages I've ever sent, all the emails I've ever sent, every document I've ever created, and put them in the context window, it actually becomes a little easier technically, because you don't have to put as much energy and thought into deciding what specifically the model should be acting on.

Yeah, that's a great answer. The last topic I want to touch on is AGI. Obviously this is the end goal. Similar to agents, the term is pretty loose; can you explain what AGI is in your eyes?

Yeah, this is another one of those challenges where we need Merriam-Webster to come in and delineate the real definition of AGI. There are a bunch of different framings. One that has resonated with me historically is models that are able to do some reasonable portion of the economically productive work that humans can do, as a proxy for how useful these systems are. Ultimately it comes down to the usefulness of the system, and people use economic productivity as the proxy for that. Whether that's the right or even a good definition is, again, something we need Merriam-Webster to delineate for us, but in general, that's the definition that resonates with me: systems that can do the things that I'm able to do.

And what's next on the roadmap? What are the current bottlenecks to AGI?

Yeah, I think it's a lot of the things we've already talked about: the models being really good at visually understanding what's happening, and longer context. I don't know what the human equivalent is in tokens that we can keep in our minds, but I'd imagine it's a lot more than 2 million. So you're going to need all of the capabilities we have today to keep ramping up, plus a bunch of net-new stuff. And the thing that gets me excited is that a lot of the net-new stuff is work the Google DeepMind team has been doing extensively for the last ten years: all of the planning work with AlphaGo and the strategic reinforcement learning they've done. It's likely that some amount of that is going to be extremely relevant to getting us to AGI, and the team has the bandwidth and the expertise and has been doing that research. I'm excited to see a bunch of those things come together.
And get us closer to systems that developers are ultimately going to build really interesting products with. My other tangential comment here is that the most challenging thing for me personally, working with this technology, is that there are so many cool products to be built. As you come up with use cases for the new stuff being unlocked by the latest iteration of Gemini models, even for me in the last couple of weeks, you realize there are a lot of really interesting companies you could go build, and products to be built, with this stuff. So I'm super excited to see people take action on the current capabilities, and also to see that trend continue. It keeps me so optimistic for the future, and for people who are excited about building: every new crank of the model brings all of these really interesting new unlocks.

It's just amazing where we're at right now and how seemingly close we are to the next step, the future. Right now, reasoning and math breakthroughs are all the rage. How much does that play a role in the future development of AGI?

I do think the reasoning piece, the models' ability to just think about stuff, makes perfect sense. That's a paradigm that, back to the early days of DeepMind, was the direction they were pushing on for a very long time, trying to solve these problems in many different ways. The fact that the path to where we've gotten ended up being Transformers and large language models, as the proxy for some of the initial signs of what look like intelligent systems, is somewhat counterintuitive, but I do think the fusion of both makes perfect sense. There are practical limitations, though: a lot of production use cases today just can't take many seconds to reason about the answer; humans are already so impatient. So there's a very specific set of use cases where that type of delay, for the potential upside of a better answer, might be worth it, but the vast majority of use cases working in production today aren't those. So I'm excited that we're landing models that are going to help people right now and support a bunch of the use cases that matter for developers, and I'm also excited about the future, because there's a lot of super interesting work yet to be done.

Yeah. It was great to see you. Hopefully we'll get to catch up in person before I/O next year.

Let's do it. I love it.

All right, that's the pod.