0:18 Welcome back, agentic thinkers. I don't know, we don't have an audience name yet. We've got to figure out what our audience name is for this one. But welcome back to Agentic Thinking with Mike and Matias. Hello, Matias.
Yes, hello. How are you?
0:28 I'm doing well. Good afternoon. Good morning to those of us still in the US. We are an international podcast. We have people from all over the world joining us today. 0:39 Let's unpack some of the news items. A lot has been happening. As always, we are not disappointed with anything AI-related. New ideas, new concepts, things are 0:49 coming out all the time. And we have some really interesting topics today that have happened since this time last week. Let's kick off some items here. Let's start, Matias, with 1:00 your topic. I think another feature has been released by Codex and then immediately copied by Claude, or Anthropic, as well. Let's 1:10 step into this one. What's this new feature called?
Yeah, lots to unpack here. They both released the new goal command. If I remember correctly, and I'm pretty sure I'm right on that, Codex released it a 1:21 week ago: /goal. If you look at Claude's changelog, yesterday's Claude Code release mentions /goal. 1:33 Look at that. They've been playing catch-up, arguably, but also very, very quickly. In a way, /goal is an evolution of the Ralph 1:44 loop, I would say. What people have been doing since January or so with Ralph loops, where they're running the coding 1:55 agent in a loop from the outside, those vendors have brought into the agent. 2:06 The /goal command allows you to provide, let's say, a long-running, complex goal, and the harness itself 2:19 will ensure that it keeps going in a loop 2:29 until a certain condition, a certain goal, is reached. And we've got two links here to the respective documentation for both Codex and 2:41
Claude, and it's all brand new. I haven't really experimented with it very much, but 2:52 it was obvious that something like that would be coming, because without that vehicle you are limited. 3:03 Without some, let's call it, orchestrating concept around your coding harness, you are somehow limited, 3:14 and you end up being a bottleneck yourself, because you may start a long-running turn, but you still need to react to it. In most cases, 3:27 you don't get a one-shot deliverable. In most cases, you do have to come back and challenge the harness, and this is, I think, what 3:38 this new goal command is meant to do for you. But it's all based on you explicitly providing the validation step. 3:48 You don't get that for free. You still have to do some upfront work, and you have to give the model a way to validate, a way 3:59 to determine whether or not your prompt has been successfully delivered on.
This, I think, is the trick of this whole system. You could tell agents to build a 4:09 website, but there's no definition of success at the end of that prompt. Again, I think 4:19 we have probably been using agents or harnesses inadequately for a while now. What this is doing is taking more of our rigor around engineering processes and 4:30 app development processes and incorporating it closer to the agents. For me, as I've learned to use agents, my first interaction was a chat window: I just 4:40 give it a random prompt, and it comes back with some random results. Now this is evolving into these looping experiences. I want it to go build something, but then you need to go check the build of 4:50 what it created. Test-driven development is a good example of one of these things. Build a test for this thing, and then make an API that meets the requirements of that 5:01 test.
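[Editor's sketch] The idea discussed here, a loop that keeps going until an explicit validation step passes, can be illustrated in a few lines of Python. This is not how Codex or Claude Code implement /goal; the names `run_until_goal`, `fake_agent`, and `add_test` are hypothetical, and the "agent" is a stand-in function so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoalResult:
    reached: bool
    iterations: int
    feedback: list[str]


def run_until_goal(
    agent_turn: Callable[[str], str],
    validate: Callable[[str], tuple[bool, str]],
    goal: str,
    max_iterations: int = 5,
) -> GoalResult:
    """Call the agent repeatedly until validate() passes or the budget runs out."""
    prompt = goal
    feedback: list[str] = []
    for iteration in range(1, max_iterations + 1):
        output = agent_turn(prompt)
        ok, message = validate(output)
        if ok:
            return GoalResult(True, iteration, feedback)
        feedback.append(message)
        # Feed the failure back in, the way a human would challenge the harness.
        prompt = f"{goal}\n\nPrevious attempt failed validation: {message}"
    return GoalResult(False, max_iterations, feedback)


# Hypothetical stand-in for a coding agent: it only fixes the bug after feedback.
def fake_agent(prompt: str) -> str:
    if "failed" in prompt:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


# The explicit validation step: a tiny test the generated code must pass.
def add_test(source: str) -> tuple[bool, str]:
    namespace: dict = {}
    exec(source, namespace)
    if namespace["add"](2, 3) == 5:
        return True, "ok"
    return False, "add(2, 3) should be 5"


result = run_until_goal(fake_agent, add_test, "Write add(a, b).")
print(result)
```

The design point is the one made in the conversation: the loop itself is trivial, and all of the value sits in `validate`, which someone has to write up front.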
And then you can add more tests to it. Test-driven development is becoming more front and center for our team and how we're using agents. Start with building the tests, and then figure out how to make APIs meet 5:12 those tests. I really like this. I would maybe even liken the goal to a milestone or a requirement. It's really 5:24 causing me to think a lot more about what the expectation is. What do I require it to deliver? By more clearly defining the outcome, I can let it chew a lot longer on internal states 5:35 to figure something out. Have you used any of these features yet, Matias? Have you explored these things yet?
Not those specific 5:45 features, because they've only just been released. But conceptually, I've been in that space for a while now. My personal 5:56 state of the art at the moment is to use different coding harnesses in conjunction. That would be my big criticism, the reason why I wouldn't use 6:06 those features: they keep you in the same ecosystem. By definition, if you're using goal inside Codex or goal inside 6:17 Claude Code, everything happens in Codex or Claude. I want to be able to combine different harnesses, particularly those two, as 6:27 arguably the strongest in the market today, to criticize each other. Architecturally, I would explicitly 6:37 want to run that loop outside of either of them. That's what I've been doing, with some success. But of course, it means 6:48 you need subscriptions or a billing setup with different providers. You need to have a much more 6:58 complicated setup. Goal within the same harness is definitely something that will solve problems for a lot of people, I 7:10 would say.
What you describe reminds me of something. Let me rematerialize what you're describing into another way of saying it, 7:20 the way I comprehend it.
Just because I built code with Claude doesn't mean Claude should be the one reviewing said code. I want a 7:30 differently trained model, something with an outside perspective. It doesn't have the same weights. It doesn't have the same information in it. And even 7:42 the agent that is building the... Sorry, let me say this again. The large language model that's building the code may be a simpler model, GPT-5.4 7:54 or GPT something else, o3-mini, right? We've given it instructions, it built something. But the testing agent should be a different model on a different stack to 8:05 give you a better perspective. And now that they're able to do this stuff at scale and a lot more automation is in place, why not have two reviewers? I'm going to build in o3-mini and 8:16 review in Claude Opus, maybe Sonnet, maybe even GPT-5.5, 5.4, something like that.
Or the other way around, which 8:28 I think makes even more sense in a way. Why don't you have two implementers and one reviewer? And the reviewer picks which solution to go with. 8:38
Yes. Sure, exactly. And now that this is all programmatic, this reminds me of a project from another gentleman on the internet whom I don't know, but I follow very closely, 8:48 Matt Pocock. He's a really good educator, really diving into the AI space, and he has this project called Sandcastle. Have you heard of this 8:58 Sandcastle project?
Absolutely, yeah.
Yeah. It's a bit more than just a harness. He's doing a lot of other things here. He's sandboxing an entire agent, 9:09 putting it in its own little Docker container, running some code in it, and then letting it interact. But to your point, having multiple harnesses or sandboxes 9:19 that you orchestrate together and have work as a unit is, I think, really important and very powerful.
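[Editor's sketch] The "two implementers, one reviewer" pattern described above can be run as an outer loop that belongs to neither harness. The Python below is only an illustration under stated assumptions: `harness_a`, `harness_b`, and `checking_reviewer` are hypothetical stand-ins (plain functions, no real model or CLI calls), and in a real setup each would wrap a different vendor's harness.

```python
from typing import Callable

Implementer = Callable[[str], str]
Reviewer = Callable[[str, dict[str, str]], str]


def implement_and_review(
    task: str,
    implementers: dict[str, Implementer],
    reviewer: Reviewer,
) -> tuple[str, str]:
    """Collect one candidate per implementer, then let a separate reviewer pick."""
    candidates = {name: build(task) for name, build in implementers.items()}
    winner = reviewer(task, candidates)
    if winner not in candidates:
        raise ValueError(f"reviewer picked unknown candidate: {winner!r}")
    return winner, candidates[winner]


# Hypothetical implementers; imagine each one calling a different coding harness.
def harness_a(task: str) -> str:
    return "def slugify(s): return s.lower().replace(' ', '-')"


def harness_b(task: str) -> str:
    return "def slugify(s): return s.replace(' ', '-')"  # forgets to lowercase


# Hypothetical reviewer: deterministic here, but it could be a third model.
def checking_reviewer(task: str, candidates: dict[str, str]) -> str:
    for name, source in candidates.items():
        namespace: dict = {}
        exec(source, namespace)
        if namespace["slugify"]("Hello World") == "hello-world":
            return name
    raise RuntimeError("no candidate passed review")


winner, source = implement_and_review(
    "Write slugify(s).",
    {"harness_b": harness_b, "harness_a": harness_a},
    checking_reviewer,
)
print(winner)
```

Because the orchestration sits outside both harnesses, swapping an implementer or the reviewer is a one-line change, which is the flexibility argued for in the conversation.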
All right. This is a really interesting concept. I 9:30 also want to talk about this: is it interesting that Codex did it first? I want to talk about the period of time it took Anthropic to add a similar 9:43 feature. What are your thoughts on this?
Yeah, totally. That's the other big one to unpack. It's not coincidence, obviously, that these things were released in such short 9:55 succession, right?
Yes. Yep.
Clearly, these guys don't coordinate. If you want to guess what 10:07 could have happened here, it could have been the case that both companies were somehow concurrently working on the same 10:17 feature and Codex just happened to release it first.
Sure.
What I think is more likely: 10:29 in terms of where the industry is at, it's pretty clear that this feature was needed and was something that engineering teams needed to 10:40 evolve towards with their agents.
Yes.
It was conceptually there. 10:50 Codex productized it by making it a feature in Codex. Anthropic effectively had it already, and then they managed to 11:00 productize it within days.
Yes.
One, because they have an engineering stack that allows them to ship that fast. And two, 11:11 the underlying architecture needed for the feature was already there. They just didn't think of releasing 11:23 a feature called goal before Codex did. That's what I'm guessing happened here.
And it's enormous to think about, that 11:34 this is the stuff that can happen in days. If you missed the first news item about goals, all of a sudden the program you're using, which isn't the one that created the feature, gets the feature 11:44 automatically. I think the concept here is that the proliferation and speed of feature copying between different tools is 11:55 really high. You can copy a feature into your product in a couple of days now.
And I think it's going to be 12:08 very interesting to see how the software market handles this, just in general. Any program, it doesn't matter what it is: Teams, Zoom, Salesforce, any piece of software that's coming out with 12:18 new things. You see a new feature appear, and that's a pretty interesting feature, so you want to make sure you have it too. And maybe to your point here, Matias, we don't see behind the 12:29 curtain on what's happening on the development side at Anthropic, but they may have already had it. I think they're dogfooding. From what I've seen in the past, they have everyone independently building 12:40 Claude Code features and things they're finding useful. They immediately dogfood those items across the entire company. Everyone has access and can start using them. And then the items that 12:50 get used more frequently turn into real features. They productize them and push them out the door. The idea is that their whole company is the software development firm. Anyone at 13:00 any level can make a feature and build something inside the product, and everyone gets it immediately. You're probably right. I think what was happening here is they probably already had a feature that people were using. 13:10 They were like, "Oh, wow. Codex released this thing a week before us. We already have something similar on the shelf. Let's just focus our attention on that feature for a little bit. Put a couple of days of 13:21 cycles on it. Now that our agent has refined it and made it better, let's push it out." That is really interesting to me. 13:31
The other reason why it's easy for them to turn things around quickly: they don't have to worry about UI.
Yeah. 13:41
Right? Those are TUI tools. Obviously there's a bit of UI, but those are fairly simple and standardized.
From a software 13:51 development point of view, that is a huge advantage, when you don't have to worry about web UI and testing.
Just CLI, everything CLI.
Exactly. 14:02 [laughter]
Let's talk about speed. I want to shift the conversation slightly to something else. Have you heard of a company called, 14:12 I think it's Cerebras? Cerebras is the company. It's Cerebras AI, and I'll put the link here in the chat window. Matias, I stumbled across this in one 14:22 of my feeds. This is a chip company. It's making computer chips, like GPUs, right? For scale, let me... 14:36 I'm going to ask a question, and if you don't know the answer directly, that's fine. I totally understand. But I'm just going to throw a question to you. When you look at models that you run, whether it's on Anthropic or whether 14:46 it's from Codex or even if you're running things locally on your machine, do you have a number? Do you know how many tokens per second are generated? The words that are coming out... it's 14:57 not quite exactly the same thing as tokens, but my mind proxies the number of words in versus the number of words out as tokens. That's probably not a good analogy, but 15:08 that's how my mind thinks about it. But tokens per second is a thing. And my question to you is, with the models you use today, what is the rough number of 15:19 tokens per second you think you're getting?
I don't... It's not a metric I generally look at. I'm only familiar with it when I run models locally, because LM Studio, for instance, 15:31 which I generally use for local models, exposes tokens per second very nicely. There, depending on the model, I get somewhere between 60, 80, or 100 tokens 15:42 per second. I don't know about any cloud-hosted models. It's probably a very different number there, I guess.
Yes.
And also, one thing I've been... 15:52 I've gotten more interested in the tokens per second number because of the Cerebras company, and I promise I'll get you to the meat of the conversation in a second. I didn't really understand what things 16:03 were doing. So I'm going back to the debug mode in Copilot, and it tells you how many tokens were used and how long it took.
Yes.
There's a little bit of a... 16:13 It doesn't really give you... It gives you time and then tokens. It doesn't tell you tokens per second. So I'm doing some back-of-the-napkin math 16:23 to work that out. I'm going to pull up VS Code and see if I have a session that has some debugging in it, but say it took 3 16:33 seconds and I ran 2,000 tokens, something like that. Or whatever the number is: 150 tokens, or it took 15 seconds to do something. Say I ran 1,000 tokens and it took 3 16:44 seconds. That's where I think they're getting the tokens per second number from. So why do I bring this up? The Cerebras AI company, if you go 16:54 look at their website, on their homepage, they are talking about blazing-fast inference powered by the world's fastest and 17:04 largest processor. For comparison, if I read the numbers right, an Nvidia GPU 17:16 today has around 22 billion transistors. This is a B200 Nvidia GPU. By 17:28 comparison, the Cerebras chip is not just a single chip; it's a wafer. When you build computer chips, you build these things called wafers. That's 17:38 how they build the chips up. Here, the entire wafer is the chip. So this is a massive volume difference, and it's pushing 17:48 1.2 trillion transistors, is what I think I saw. We're going from 22 billion to 1.2 trillion transistors to do the inferencing. This sucker is fast, and I'm floored by 18:02 the performance.
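[Editor's sketch] The back-of-the-napkin math mentioned above is just tokens divided by elapsed seconds. The short Python below works through the two rough examples from the conversation (2,000 tokens in 3 seconds, 1,000 tokens in 3 seconds) and compares them with the 100 tokens per second quoted for local models. The function names are made up for illustration, and the inputs are the speakers' ballpark figures, not measurements.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput from a debug readout that reports only tokens and time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


def seconds_to_generate(tokens: int, rate: float) -> float:
    """How long a response of `tokens` length takes at `rate` tokens per second."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return tokens / rate


# The two rough examples from the conversation.
print(round(tokens_per_second(2000, 3), 1))   # 2,000 tokens in 3 seconds
print(round(tokens_per_second(1000, 3), 1))   # 1,000 tokens in 3 seconds

# The same 2,000-token response at a local-model rate of 100 tokens per second.
print(seconds_to_generate(2000, 100))
```

At 100 tokens per second the same 2,000-token answer takes 20 seconds instead of 3, which is the kind of gap the speed discussion is about.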
What I was trying to get my head around is that this is their whole MO. Think of this: this is token economy at scale. 18:15 When I look at the AI world in general, everything boils down to how fast I can run tokens through a system. It doesn't matter if you're Microsoft or if you're local on your own machine, everything 18:26 boils down to speed and access to cheap tokens. This whole system runs on this. This is blowing my mind. You were just talking about 100 to 150 tokens per second. This 18:38 sucker is pushing 1,800 tokens per second in certain scenarios. I was like, no way. There's no way this could be true. 18:49 So I go into the program. I'm bought into this. I bought the API access. I threw some money down. I need to test this. What the heck is this thing doing? I get in. 18:59 I'm really excited about this. I just threw it some simple tasks: build me an HTML website 19:09 with this, this, and this on it. It built the whole site, everything, the full HTML, over a thousand lines, in less than 3 seconds. 19:20 I couldn't prompt it quickly enough. By the time I hit enter, the answer was appearing. You know how sometimes when you work with an agent and you send a message to it, 19:31 it starts writing text and putting tokens out for you? You start scrolling the window and you can see it thinking as it provides text back to you. You watch it think through the text on the screen. 19:42 Zero of that. You can't scroll faster than the thing is producing the code. I was floored, and I thought, 19:53 one, it was very impressive to see the speed and how fast this thing could produce tokens. And I was using some pretty large models. On their demo website, if you go 20:03 to Cerebras AI and play with the website, the models they're providing are open-source models.
It's Qwen, and it has this XAI, I think, model. 20:13 It's not xAI the company you think of today; it's some other open-source model, a GLM model, I think, is where it's coming from. Something that's still open-source. But these models are really big. They're 20:25 hundreds of billions of parameters in size, so they're substantially large, and they're just ripping out tokens like there's no tomorrow. And I thought to myself, 20:35 we are seeing the acceleration of the entire AI... This, to me, is an inflection point. If the models we were using before 20:47 could produce, say, 300 tokens per second, or whatever they're producing on the mainframes, and you can double or triple that, or 20x that, with 20:57 models that can produce tokens that much faster, this is going to greatly change the entire expanse of the system. And I'm looking at this going, what does this mean for me? What does 21:09 this mean for what I can build? If you can render an entire website in 3 seconds, do you even need a website? 21:19 Do you just give the AI agent what you're trying to sell, and then when the user shows up, based on the information it has about the user, it just 21:29 materializes the website out of thin air? None of it is real; it's always generated in real time from exactly what users are clicking on. When UI generation gets this 21:42 fast, you're going to change what you build. Anyway, I just want to pause there. What are your thoughts?
I'm going to 21:52 challenge that a little bit, if I may.
Yeah, sure.
I would argue that real-world agents don't generally have token generation 22:03 and token flow as a bottleneck at all. Real-world agents have value because of tool interactions. There's a lot of I/O on your file 22:14 system, but most importantly I/O with external services.
In fact, I would argue that a good agentic system is one that very rarely 22:24 relies on inference. A good agentic system, particularly from a cost perspective, is one that has a huge degree of determinism in it, and 22:34 only certain aspects of it rely on inference, as in a call to a model if and when absolutely necessary. Otherwise, things can get 22:46 very wasteful, and we've talked about this on the podcast many times. Consuming tokens gets more and more expensive. It's definitely not going in any other direction than that. 22:56 So anyone is well advised to design a system in such a way that the amount of tokens sent back and forth is 23:08 minimized massively. What you described may be really, really impressive, but the scenario you described was that you give a 23:19 prompt and it creates something for you in return out of thin air. A greenfield project. You get something produced for 23:29 you out of nothing. In reality, you have brownfield projects, in the sense that you want to do an iteration, 23:39 or you want it to, I don't know, perform some actions on your emails or on some other inputs in a 23:53 messaging service or whatever. This is where agents have to interact with the real world, and that's where you cannot 24:03 speed things up at all using hardware. As I'm thinking about that, there's one very interesting research paper that comes to mind, which I looked 24:14 at recently. Remember a while ago when Anthropic had this massive, presumably unintended leak, where they accidentally published Claude Code in 24:26 conjunction with source maps, which meant that loads of people out there were able to completely reverse engineer the Claude Code harness?
24:36 There's a research project which came out of that, which was published in early April, almost exactly a month before 24:46 today. The research paper is published as a GitHub repository called Dive into Claude Code. Very interesting. 24:56 I can absolutely recommend it. And one of the headlines is fascinating. They're saying 1.6% 25:08 of the Claude Code codebase is AI decision logic, and 98.4% is deterministic. 98.4% 25:21 is the non-AI harness that makes Claude Code what it is, that makes it as good as it is. And this is what real-world agentic systems should 25:32 look like. If anything, that's proof that we need to invest much more in 25:44 harnesses. We need to invest much more in agent design outside of the inference model 25:54 and outside of prompting. And it just happens to be something which I'm very deeply invested in. 26:04 I've been doing a lot of work around different harnesses. Hint, hint. [laughter] 26:17 And it's definitely something which I very much believe in, because I've experienced it. 26:29
I agree with your point. Look, let me be clear. Harnesses are what make the large language models sing, 26:40 bar none. And I love the paper that you're presenting here. I have heard other statistics, that 60% of any good large language model is always the 26:50 harness. And it's probably even more with these looping systems and deterministic systems that are built on top. I think what you're describing here is that the deterministic system 27:00 you put on top of the large language model helps guardrail it to give you better, more accurate results. It is 27:11 a wild animal, and you've got to wrangle it to some degree, and that's where skills and harnesses and everything come from.
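[Editor's sketch] The argument above, that a well-designed agentic system is mostly deterministic and calls a model only when it has to, can be shown with a small router. This is a toy illustration, not the design of Claude Code or of any harness named in the conversation: the rules, the `Router` class, and `fake_model` are all hypothetical, and a real system would replace `fake_model` with an actual inference call.

```python
import re
from typing import Callable, Optional

Rule = Callable[[str], Optional[str]]


def invoice_rule(text: str) -> Optional[str]:
    """Deterministic: pull an invoice number out with a regex."""
    match = re.search(r"\binvoice\s+#?(\d+)\b", text, re.IGNORECASE)
    return f"file_invoice:{match.group(1)}" if match else None


def unsubscribe_rule(text: str) -> Optional[str]:
    """Deterministic: a plain keyword check."""
    return "unsubscribe" if "unsubscribe" in text.lower() else None


class Router:
    """Try cheap deterministic rules first; fall back to inference only if none match."""

    def __init__(self, rules: list[Rule], call_model: Callable[[str], str]) -> None:
        self.rules = rules
        self.call_model = call_model
        self.inference_calls = 0

    def handle(self, text: str) -> str:
        for rule in self.rules:
            action = rule(text)
            if action is not None:
                return action
        self.inference_calls += 1
        return self.call_model(text)


# Hypothetical stand-in for a model call, so the sketch runs offline.
def fake_model(text: str) -> str:
    return "needs_model_judgement"


router = Router([invoice_rule, unsubscribe_rule], fake_model)
messages = [
    "Please pay invoice #4821 by Friday",
    "Click here to unsubscribe",
    "Can we move our meeting?",
]
actions = [router.handle(m) for m in messages]
print(actions, router.inference_calls)
```

Two of the three messages never touch the model, so the token bill scales with the hard cases rather than with total traffic.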
But where my mind goes with 27:21 this is that anytime you have a leap in the speed of anything, brand new businesses, solutions, and technology 27:32 build on top of that. I'll pull out another analogy here. Google has also decided that speed is an advantage for them. Google decided, "Look, if we want 27:42 more people using Google, what we need to do is make higher-speed and cheaper internet for everyone across the US and in a lot more areas." So Google made a decision to say, "We're going to 27:52 invest more in fiber distribution for many other cities across the US." And what that's doing is dropping the price of access to 28:03 the internet. When that speed turns up an order of magnitude higher than everything else, it spawns more video usage. It spawns more YouTube usage. It 28:14 spawns more applications. Applications get bigger. Anytime we see a shift and a substantial 28:24 improvement in some technology stack, I think what we get from it is things we don't know about yet, but it generally drives new innovation in that space. We're seeing a general 28:35 shift here. To your point, I still think the story of dollars per token, or dollars per million tokens, is 28:45 the story here. When you look at their documentation, they're saying that they're able to reduce costs on token generation by 30% or something like that. 28:55 That's just on the chip: how many million tokens can I produce for a certain amount of dollars. But there are also a lot of other costs, and data center providers will talk to you about this. It's not just the 29:06 cost of the chip to run the tokens and the electricity; it's the power grid, the cooling systems, the building you've got to put it all in. Anytime you can consolidate that down to a more 29:16 concise system or package, it will become better.
I don't know if you saw the announcement recently, but I believe Nvidia just announced a satellite inference 29:28 chip module. Did you see this one?
No.
I believe Nvidia just announced a satellite inference chip that you can send up into 29:39 satellites. This sounds just like SpaceX trying to put data centers in the cloud, 100%. That sounds like exactly what they're going to do: they're going to move all this inference 29:50 into space, into satellites, where you have unlimited power because of the sun and unlimited cooling because you're in space. Two of these really big problems look like they're getting solved by Nvidia building a 30:02 new chip. And I'm like, yeah, I can see the writing on the wall on this one. This is clear as day to me. You're trying to build data centers in space. 30:12 This is what we're walking into. I see this as the first couple of steps of the next couple of months, where we're going to start 30:22 seeing companies really doubling down on inference speed, dropping the cost per million tokens, and making that more efficient. Because, to your point, Matias, 30:33 the harness runs on my local machine, typically.
Or in a VM or something like that.
Yeah. That's a very fixed and known cost. I bought the computer that's on my desk. It costs me zero new dollars, 30:45 other than electricity, to run the harness. The only variable in this system that I think is expensive is the large language model cost. That's the expensive piece. I could run a 30:56 harness all day long with no large language model, and it would cost me 30 cents in electricity, or a dollar, or five. It's nothing. 31:06 The deterministic side of software is cheap. The non-deterministic side of software is expensive. And this is what I keep trying to get my 31:16 head around.
Whenever I talk to Microsoft or PMs, or try to communicate things, I'm like, we need the non-deterministic nature of large language models to build more deterministic software for us, because 31:28 that's what runs cheaply. That's what runs effectively. The hard part is getting the software written, but then the cheap part is running it on systems that are well known and 31:38 understood. Does what I'm describing make sense? I want to agree with your point, but I also want to expand the idea slightly into this new realm: 31:48 we're seeing a downshift. We're accelerating into another gear at this point, into this new world.
It 31:59 reminds me a bit of decades ago, when multi-core CPUs appeared. Exactly the same shift. 32:09 You're suddenly integrating compute capabilities into the same physical space, but providing multiples of them. It sounds like 32:22 exactly the same pattern here. That always comes with cost efficiencies in terms of energy consumption, production, all of 32:33 that. So yeah, totally. I didn't mean to argue against that at all.
I agree.
I was just trying to say 32:44 there is a lot we can do as we build systems, from the cost-control point of view. 32:54 There are probably orders of magnitude between well-designed and poorly designed 33:04 agentic systems in terms of cost. You brought up a CFO story a few 33:14 weeks ago on the podcast, and we said this is definitely something which is going to become more and more important moving forward. There's still a huge variety in that 33:26 space.
I believe that was the CTO of Uber saying, "Hey, we've got many agents or many agentic experiences being run for developers." They burned through their entire R&D budget in the 33:38 first quarter of the year. I think it was something on the order of 3 million dollars of research and development, and all the spend was done in the first quarter of the year.
33:48 I sent you a couple of links, Matias, and put them in the chat here as well. The cloud data center is in the clouds now. We're talking clouds and data centers. Nvidia 33:59 announced it is launching space computing and rocketing AI into orbit. Clearly this is SpaceX and Nvidia working together to build stuff 34:10 in the cloud, and one of the compute units they're trying to push into space is the one called the Vera Rubin platform. It's the next generation of AI 34:21 inferencing, launching up into space, into the satellites. This is really exciting from my perspective. The whole purpose of this is that nobody wants a 34:31 data center in their backyard. I've heard many bad stories. People don't want the higher electricity cost. They don't want the loud noise. It uses a lot of clean water. There are a lot of challenges with data centers 34:42 showing up in your area. And it feels like they're building them all over the US, everywhere. I think the next Mission Impossible goal here is to start shoving 34:53 these AI inference machines up into space. At the end of the day, all of this feels really good to me, because it's just going to drive down the price of tokens. 35:03 That is going to make it more accessible for me to run inferencing all day long to help me build more of these deterministic systems.
By the way, I just shared with you the GitHub 35:14 link to that research paper I talked about earlier. Hopefully you can make it available in the show notes as well.
Let me put that here in the show notes. I will also add... Who's the... It's Viela, 35:26 Viela Lab, I guess, is the name of the company.
Yes. I don't know how to pronounce it.
And Viela Lab, it's talking about... 35:36 Again, to go back to the article you mentioned earlier: the majority of the success of an agent comes down to how the harness can wield it. That's really the thesis of the paper.
Correct.
The 35:48 tools you give to the agent, the way the agentic loop is orchestrated, what it does with your prompts, how it 36:02 compresses your chat history or not: there's an infinite number of possible optimizations there. Which is why we also have such a 36:12 proliferation of harnesses out there. If you, as I have done recently, go out and look at them all in a 36:24 comparative way, your job is almost never done. But it just shows that this is an area that requires a lot more 36:35 research, innovation, and experimentation nowadays. And no one harness is the perfect harness 36:45 for every single use case. There's an argument to be made that in certain domains or for certain applications, 36:56 a dedicated harness may be the right choice.
That's interesting to me. 37:07 Matias, if I had to take your comment and rematerialize it into another thought, is this the idea of hyper-personalized harnesses? 37:20 In the same way, we have a compute unit, a CPU, for general-purpose computing, and then we specialized that into graphics processing units, and now we're 37:30 building inference processing units. The more we specialize on a specific thing, the more we're able to really tune the performance of the harness, the system, 37:42 the GPU, to do something very specific. And the more we hone in on that, the higher the performance we can get out of it. I think this is the same principle Apple uses. 37:52 Why do they build their own silicon? Why do they build their own chips? Because they can build chips that pair very well with their software, and they run very efficiently together. There's no extra waste. 38:02 When we go back down to the CPU of your computer, it has to compute a lot of different things. It's serving many different purposes.
It's not really good at any one thing, 38:12 but it does everything okay. This seems like the same principle here with harnesses. You can build a general harness that does everything reasonably well, but nothing really, really 38:24 well. So, tuning the harness for the workload: here's a Ralph loop, or here's a goals loop. Let's build a harness that just handles that 38:35 experience. I need a different harness for interacting with my work data. This is maybe where Claude Cowork comes in, right? That's a slightly different 38:45 harness. There's a CLI harness. That's a different way to work on something. I think I've also experienced this a little bit with GitHub Copilot, because I'm juggling between the CLI, 38:56 VS Code, and the web browser. Those are three separate harnesses, maybe similar in some ways, but they all are slightly different in how they implement the harness. 39:08 And obviously, all the harnesses that are generally available nowadays offer you extensive ways of customizing them, you 39:18 know? You give them context files, things like AGENTS.md or CLAUDE.md, right? Or other agent instruction files. That's a 39:29 way for you to customize the agentic experience. And then you give them tools, skills, and MCP servers, again, right? All of 39:39 that ultimately impacts what the agent has access to and how it's grounded. But what you cannot influence at all as a general 39:53 user is what the fixed system prompt is that comes with the harness, what the built-in tools are that are very, very 40:03 fundamental, things like read, grep, write, make to-do, and what memory capabilities are 40:15 there, if any, right? That's definitely something which a lot of modern harnesses have and use extensively. This is where harnesses differ, 40:26 sometimes significantly, and this is where your ability to tweak them is limited, right?
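The split described here, between what the harness fixes and what the user supplies, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `HarnessCore`, `UserParameters`, `assemble_prompt`, and `available_tools` are hypothetical names invented for this sketch, not the API of Codex, Claude Code, GitHub Copilot, or any other real harness, and real harnesses assemble prompts and tools in their own, more involved ways.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class HarnessCore:
    """Vendor-fixed parts (hypothetical model): a general user cannot edit these."""
    system_prompt: str
    builtin_tools: tuple
    has_memory: bool


@dataclass
class UserParameters:
    """User-supplied 'parameters': context files, skills, MCP servers."""
    context_files: dict = field(default_factory=dict)  # filename -> file text
    skills: list = field(default_factory=list)
    mcp_servers: list = field(default_factory=list)


def assemble_prompt(core, params):
    """Put the fixed system prompt first, then each user context file in order."""
    parts = [core.system_prompt]
    for name, text in params.context_files.items():
        parts.append(f"# {name}\n{text}")
    return "\n\n".join(parts)


def available_tools(core, params):
    """Return the built-in tools followed by whatever the user plugged in."""
    tools = list(core.builtin_tools)
    tools += [f"skill:{s}" for s in params.skills]
    tools += [f"mcp:{m}" for m in params.mcp_servers]
    return tools


# Example values are made up for illustration.
core = HarnessCore(
    system_prompt="You are a coding agent.",
    builtin_tools=("read", "grep", "write"),
    has_memory=True,
)
params = UserParameters(
    context_files={"AGENTS.md": "Prefer small, reviewed changes."},
    skills=["semantic-model-review"],
    mcp_servers=["github"],
)

print(assemble_prompt(core, params))
print(available_tools(core, params))

# The frozen dataclass stands in for the parts a user cannot influence.
try:
    core.system_prompt = "something else"
except FrozenInstanceError:
    print("the fixed core cannot be changed from the outside")
```

In this toy model, swapping `params` changes what the agent sees and can call, while changing `core` means moving to a different harness or building your own.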
And that's where you 40:36 have to either make choices or where ultimately you may decide, I'm building my own, because I really want to tweak my agentic 40:47 experience much further than I can just by providing agent instruction and skills files, right? Wow. I never thought of it that 40:57 way, but all of these skills and MCP servers are just parameters you're using to customize the harness. I didn't think of it that way, but that's a really good mental model of how you 41:09 unpack what a harness does, right? The harness is simple, it does some things, it's... the word "opinionated" seems to be really popular 41:19 right now. An opinionated approach to how you use the model, an opinionated approach to how the harness works, and then these other features are all 41:29 parameters. Interesting. Didn't think of it that way. All right. Really great talking; a lot of cool stuff coming out this week. We're going to continue on this agentic thinking experience. 41:39 Matthias has some more demos coming up. We're currently working through a github.com interaction with agents. Currently, we are making issues, we're 41:49 changing our semantic model, and we will be continuing to modify our semantic model using our agents. We were going through different ways you can customize your harness in order to build 42:01 better output for your semantic models. Indeed. I believe on Friday we're going to continue down this path a bit further. We got through, I think, almost two different ways you can customize your 42:12 harness. Yeah, we've got two more to go. Two more to go, and we're going to keep refining the output so we get less of an intern building semantic models and more of an 42:22 expert, or at least move towards that expert level of modifying, changing, and working with semantic models. Anything else, Matthias, you want to close off on? 42:33 No, nothing at this point, sorry. It's an exciting world we live in.
This is really neat. Stay tuned for 42:44 more information and content around this. Also, if you like this content, if you like going deeper, with the news on Tuesdays and the demos on Fridays, or if you want more of some parts of this content, tell us. What are the things that 42:54 you're working through, unpacking, or struggling with? Let us know down in the comments on the various social media platforms. We're looking to cater this show, or program, towards what you're 43:04 interested in learning about. Thank you very much, Matthias. Appreciate your time today. Thank you all for listening and watching today, and we'll see you next time. Yeah, thanks everyone. 43:18 Agentic Thinking [music] Agentic Thinking [music]