0:17 Hello everyone, and welcome back to the Agentic Thinking podcast with Matthias and Mike. Matthias, hello. Good afternoon, good evening. Yes, hello. It's another Friday. Can't believe it. [laughter] 0:27 We've been doing this for a couple of weeks and things are already clipping along. We're already on episode 11, and we've already covered a lot of great content on the show. I hope you're enjoying it and learning a lot from us. Our main topic for today 0:38 will be diving back into what we've been doing the last couple of weeks, which is improving our agent with instructions, skills, and other ways we can get better output 0:49 from working with an agentic experience. These are skills we've had to learn, mull over, and work through together, and we're going to share that knowledge with the community as well. But before we get 0:59 into our main topic, let's talk about some news. Matthias, what have you been finding in the last three or four days since [laughter] everything's been happening? I was just reflecting. It's 1:10 easy coming up with news items in our space because things happen every single day. So, one link I wanted to share: 1:20 Anthropic, a couple of days ago, maybe even yesterday, announced some pricing changes. 1:30 Pricing updates are something we've been discussing a lot lately, particularly around GitHub Copilot, where some rather drastic changes have been announced for June. 1:42 Anthropic is following suit. And it's a mixed one, I would say. You can 1:52 see this as a huge disappointment, or you can look at it as a long-overdue clarification. 2:02 What they're saying is that from mid-June, starting June 15th, when you have a subscription, which is 2:13 the most affordable way of getting AI credits nowadays, it will be officially allowed to use that subscription for 2:26 automation, for headless use, and for use through third-party applications.
That has previously been a very gray 2:36 zone. They have never been really clear about it, but it's been more disallowed 2:48 than anything else for tools like OpenClaw and other third-party applications that use your Claude 3:01 allowance through your subscription key as opposed to an Anthropic API key. The term they're using for this is the Agent SDK. It's just that 3:13 the article you're going to see talks about API usage that goes directly to their servers from your third-party programs, not from their own apps. Agent SDK 3:23 is what they're calling it. I had to get my head around that, too. I was like, "Oh, that's what they're talking about: the Agent SDK, the API calls." Exactly. However, what they're describing here 3:34 doesn't just apply to using the SDK; it also applies to claude -p. Anyone who's worked with Claude on a more technical level 3:44 knows that if you pass -p to the Claude CLI, you put it into headless mode. It doesn't open a TUI for you; it just runs your prompt 3:55 and the agentic loop hidden in your shell. For those who don't know, TUI 4:05 stands for terminal user interface. Yes, is that what you mean by TUI, the terminal? [snorts] Yes, as opposed to GUI, the graphical user interface, [laughter] or 4:16 CLI, the command line interface. There we go. Matthias, when we talk, we have our own language. My family does not understand when I say 4:27 words like that. I'll say, "Oh, just open the TUI," and they're like, "What are you talking about?" I just want to make sure those who aren't familiar with the terms know what we're talking about. These are developer terms that we're 4:36 bringing to business users as well, so I'm just defining some of them for you. Absolutely. And to be fair, Claude Code, I would argue, was the 4:46 tool that really put the TUI on the map. Yeah.
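For readers who haven't used headless mode: claude -p runs a single prompt through the agentic loop and prints the result instead of opening the TUI, which is what makes it easy to script. A minimal Python sketch, assuming the claude CLI is installed and on your PATH; the helper names are hypothetical, and only the -p flag comes from the discussion above.

```python
import shutil
import subprocess


def build_headless_command(prompt: str) -> list[str]:
    """Argument list for running Claude Code non-interactively (print mode)."""
    return ["claude", "-p", prompt]


def run_headless(prompt: str, timeout_s: int = 600) -> str:
    """Run one prompt headlessly and return its stdout.

    Assumes the `claude` CLI is installed and authenticated; raises if it is missing
    or exits with a non-zero status.
    """
    if shutil.which("claude") is None:
        raise FileNotFoundError("claude CLI not found on PATH")
    result = subprocess.run(
        build_headless_command(prompt),
        capture_output=True,  # collect output instead of streaming it to the terminal
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return result.stdout
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when the prompt contains spaces or quotes.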
It was the very first broadly popular tool that was predominantly a TUI 4:58 experience. That's what they started with, and they pushed it very, very far with respect to the user experience you can provide through it. It has become the 5:09 gold standard. And ever since Claude Code pioneered that space, everyone else seems to be building their 5:19 own TUIs nowadays. Yeah, everyone's copying it. It's a very, very neat balance between no UI and a 5:29 GUI. Building a TUI can give you a very rich user experience and user interface without all the costs that a full graphical user 5:39 interface comes with. So, back to the news. Yeah. The clarification we got was: "Moving forward, it will be allowed to 5:49 use your subscription credits in headless ways through the SDK," and you will be getting a 5:59 dedicated allowance every month as part of your subscription. And that's the interesting thing. 6:12 That's what really changes things from where we are now. Because obviously, when you're using subscription credits, the tokens per dollar you get are 6:24 substantially more, orders of magnitude more, than when you're using API credits. The new model means that 6:34 you have a dedicated headless allowance for SDK and -p usage, and it happens to be the same dollar amount 6:44 as your monthly subscription cost. So if you have a $100-a-month subscription, you get $100 a month for 6:55 non-interactive use, but that will be billed at normal API rates, which is very similar to 7:05 what GitHub Copilot is doing. So on the one hand, there is clarity and it's no longer a gray area. On the other hand, it does 7:16 mean that if you're using Claude Code non-interactively, the runway you get will be substantially shorter than 7:27 before. We're really talking orders of magnitude here.
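To see why the runway shrinks, the arithmetic is just the allowance divided by the per-token API rate. The sketch below uses illustrative rates ($3 per million input tokens, $15 per million output tokens, with 25% of tokens being output); these are placeholder assumptions for the example, not a quote of current pricing, so check Anthropic's pricing page before relying on them.

```python
def blended_rate(input_usd_per_m: float, output_usd_per_m: float, output_share: float) -> float:
    """Average USD per million tokens, given the fraction of tokens that are output."""
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    return input_usd_per_m * (1.0 - output_share) + output_usd_per_m * output_share


def tokens_for_allowance(allowance_usd: float, usd_per_million: float) -> int:
    """How many tokens a fixed dollar allowance buys at a flat per-million rate."""
    if usd_per_million <= 0:
        raise ValueError("rate must be positive")
    return int(allowance_usd / usd_per_million * 1_000_000)


# Illustrative numbers only: a $100 headless allowance at assumed API rates.
rate = blended_rate(3.0, 15.0, 0.25)
print(f"{rate:.2f} USD per million tokens")
print(tokens_for_allowance(100.0, rate), "tokens")
```

Comparing that token count with what the same subscription used to cover interactively is what the "orders of magnitude" remark refers to.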
But there we go. It was inevitable. Everyone knew this was going to happen. The token 7:38 wars are here. And honestly, I think we're seeing multiple companies fighting for enterprise attention around tokens and token usage. We have 7:49 pricing changes coming for GitHub Copilot. We have Anthropic and OpenAI fighting it out: "Hey, I'm giving a 2-month discount." "I'm giving a 3-month discount." "I'm giving more tokens 8:00 here." "I'm opening up your limits." There's a lot of fighting over these enterprise customers, who are now clamoring to get AI into their systems because everyone's deciding that it adds value. 8:14 In some ways, I see this as healthy competition that drives down the overall price of tokens. I don't know if you saw the Cerebras IPO. I put in some bids for shares and 8:24 didn't get any. The initial price was $185 per share, and as soon as it started trading it went all the way up to $350 8:35 per share. And this is an inference company that's not focusing on training; they're focusing on serving tokens at cheap, fast rates. So there is a lot of economic 8:45 interest in really fast models that can produce lots of tokens at a fraction of the cost. It's very exciting to see 8:57 the world starting to look at the optimization stage of what agents are. Speaking of other quick announcements, I want to throw one more in here that I thought was interesting, which is 9:07 the release of the GitHub app. I'll put the GitHub app link here as well. It's been out in preview for a little while, but now it's officially out in the wild. You can go 9:18 download a dedicated application that is focused around... I'm going to use all the terms we just used.
It is a GUI wrapped around the 9:29 GitHub Copilot CLI. It's a graphical user interface where you can talk to your agents and your agent sessions. What it's really meant to address is this: if I have sessions running on my computer, if I have a remote session with an agent on 9:40 my machine, and I have an agent running on GitHub.com, of course, how do I talk to all those agents in a single place? What does that look like? How can I have three or four repos open at the 9:51 same time, with agents working on different repos simultaneously? I believe this app is going to try to solve some of that centralization around how you interact with agents and build with code. 10:02 I'm looking to test it, so that's one thing I'm looking at as well. Let me pose a question back to you, Matthias. You asked me this just a second ago, and I'd like to turn it back to you. 10:13 You asked me just before we got on the call, "What are you working on? What's the AI thing that's picking your brain? Are you still using 10:23 VS Code as your primary interface when you talk to agents?" So I'll just pose that question. Matthias, what are you using now? What is your main mode of interacting with agents, and 10:35 where are you finding value? I've got a very multi-harness setup. I don't like vendor lock-in at all. 10:48 I'm a huge fan of Claude Code, I'm a very, very big fan of Codex, and I love VS Code and Copilot within VS Code, 10:58 because it's a really good user experience, and the new VS Code agents window obviously makes that a much more prominent feature in 11:08 the tool. I like to switch back and forth between those three, mostly.
I'm always keen on 11:18 exploring other experiences and other harnesses, but what I'm doing much more than before, because I now have the tooling in place for 11:28 it, is using agents in headless ways rather than interactively: not sitting in front of a chat, 11:41 giving instructions, watching the agent's reasoning steps, and waiting for something to happen. I'm currently much more into automated workflows 11:55 where I create a complex spec and feed it into an agentic workflow where 12:06 an implementation agent and a review agent go back and forth until we're 12:17 happy that the spec was implemented correctly. I've created some tooling around that, and I'm experimenting a 12:28 lot to find the best combination: which harness to use for the implementation, which harness to use for the 12:38 review, which harness to use for the orchestration, and which models to use. There are just so many possibilities, and I 12:49 like to run experiments to find out what works best. Love that. I will say I'm branching out a lot more from what I've been doing. Traditionally I've been very VS Code-centric. I 13:00 like having an interface in front of me. I like being able to talk to the agent. Where I'm finding some friction is that the command line interface is appealing too. I like a lot of the advantages I get with command line 13:11 interfaces. I'm still trying to suss out what it looks like to have VS Code up and running versus a command line interface. VS Code shows me the folder structure, and I 13:22 really like having the folders laid out in front of me, picking different files, and dipping in and out of the code the agent is building. On 13:32 the other hand, I also really like the experience of talking to the CLI. It seems much more powerful in what it can see and observe about what it's building and how it's leveraging things.
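The implement-and-review loop Matthias describes can be sketched as a bounded loop: one agent implements against the spec, a second reviews the result, and the reviewer's feedback goes into the next attempt until it approves or a round limit is reached. In this Python sketch, implement and review are hypothetical callables standing in for whichever harness and model you wire up, and the round cap is our own assumption to keep the loop from running forever.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Review:
    approved: bool
    feedback: str = ""  # what the reviewer wants changed; empty when approved


@dataclass
class LoopResult:
    approved: bool
    rounds: int
    attempts: list[str] = field(default_factory=list)


def implement_review_loop(
    spec: str,
    implement: Callable[[str, str], str],  # (spec, feedback) -> change
    review: Callable[[str, str], Review],  # (spec, change) -> verdict
    max_rounds: int = 5,
) -> LoopResult:
    """Alternate implementation and review until approval or max_rounds."""
    feedback = ""
    attempts: list[str] = []
    for round_no in range(1, max_rounds + 1):
        change = implement(spec, feedback)
        attempts.append(change)
        verdict = review(spec, change)
        if verdict.approved:
            return LoopResult(True, round_no, attempts)
        feedback = verdict.feedback  # carry the review into the next attempt
    return LoopResult(False, max_rounds, attempts)


# Stub agents for illustration: the reviewer rejects the first draft once.
def fake_implement(spec: str, feedback: str) -> str:
    return f"revised: {feedback}" if feedback else "first draft"


def fake_review(spec: str, change: str) -> Review:
    if change.startswith("revised"):
        return Review(approved=True)
    return Review(approved=False, feedback="describe every measure")


result = implement_review_loop("Add measure descriptions", fake_implement, fake_review)
print(result)
```

Keeping the harness choice behind two plain callables is what makes it cheap to swap implementation and review harnesses or models between experiments.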
So that's 13:43 one area I'm exploring: moving more towards the GitHub Copilot CLI and leveraging that. I've also been really curious about other harnesses, not the big 13:54 three. The big three harnesses you mentioned are Anthropic's, OpenAI's Codex, and Microsoft's. I'm interested in ForgeCode because 14:04 some benchmarks say ForgeCode is the number one harness out there. And Matthias, last Tuesday you sent out a paper 14:14 saying the harness makes a big difference in how your agent performs. So I've been doing some exploration around ForgeCode. I installed it on my machine, and I'm just trying to get my head 14:24 around what it does. I'm using the ForgeCode CLI to understand what it's making and what it's doing. That's one area. 14:34 I also have OpenClaw, which is again a harness. It's a weird harness, a really interesting system. It's maybe even bigger than a 14:45 harness, to some degree. But I've also recently installed Hermes. Have you heard of Hermes? Yeah. Hermes is another one, trending massively. 14:55 Even beyond OpenClaw, I think you're right. Yeah. OpenClaw was the first one to put this kind of agent on a computer. It's the equivalent of, I guess, the Codex computer... no, 15:06 no, Perplexity Computer, I think that's the company. They're doing something similar. Anyway, I'm exploring Hermes. I installed it on a Windows machine and I'm using PowerShell as the 15:17 command line to run it. It's in beta, a very early release of the Hermes agent, but I'm testing it on a separate machine to see how it compares against OpenClaw. 15:28 I'm finding some really interesting parallels, and it's proving to be pretty useful. Both of them run through my Telegram, so I have two agents I can talk to independently to do different things.
So I'm really 15:39 trying to explore more of the surface area of these other agents. Anyway, I'm exploring a lot. I don't really have any findings to report yet, other than that I feel very 15:49 overwhelmed; there are so many new things coming out that I'm having a hard time keeping my head around all of it. That being said, any final thoughts on harnesses and things you're 16:00 using, Matthias, before we jump in today? I've got too many, I think. [laughter] Let's save that for a dedicated episode. 16:10 For future reference. Correct. Yeah, otherwise we won't get to the demo. That being said, I'm going to go over to your desktop, Matthias. Let's jump in and go a bit further on refining our agents. We did some 16:21 work last week on writing better prompts, and today we're going to try some other things. Let's jump in there. All right. 16:32 Here we are. Let's go. Just to recap, this is the game plan from last week. We set ourselves 16:44 a TMDL modeling task to perform agentically. 16:54 We used an agent in conjunction with Microsoft's semantic model MCP server to 17:04 analyze a model, identify some areas for improvement, and even spec them. Then we moved on to using another agent 17:15 to implement those specs. Initially, we gave the agent a very, very small prompt, if you will, and 17:26 we got mixed results back. So we reflected: how can we improve that experience? What are the various levers we have to make 17:37 an agent with that prompt much more capable? What I've listed on the screen are four different things we want to 17:47 try, two of which we have tried already. The first was, rather than just giving a half sentence 17:59 as a prompt, to give it a bit more detail about explicitly what we're looking for. And we definitely got some better results 18:09 there, but also some surprises.
The second thing we tried was to put those additional 18:19 instructions into static, repository-scoped agent instructions using the AGENTS.md 18:30 convention, and that's where we left things at the end of the last demo. The two more things I really want 18:41 to get to today are custom skills and custom agents. What people out there need to understand is that, ultimately, all of that 18:54 comes down to giving more verbose instructions to the agent. Option number one is simply giving it a richer 19:05 prompt. Options two, three, and four let us keep providing more verbose 19:17 instructions without having to type them out every time, if that makes sense. They mean we can pre-create that additional context. We can 19:28 pre-create the additional verbosity, so we don't have to provide it explicitly in the chat. That's what it comes down to in the end. Agent 19:38 instructions, skills, and custom agents are ultimately just different vehicles for providing that extra verbosity. That's it. There is no magic bullet here. 19:50 So, the folder I'm looking at, just to recap... I think this is the one 20:00 where I'm confused. Oh, yeah, right. 20:11 This is the original one. That was our very first attempt, where we weren't quite happy with the results. Yes. If I then switch here, 20:21 this is the one, and I've got it open here, where we provided a much more verbose prompt to begin with. 20:32 And we definitely got... that was one of our main criticisms... we definitely got 20:44 much more enjoyable, let's say, measure descriptions. Yep. Previously, 20:54 the measure descriptions we got were very, very technical. They just rephrased what the underlying function and formula does.
Whereas this one gave 21:05 us measure descriptions that are business-focused, much more explanatory from a semantic point of view rather than a technical point 21:16 of view, which is what we wanted. The weird thing was that this agent somehow decided to go only partway: it provided descriptions for 21:28 some measures and completely left out others. It was completely unclear to us why it would have done that. It was different 21:38 from the first experiment, and it was definitely not something we had even implied in our extended prompt. It was strange and unexpected. And 21:50 the third experiment should be this one here, which we haven't really looked into much. This is the one where we created AGENTS.md, which is a 22:00 convention: whenever that file exists in the root of your repository or in a subfolder, most agent harnesses will pick it up. And this is 22:11 something the agent harness has to handle; it's not something your LLM would know about. Most agent harnesses will find those files and 22:22 automatically put them into your prompt and chat context without you having to do anything. It's a very neat way to write down 22:34 fundamental dos and don'ts and always have them be part of your instructions without repeating them each time. 22:46 This is a very valuable addition, I think. Again, it's one I'm using more and more frequently, as I'm finding that in specific repositories there's a particular way I 22:56 like to work with that bit of code, that line of work, or whatever I'm thinking through. A lot of the time, when I'm using these AGENTS.md files, they're 23:06 custom to that repo. There are different ways I want the agent to interact with things in a specific repo. I have a blog repo, for example, so there are some specific instructions around how I 23:17 want the agent to work with the blog content we have there.
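Here is a rough sketch of what "the harness picks it up automatically" amounts to: collect every AGENTS.md between the repository root and the folder being worked in, and prepend them to the prompt. Harnesses differ in their exact lookup and precedence rules, so the root-first ordering here (more specific files later) is an assumption for illustration, and collect_agents_md and build_context are hypothetical names.

```python
from pathlib import Path


def collect_agents_md(repo_root: Path, working_dir: Path) -> list[Path]:
    """AGENTS.md files from the repo root down to working_dir, root first."""
    root = repo_root.resolve()
    target = working_dir.resolve()
    rel = target.relative_to(root)  # raises ValueError if target is outside the repo
    # Directories on the path: root, root/a, root/a/b, ...
    dirs = [root] + [root.joinpath(*rel.parts[: i + 1]) for i in range(len(rel.parts))]
    return [d / "AGENTS.md" for d in dirs if (d / "AGENTS.md").is_file()]


def build_context(repo_root: Path, working_dir: Path, user_prompt: str) -> str:
    """Prepend every applicable AGENTS.md to the user's prompt."""
    sections = [
        p.read_text(encoding="utf-8").strip()
        for p in collect_agents_md(repo_root, working_dir)
    ]
    sections.append(user_prompt)
    return "\n\n".join(sections)
```

This also makes the drawback discussed next concrete: everything in the root file lands in every prompt, whether or not the task needs it.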
Or say I made a folder and I'm going to put a bunch of TMDL in there. There are probably specific 23:27 instructions I'm going to give it around TMDL as well. This is where I'm using that broader, project-wide guidance for agents, which is cool. 23:37 Right. We haven't really observed in much detail what this one did. The only thing we looked at at the end of the last show, 23:48 with some surprise, was that it also did a very partial job. Here we can see it 23:59 created some display folder properties across four different tables, and then, 24:09 for some reason, it only created descriptions for three measures. As you can see very nicely in the Git diff view here, this is everything it did with respect to 24:19 adding descriptions. One thing it did very nicely and very well: it split the work into two 24:29 different commits, very nicely and logically organized. One commit for our GitHub issue number three, which is 24:39 measure descriptions, and another commit for GitHub issue number four, which is display folders. Very nice, and by the way, that is not what we got in the very 24:49 first experiment. So what it has done, I would say, it has done very well. 25:01 We still need to figure out why it decided not to do the other things. But the one thing I wanted to point out is that going down the AGENTS.md route means 25:11 that virtually every single agent interaction you have gets fed that particular context. And that's the drawback here. If you want 25:26 to provide instructions that are scoped to a particular task, or that are very 25:36 specific to only a subset of your project, maybe something that only applies to one of 25:46 your semantic models when your repository contains ten, something like that.
Then it can be very wasteful and ineffective to 25:56 use the root-level AGENTS.md for those kinds of instructions. And this is me moving over to the third option here, 26:08 creating skills. Let me start off the same way we did last time. I'm going to 26:20 create a new folder, sorry, a new branch from this commit, so we can start off where 26:31 all the other demos started. Custom skills, here we go. Create branch. 26:42 [snorts] And then from there I can say "Open in worktree," which opens a whole new 26:54 VS Code window. It also creates a worktree, which is effectively a whole new copy of the project on my hard drive. And 27:05 we can start from here. Let me just make it a little bit bigger before you ask me. I know what you're going to say. Yeah, [laughter] I know. 27:16 Custom skills are something very neat, and if I go here, 27:28 you can read up about them. This is GitHub Copilot's page about agent skills, and this is Claude Code's "Extend Claude with skills" page. They give you a lot of 27:39 context around structure and where to put them. But what it comes down to is a Markdown file called 27:50 SKILL.md, which you place in a particular location in your project. And again, that's something your harness, if 28:00 it's skill-enabled, will pick up for you. The neat thing about skills is that they do not 28:10 always go into your context; they don't go by default into all of your chats and interactions. Skills are loaded on 28:20 demand. The Markdown instructions you provide are preceded by some metadata where you say, "This is the context 28:32 within which the agent should load this skill." If that context doesn't apply, the agent will ignore it. It's a very neat way 28:42 of not clogging all of your conversations and interactions with too much context.
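The on-demand behavior can be pictured as two layers: a short index of skill names and descriptions that is always in context, and full skill bodies that are only injected when a skill is judged relevant. In real harnesses the model itself makes that judgment from the description; the keyword-overlap rule in this Python sketch is a deliberately naive stand-in, and all names in it are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # always visible: decides whether the skill gets loaded
    body: str  # full instructions: only injected when the skill is selected


def _keywords(text: str) -> set[str]:
    # Lowercase words of four or more letters; a crude filter for short stop words.
    return set(re.findall(r"[a-z]{4,}", text.lower()))


class SkillRegistry:
    def __init__(self, skills: list[Skill]) -> None:
        self._skills = {s.name: s for s in skills}

    def index(self) -> str:
        """The always-on part of the context: names and descriptions only."""
        return "\n".join(f"- {s.name}: {s.description}" for s in self._skills.values())

    def relevant(self, prompt: str, min_overlap: int = 2) -> list[str]:
        """Naive stand-in for the model's judgment: keyword overlap with the description."""
        words = _keywords(prompt)
        return [
            s.name
            for s in self._skills.values()
            if len(_keywords(s.description) & words) >= min_overlap
        ]

    def build_context(self, prompt: str) -> str:
        """Index, plus the bodies of relevant skills only, plus the prompt."""
        bodies = [self._skills[name].body for name in self.relevant(prompt)]
        return "\n\n".join([self.index(), *bodies, prompt])
```

Even with this crude matcher you can see why a vague description is costly: if it shares nothing with how people actually phrase requests, the body is never loaded.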
28:54 The other thing I wanted to show here is that I don't really want to learn all the nitty-gritty details of how to create a custom 29:04 skill, and I don't want to bore you by reading through the documentation. No way. Another very, very 29:16 important agentic development capability everyone should be familiar with is instructing your agents to read the docs on your behalf and follow them. 29:28 They will follow them much better than you could. So that's definitely something I want to show here as well. One quick thing here, too, something I often do: when you have the link to 29:39 what you're going to show, sometimes I'll point the agent at the docs and say, "Instead of implementing, I just want you to summarize this. Give me the high level: what are we going to do? How do we 29:48 use these docs to help us?" We know we need to use it. We know we want to write the skill. I'll even ask the agent questions about the skill until I fully understand the concepts. And then we can implement it together. 29:57 That's a technique I use when I'm throwing docs at agents: use that experience. The agent will teach you things. Use it as a leveraged 30:09 way to train yourself up on whatever you're doing as well. I just wanted to add that interjection. One thing I do very frequently: I really like 30:20 Anthropic's documentation on how to write skills. I always rely on it. It's very detailed, it has a lot of examples, and it's very agent-friendly as well. 30:32 So if I just copy the link here and go back to my agent, I can say, 30:42 "Follow the instructions at [link] and 30:53 create a local skill in this repo that is invoked 31:04 whenever the agent is asked to perform 31:16 operations on a semantic model." There we go. "The skill should..." 31:27 Sorry.
Instructing, presenting, and typing don't go very well together. [laughter] "The skill should instruct 31:37 the agent to always annotate and 31:49 document semantic model elements from the point of view of a 32:01 business user, and should avoid technical 32:11 language or mere technical descriptions." There we go. It's bound to have plenty of typos, but 32:23 agents and LLMs are very good at handling that. I'm saying: create a skill for me. I'm giving it a small blurb about what 32:33 the gist of the skill should be. I'm also telling it what the trigger for this skill should be, and I'm pointing the agent specifically 32:44 at Claude Code's documentation, where it can find out what matters, what the format is, where the file goes, and what makes a good skill. I'm 32:56 fairly confident we're going to get some really good results from that. I was going to ask you to tune that one. One thing before you hit send: I think we want to add something. The last time we did this, 33:06 we got it to write comments on some measures and some columns. I would make a note here that we want to be a bit more forceful with the instructions: when 33:16 the skill is used, do it on every column and every measure. I think we want to be more explicit and call out what we learned from the last session, 33:26 which was that it did good things. It got better, with less technical jargon and more business-user-friendly language, but we also want to say: not only do we want that, we want you to make sure it's enforced. And 33:37 again, this is where we're designing the prompt over a couple of iterations, so that we get a better result on all the measures and all the columns we want 33:47 it to produce.
I think that's another real advantage here: we force the agent's hand a bit more by being clearer about our requirements. 33:57 So I'm saying, "When asked to add measure descriptions, do not arbitrarily skip any of them." Something like that. Great. Yeah, that's a good addition. Again, 34:08 and this is the neat thing about doing this in a staged way, agents are very good at inferring intent from what I'm saying here. 34:19 They're going to read into this, which is arguably very brief and succinct, all the context that I haven't been able 34:30 to provide. So I've got high hopes. Let's send this off. GitHub has not been very fast recently because they've obviously got massive 34:40 performance issues, so hopefully we don't have to wait too long. Why don't we look at what it's doing? We can see it's fetched the content from 34:52 that page. And look at that: it has not only read the instructions, it has already understood them and 35:02 turned them into specific actions. Imagine if you as a human had to do that. I mean, I would probably still be opening the website at this point, 35:13 [laughter] let alone reading and understanding the whole thing. This is the beauty of delegating some of those tasks 35:24 to agents. There's another note here. It says, "Now that I've read everything, let me look at a sample TMDL file to understand the model structure." This is something we've found in the past when working with TMDL files: 35:35 the TMDL file structure has some quirks, since it's designed for human readability.
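Since the agent has twice skipped measures, a cheap safeguard is to check its output rather than only asking it not to skip. TMDL stores an object's description as /// lines directly above the object, so a line-based scan can list measures that have none. This Python sketch is a simplification, not a TMDL parser: it only looks at measure declarations and the line right before them, it assumes TMDL's convention of doubling a single quote inside a quoted name, and the sample table and measure names are made up.

```python
import re

# `measure <name> =` where <name> is either 'quoted with spaces' or a bare token.
MEASURE_DECL = re.compile(r"^\s*measure\s+('(?:[^']|'')+'|[^\s=]+)\s*=")


def _unquote(raw: str) -> str:
    # A quoted name escapes an embedded single quote by doubling it (assumed convention).
    return raw[1:-1].replace("''", "'") if raw.startswith("'") else raw


def measures_missing_descriptions(tmdl: str) -> list[str]:
    """Names of measures whose declaration is not directly preceded by a /// line."""
    lines = tmdl.splitlines()
    missing = []
    for i, line in enumerate(lines):
        match = MEASURE_DECL.match(line)
        if not match:
            continue
        previous = lines[i - 1].strip() if i > 0 else ""
        if not previous.startswith("///"):
            missing.append(_unquote(match.group(1)))
    return missing


sample = (
    "table Sales\n"
    "\t/// Total revenue from all completed orders.\n"
    "\tmeasure 'Total Sales' = SUM(Sales[Amount])\n"
    "\t\tformatString: #,0\n"
    "\n"
    "\tmeasure 'Order Count' = COUNTROWS(Sales)\n"
    "\tmeasure Margin = [Total Sales] - [Total Cost]\n"
)
print(measures_missing_descriptions(sample))
```

Run against the agent's changed .tmdl files, a non-empty list is a concrete reason to send the task back, which backs up the "do not arbitrarily skip" instruction with a check.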
And another thing that might be relevant here, particularly for the skill: is there a 35:46 place, Matthias, where we can give it a good URL that says, "Here's the definition of how you should work with TMDL"? Is that something we can provide? As I'm thinking through this skill 35:57 we're building, we also know that TMDL is a little tricky for agents to figure out. Could we leverage something we already know about to help with the TMDL side? 36:07 I would point at Rory Romano's open-source Power BI skills, which he maintains on GitHub. Yep. 36:18 I'll get the link for you. github.com... There we go. That's it. 36:29 Power BI agentic plugins. This is the one. In here he has Power BI and Fabric plugins, and the Power BI one contains dedicated 36:41 skills specifically for semantic model authoring. He's done exactly what we're doing right now. He's created a SKILL.md 36:51 with a description that contains the trigger instructions, and then a body 37:02 that provides all the context that applies when 37:12 working on semantic models. And we can look at how to bring that into our project as well. That was one of my thinking points here: 37:22 maybe, for brevity's sake, and so this doesn't go on forever, because we could do a lot of demos around this one, let's grab this URL. We already know this is a 37:32 relevant skill we may want to leverage or pull in. Grab the URL, then go back to our skill in VS Code and say, "Hey, TMDL is a little bit tricky. 37:42 Here's another skill that helps you understand how to work with TMDL directly. Add some information about working with TMDL from this existing skill." Just something very simple 37:52 there, I think.
And I think that will also refine our skill and get us cleaner output when working with this. In the future, what we may do is just pull the skill in 38:02 directly and make it part of our library, part of the repo. But this is another great opportunity to pull in other people's really good work, have the agent read through it, and adapt that information into our existing skills. Absolutely. 38:14 Let's see what it's done. One thing I just really like: whichever task you give an 38:24 agent, it always comes back with a nice summary, a good report explaining what it did. Auto-triggers whenever the agent is asked about semantic model operations. There we go. 38:35 Business-user framing, very key. Completeness enforcement, that's the bit you asked for, Mike. A good/bad example table, that's fantastic. We didn't even ask for it, 38:47 didn't think about it, but obviously this is perfect. And then structure, explaining what a template description should look like. And obviously, we have that as an 38:57 uncommitted change here. Let's inspect it a little bit. We have a description. I'm already seeing one 39:09 problem here, which is very unexpected and very strange. If we go back to the docs here, they list the fields and their 39:24 types, and they tell you very clearly that name and description are mandatory for the 39:37 SKILL.md convention. There are quite a few additional metadata items here, but name and description are required. I do not 39:49 see a name. I'm very disappointed with that because, first of all, it's clear in the reference link we provided. Secondly, we're using Claude Sonnet 4.6, a very top 40:01 tier model, so that is not great. But OK. Guidelines: write for 40:12 business users, examples, be complete, never skip, and then structure. Here we go. I'm just going to add the name myself, because, one, it annoys me, 40:22 and two, it should not have missed that. Semantic model documentation. Here we go.
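The missing name is the kind of slip a quick check catches before you commit. A SKILL.md starts with YAML frontmatter between --- lines, and per the docs shown in the episode, name and description are the required fields. This Python sketch reads only simple top-level key: value lines (no multi-line YAML values) and reports which required fields are absent or empty; the function names are hypothetical, and anything more elaborate should use a real YAML parser.

```python
REQUIRED_FIELDS = ("name", "description")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Top-level `key: value` pairs between the opening and closing --- lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter block at all
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep and not line[:1].isspace():  # skip indented continuation lines
            fields[key.strip()] = value.strip()
    return {}  # opening --- was never closed


def missing_required_fields(skill_md: str) -> list[str]:
    """Required frontmatter fields that are absent or empty."""
    fields = parse_frontmatter(skill_md)
    return [f for f in REQUIRED_FIELDS if not fields.get(f)]
```

A check like this is cheap enough to run in a pre-commit hook on every SKILL.md the agent writes.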
And did you notice what happened? 40:35 Look at this. We have a validation warning that says skills should provide a name. So maybe I was wrong. Maybe it's not strictly required, because the 40:46 language here says "should" rather than "must," but nonetheless it did annoy me too much. And 40:58 the warning disappears as well once the name is there. Cool. And what is key to understand 41:08 around skills 41:19 is a bit of a funny one: a skill can be two very different things. A skill can be an explicit command that you invoke as part of 41:32 a prompt. In fact, Claude Code used to have a dedicated feature called commands, which has been superseded by skills. If you want to do 41:44 that, you take the skill name and invoke it as a slash command. That's one invocation mode. The other is for the 41:56 model to decide automatically that a skill applies to a particular prompt, and for that one, only the description matters. The skill is then used implicitly 42:09 rather than explicitly. For that reason, if you want a skill to be used implicitly, it's extremely important to look at what the description in your skill file says, because if it 42:21 is not specific enough, the skill may be in your project but never get invoked, because the 42:33 LLM may not find it relevant, right? So what do we have here? It says it guides annotation and documentation for the semantic model, and to use it when adding or updating. 42:43 This could not be more explicit. I'm happy with that; this is definitely something it should pick up. Cool. Let's run our test then, right? 42:55 The test we did last time round: we've always given it the exact same prompt so that we can more easily compare. 43:05 I love that. We're here. Testing 101. [laughter] Absolutely. So I'm going to start a new chat. 43:15 Do you need to keep that file before you run the new item? This file doesn't exist in Git yet, but it says there's a Keep option there.
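The two invocation modes described above can be made concrete with a toy router. This is an illustration under loud assumptions: real harnesses let the model itself judge whether a skill is relevant, whereas this Python sketch stands in a crude keyword-overlap score, and the skill names, descriptions, and the `min_overlap` threshold are all hypothetical. What it does show is the point from the discussion: in implicit mode, the description is the only thing consulted.

```python
import re
from typing import Optional, Set

# Hypothetical skills: name -> front-matter `description`.
SKILLS = {
    "semantic-model-documentation": (
        "Guides annotation and documentation of semantic model objects. "
        "Use when adding or updating measure, column or table descriptions."
    ),
    "release-notes": "Drafts release notes from merged pull requests.",
}


def words(text: str) -> Set[str]:
    """Lower-case alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def pick_skill(prompt: str, min_overlap: int = 3) -> Optional[str]:
    """Return the skill to load for a prompt, or None."""
    # Explicit mode: "/skill-name ..." invokes a skill by its name.
    if prompt.startswith("/"):
        parts = prompt[1:].split(maxsplit=1)
        if parts and parts[0] in SKILLS:
            return parts[0]
        return None
    # Implicit mode: only the description is consulted. The keyword overlap
    # is a toy stand-in for the model's own relevance judgement.
    prompt_words = words(prompt)
    best, best_score = None, 0
    for name, description in SKILLS.items():
        score = len(prompt_words & words(description))
        if score > best_score:
            best, best_score = name, score
    return best if best_score >= min_overlap else None


print(pick_skill("/release-notes for v2"))  # -> release-notes
print(pick_skill("Add descriptions to every measure in the semantic model"))  # -> semantic-model-documentation
print(pick_skill("Summarise yesterday's stand-up"))  # -> None
```

If the first description were shortened to something vague such as "Helps with models", the second prompt would no longer reach the threshold and the skill would sit unused, which is the failure mode Matthias warns about.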
Is that 43:26 something we have to do, Matthias? No, you don't. This is 43:36 an option for you to undo what your agent has changed or produced, right? It's a way for 43:46 you to easily revert what the agent has done, but as far as this project is concerned, this file exists in my project on my hard disk 43:56 irrespective of whether or not I explicitly keep it. But there we go, if I do click it. It just tells you it's new, but it can be a bit 44:06 annoying. Why is it all green? People are going to ask that. Exactly. Cool. All right. I'm just going to add these to 44:17 make it very clear that we're talking about issues, and I'm also going to commit that. 44:27 And the hashtags you're adding there are callouts. This agent is aware of GitHub; again, calling it back out, this is the GitHub MCP 44:37 server. By adding the hashtags the way you did, you're telling the agent, "Hey, these are existing issues." And this is maybe a technique I've learned from you, Matthias: 44:47 issues are probably more important when you're working with agents, because once those issues exist, you could have many worktrees working on the same issue at the 44:57 same time, and you can pick from a group of different solutions to the same problem. So that's a really good technique: create issues and then 45:07 call them out in your prompts directly, because we don't have to write up what issue number four or three was. It's already captured in GitHub. Exactly. And by the way, you don't even have to leave VS Code. I've got my GitHub extension in here. Sure. 45:20 If I click on it, I've got access to all pull requests and issues from that exact same repository. If I go to recent issues, I can see down here this is 45:30 number three and this is number four. If I click on one, it opens very neatly inside VS Code.
So I don't have to switch back and forth or leave this experience. 45:42 In fact, I can even add a new comment from inside VS Code. If people didn't know that, it's definitely something to be aware of. All right, let's see how we're 45:53 doing. Let's send this off and hope for the best. This may take a little longer. 46:06 I can briefly introduce the fourth one. We're probably not going to be able to fully implement that one today, just 46:16 in terms of time, but... Sure. ...just so we've made a start on the 46:29 introduction. In here, as you can see, 46:45 I've got the agent selection menu, which allows you to do two different things. You can 46:57 change the agent mode when using the default built-in agent, or you can select a custom agent to 47:09 use for this particular session. And custom agents are very easy to make yourself. It's yet another markdown file; it's just about writing 47:21 prose, if you will. Yep. Or, since we're in VS Code here, VS Code extensions can also 47:31 provide those custom agents. In fact, there's a lot of transparency here, because, as I just told you, just like a skill, a 47:42 custom agent is just a markdown file with instructions. Yes. So when you have custom agents provided either through your 47:54 project files or through extensions, you can hover over them and see what the description is. Look at that. 48:05 Nothing's hidden; it's all open source by definition, and we can see this one lives 48:16 inside my home directory, in the extensions folder, in the Fabric VS Code extension. So this is 48:28 shipped as part of the VS Code extension, and we can see it has a very similar structure and format. We've got what's called front matter: 48:38 a section at the top that contains a key-value 48:48 dictionary (description, model, tools), and then we've got the agent instructions underneath.
And this one 49:00 says, "This agent helps Microsoft Fabric users interact with the agent platform." For some reason, 49:10 they decided to put a model specifier in here, which is obsolete. Hello, Fabric team, please fix that. [laughter] As you can see, Opus 4.5 is no longer available in Copilot, 49:22 which is why this one says, "Unknown model will be ignored." If you're going to put a model specifier in here, you've got to make sure it won't break. 49:34 Thankfully, VS Code is happy to just ignore it rather than fail. The agent can also specify that only a 49:46 subset of the available tools should be available to it, and that tool scoping is very good. When I then select any 50:00 of these agents, all those instructions are implicitly added to my context. 50:10 That's why this is the fourth way for me to silently add additional detail into my 50:20 conversation without having to make it part of my prompt, right? Hopefully people take away from this that, ultimately, at the end of the 50:30 day, it always comes down to writing more English, right? That's our programming language now. We just speak to it and it works. 50:40 The most basic way of doing that is to just provide a longer prompt, but no one's going to want that. 50:50 That's not repeatable, it's laborious, and it will annoy you, right? You want to keep your prompts as short as possible. So having a custom 51:00 agent, a custom skill, or repository-scoped instructions is a neat way to get around that. Right. And why don't we see how this one is 51:11 doing? It's created a to-do list, quite detailed: seven items, and it's only done two so far. Yeah. 51:21 We were talking about which harness we're using lately, and my personal experience with GitHub Copilot is that 51:31 it's certainly degraded in terms of performance.
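A custom agent file follows the same pattern as a skill: front matter with keys such as `description`, `model` and `tools`, then instructions. The Python sketch below mirrors the two behaviours shown on screen: an unrecognised `model` is dropped with a warning instead of failing, and `tools` narrows the agent to a subset. It is not VS Code's implementation; the available model and tool names are hypothetical placeholders, and the warning text only paraphrases what the editor displayed.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical environment: what this harness can actually offer.
AVAILABLE_MODELS = {"model-a", "model-b"}
AVAILABLE_TOOLS = {"read_file", "edit_file", "run_in_terminal", "github_issues"}


@dataclass
class AgentFrontMatter:
    description: str
    model: Optional[str] = None        # optional model specifier
    tools: Optional[List[str]] = None  # None means "all available tools"


def resolve_agent(fm: AgentFrontMatter) -> Tuple[Optional[str], Set[str], List[str]]:
    """Return (model to use, tools in scope, warnings) for a custom agent."""
    warnings: List[str] = []
    model = fm.model
    if model is not None and model not in AVAILABLE_MODELS:
        # Ignore rather than fail: the session keeps whatever model it had.
        warnings.append(f"Unknown model '{model}' will be ignored.")
        model = None
    if fm.tools is None:
        tools = set(AVAILABLE_TOOLS)
    else:
        # Tool scoping: keep only requested tools that actually exist.
        tools = AVAILABLE_TOOLS & set(fm.tools)
    return model, tools, warnings


agent = AgentFrontMatter(
    description="Hypothetical Fabric helper agent.",
    model="Claude Opus 4.5",  # no longer offered, as in the video
    tools=["read_file", "github_issues"],
)
print(resolve_agent(agent))
```

Treating an unknown model as a warning keeps an outdated agent file usable, which is the forgiving behaviour Matthias credits VS Code with.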
It still performs very well, it just takes much longer than it used to, which is a clear indicator that they may 51:42 have run into some capacity issues. Struggling, or capacity issues, yeah. I think everyone is struggling with this. Some of the stories I've been reading in the 51:54 finance and marketing world say that many of these companies and organizations are seeing adoption go from 30% of users or developers using agents to, very quickly, 40, 50, 60% 52:06 of developers wanting and needing to use this as part of their daily workflow. As this rapidly takes effect, we don't have the infrastructure in place; it's just not there yet. There are some very interesting 52:18 developments on the hardware side. Honestly, if you're looking for investments, go look at the hardware side of things. Who's building the chips? Where are the facilities getting built? Who's creating these buildings? There's a 52:29 lot of need to keep ramping up those facilities so we can run the AI models the way we want to. [snorts] Exciting. Excellent. We probably 52:40 don't have to wait all the way until it's done seven out of seven, because we already have interim results here that we can inspect, right? The other thing we can do is look at what's already 52:51 been explored and check whether it's read any of our skills. 53:02 This is my own personal one; that's not the one. Here, there we go. Look at that. This is the prompt, right? And then 53:12 the very first thing that happened was that three files were read, and one of them happens to be, look at 53:22 that, there we go, the semantic model documentation SKILL.md, which is the very one we just created ourselves, right?
So that shows the system 53:33 is working. It didn't take the harness very long at all to determine that that skill was relevant and should be read, so we can 53:44 rest assured that all those additional instructions are part of this current iteration. 53:54 It also turns out it's just done seven out of seven. Here we go. All right, this is the TMDL export; that's the one commit 54:06 we provided ourselves, and then two further commits were added, as I asked. No, sorry, this is the one I made, and then here we go: this 54:18 one has done one commit for everything rather than, as I asked, one for each issue. Yeah. 54:28 Sad, but there we go. But we didn't explicitly tell it to in the instructions. Did we tell it to in the instructions on this one, or... what if we didn't really say? Yeah, 54:40 you're right. Yeah, you're right. We didn't give it separate guidance on that, but that's fair. A senior engineer would probably have done that intuitively because it's good 54:50 practice, but you're right. What really matters is what it has done in terms of creating descriptions. I'm really, really 55:00 excited. Let's see if we're going to be... oh, look at this... if we're going to be disappointed. We have "Number of 55:10 unique customers who made at least one purchase in the selected period. Use this to track active buyer engagement within a given time frame." I'm happy with that. "Number of units sold in 55:20 the selected period. Use this..." There we go. Having just skimmed over it, there are a couple of measures... if you go up a little bit, there was one here: the Sales Amount measure. 55:32 Yeah, it skipped the description on that one. Yeah, that's true. But it added a lot more descriptions on the other measures here as well. Yeah. This one seems to have 55:42 had a description already, right? Because obviously that's not an addition. Yeah, correct.
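Because the agent quietly skipped the Sales Amount measure, a completeness check is worth automating rather than eyeballing. In TMDL, a description is written as `///` lines directly above the object, and names containing spaces are wrapped in single quotes. The Python sketch below scans TMDL text and lists measures with no preceding `///` line. It is a line-based heuristic, not a TMDL parser: it only looks at `measure` declarations and skips blank lines, and apart from Sales Amount the measure names and DAX expressions in the sample are hypothetical.

```python
import re
from typing import List

# `measure` followed by a quoted name ('...' with '' as an escaped quote)
# or a bare name running up to whitespace or '='.
MEASURE = re.compile(r"^\s*measure\s+('(?:[^']|'')*'|[^\s=]+)")


def undocumented_measures(tmdl: str) -> List[str]:
    """Names of measures that have no /// description directly above them."""
    missing: List[str] = []
    has_description = False
    for line in tmdl.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines neither add nor clear a description
        if stripped.startswith("///"):
            has_description = True
            continue
        match = MEASURE.match(line)
        if match:
            name = match.group(1)
            if name.startswith("'"):
                name = name[1:-1].replace("''", "'")
            if not has_description:
                missing.append(name)
        has_description = False  # any other line ends the description block
    return missing


SALES_TMDL = """
table Sales

    /// Number of unique customers who made at least one purchase in the selected period.
    measure 'Active Customers' = DISTINCTCOUNT(Sales[CustomerKey])

    measure 'Sales Amount' = SUM(Sales[Amount])
        formatString: #,##0.00

    /// Number of units sold in the selected period.
    measure Units = SUM(Sales[Quantity])
"""

print(undocumented_measures(SALES_TMDL))  # -> ['Sales Amount']
```

A check like this could run after the agent finishes, or be described in the skill itself so the agent verifies its own completeness before committing.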
It's done the 55:52 thing semantically, and it's a huge advantage compared to our very first 56:02 experiment, even though, let me just remind everyone, we used the exact same prompt, right? The only difference this time was that we had the skill 56:13 in place. So that's great. I would have preferred it to put in some line breaks; this is very hard for me to read. 56:25 But obviously it would have needed some context that line breaks are even supported, that that's a possibility in TMDL, and it clearly 56:36 didn't have that. Yeah. But this, again, right? For everyone out there: you generally don't put your skills 56:46 into your project once and then leave them there. Skills need to evolve with your project as much as everything else, and when you come 56:56 across things like that, you don't have to go and manually update the skill file. You just ask the agent to update it. 57:06 And we've got display folders, very nice. It even understands that you can enclose 57:17 things in quotes when there are special characters and spaces in them. Yeah, very happy with that. I think this is 57:28 maybe a nice point to wrap up, given that we arguably used [laughter] a bit more time than planned. But I think 57:38 this clearly shows that we've gotten to the point of having a lot of different ways of interacting with the agent and different methods of making better prompts. 57:48 When you get down to the list of items you're using, that's where we're building custom agents and skills. These are reusable items, reusable artifacts. One of the challenges I 57:58 face with a lot of this agent world is that people just build things, create stuff, and then go back and do the same prompt again. That's useless to 58:08 me. That's not helpful for a workflow.
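The quoting the agent got right can also be expressed as a small helper. Microsoft's TMDL documentation states that object names containing whitespace or any of `.`, `=`, `:` or `'` must be enclosed in single quotes, with embedded single quotes doubled; the Python sketch below applies that rule and emits a measure preceded by `///` description lines, one per line of description text. It assumes, as we understand the format, that consecutive `///` lines form a multi-line description (the line breaks Matthias wanted). The function names are hypothetical, the `displayFolder` value is written verbatim with no quoting logic, and the tab indentation is simply the convention this sketch uses.

```python
from typing import Optional

# Characters that, per the TMDL docs, force an object name into single quotes.
SPECIAL = set(".=:'")


def quote_tmdl_name(name: str) -> str:
    """Quote a TMDL object name when it contains whitespace or special characters."""
    if any(ch.isspace() or ch in SPECIAL for ch in name):
        return "'" + name.replace("'", "''") + "'"
    return name


def measure_tmdl(name: str, expression: str, description: str,
                 display_folder: Optional[str] = None) -> str:
    """Emit a measure declaration preceded by /// description lines."""
    lines = [f"\t/// {part}" for part in description.splitlines()]
    lines.append(f"\tmeasure {quote_tmdl_name(name)} = {expression}")
    if display_folder:
        lines.append(f"\t\tdisplayFolder: {display_folder}")
    return "\n".join(lines)


print(measure_tmdl(
    "Sales Amount",
    "SUM(Sales[Amount])",
    "Total sales amount in the selected period.\nUse this to track revenue.",
    display_folder="Revenue",
))
```

Rules like these are exactly the kind of detail that can be folded back into the skill file, so the next run produces readable, correctly quoted output without being asked.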
I don't want to go through the same three or four prompts over and over again. Anything we do that's in any way repeatable, 58:18 we should figure out how to make into a skill or a custom agent, right? A "document my model" custom agent, right? That would be a great opportunity: you can give it context for these other 58:30 repeatable tasks that we're going to do over and over again. And to your point, I'm finding that as I do this again on a second or third model, or other changes I'm making, I 58:40 bring the skill with me. And I ask the agent, "Reflect. As I worked on this project with you, did I give you some suggestions? Did I update some things? Go back and update the skill based on 58:50 what we learned." You want these skills to be an evolution of a task, an organism that grows and gets better over time as you use them. 59:00 Excellent. And also, just to point out, there are huge marketplaces out there of skills made 59:11 and shared by others. As we've seen, though, "marketplace" is a bit of a misnomer, because by definition a skill file is 59:22 always going to be open source. There isn't really an economic model here at all; by definition, the skill files at least will always be open source. 59:34 But that's what I would propose we explore in some future sessions: what marketplaces are out there, which 59:44 ones are relevant, and how do you bring them into your project? One I'll call out is Awesome Copilot, which is made by the Microsoft 59:55 team. I use that one and look for skills on there fairly frequently. It's one of many, and that would probably be another great topic to go through: let's go find some marketplaces, pull in some skills, and see how we can enrich our agent by plugging 1:00:06 in the real tools we need to make things more efficient.
Awesome. Matthias, thank you so much for doing the demo today. This was a lot of fun. I'm loving learning these things, because this is a good way of digesting the new 1:00:17 content that's out there. Things are changing fast; every week something new comes out, and we need to talk about all of it. Unpacking these things on a regular cadence is, I think, really the best way to do this. We want to 1:00:28 encourage you to go build this stuff on your own. That's the reason we're doing screen shares and typing these prompts in directly: so you can go home, check this out, and build it yourself. That's the power of this. 1:00:38 Go start building in your own workload and your own week. I challenge you to take some of these lessons and implement them, and see how it works. Let us know in the comments if there's any other content 1:00:48 or topics you'd like to learn about. Matthias, thank you so much. Go follow us on social media, and we'll see you next time. Absolutely. Look forward to it. Bye. 1:01:03 Agentic Thinking. [music]