Have to say, this feels like Web 2.0 all over again (in a good way) :)
When having APIs and machine consumable tools looked cool and all that stuff…
I can’t see why people are looking at this as a bad thing. Isn’t it wonderful that the AI/LLM/Agents/WhateverYouCallThem have made websites and platforms open up and allow programmatic access to their services (as a side effect)?
I can't believe everyone is talking about MCP vs CLI and which is superior; both are methods of tool calling, and it does not matter which format the LLM uses for tool calling as long as it provides the same capabilities. CLIs might be marginally better (LLMs might have been trained on common CLIs), but MCPs have their uses (complex auth, connecting users to data sources) and in my experience if you're using any of the frontier models, it doesn't really matter which tool calling format you're using; a bespoke format also works.
The difference that should be talked about is how skills allow much more efficient context management. Skills are frequently connected to CLI usage, but I don't see any reason why. For example, Amp allows skills to attach MCP servers to them – the MCP server is automatically launched when the Agent loads that skill[0]. I believe that for both MCP servers and CLIs, having them in skills is the way to get efficient context, and I hope that other agents also adopt this same feature.
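To make the context argument concrete, here is a rough Python sketch of the accounting. Everything in it is hypothetical: the `Skill` class, the `rough_tokens` heuristic (about 4 characters per token), and the function names are illustrations only, not Amp's implementation or any agent's real API.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical skill record, not any real agent's data model."""
    name: str
    summary: str       # short description that always stays in context
    instructions: str  # full skill body, loaded only when the skill is used
    tool_schemas: list = field(default_factory=list)  # e.g. MCP tool definitions as text


def rough_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: assume about 4 characters per token.
    return max(1, len(text) // 4)


def full_cost(skill: Skill) -> int:
    # Tokens needed to load the skill body plus all attached tool schemas.
    return rough_tokens(skill.instructions) + sum(
        rough_tokens(schema) for schema in skill.tool_schemas
    )


def upfront_cost(skills) -> int:
    # Every skill and tool schema is placed in the context at session start.
    return sum(full_cost(s) for s in skills)


def on_demand_cost(skills, used_names) -> int:
    # Only summaries are always present; full bodies load for skills actually used.
    summaries = sum(rough_tokens(s.summary) for s in skills)
    loaded = sum(full_cost(s) for s in skills if s.name in used_names)
    return summaries + loaded
```

With many installed skills and only one or two used per session, `on_demand_cost` stays close to the cost of the skills actually touched, whether the tools behind them are MCP servers or CLIs.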
That's fine if your definition of capabilities is wide enough to include model understanding of the provided tool and token waste in the model trying to understand the tool and token waste in the model doing things ass backwards and inflating the context because it can't see the vastly shorter path to the solution provided by the tool and...
There is plenty of evidence to suggest that performance, success rates, and efficiency are all impacted quite drastically by the particular combination of tool and model.
This is evidenced by the end of your paragraph in which you admit that you are focused only on a couple (or perhaps a few) models. But even then, throw them a tool they don't understand that has the same capabilities as a tool they do understand and you're going to burn a bunch of tokens watching it try to figure the tool out.
> model understanding of the provided tool and token waste in the model trying to understand the tool and token waste in the model doing things ass backwards and inflating the context because it can't see the vastly shorter path to the solution provided by the tool and...
> But even then, throw them a tool they don't understand that has the same capabilities as a tool they do understand and you're going to burn a bunch of tokens watching it try to figure the tool out.
What I was trying to say was that this applies to both MCPs and CLIs – obviously, if you have a certain CLI tool that's represented thoroughly in the model's training dataset (e.g. grep, gh, sed, and so on), it's definitely beneficial to use CLIs (since it means less context spending, less trial-and-error to get the expected results, and so on).
However, if you have a novel thing that you want to connect to LLM-based Agents, e.g. a reverse engineering tool, or a browser debugging protocol adapter, or your next big thing(tm), it might not really matter if you have a CLI or an MCP, since LLMs are post-trained on (hence proficient with) both, and you'll have to do the trial-and-error thing anyway (since neither would be represented in the training dataset).
I would say that the MCP hype is dying out so I personally won't build a new product with MCP right now, but no need to ditch MCPs for any reason, nor do I see anything inherently deficient in the MCP protocol itself. It's just another tool-calling solution.
> the MCP server is automatically launched when the Agent loads that skill
The main problem with this approach at the moment is it busts your prompt cache, because LLMs expect all tool definitions to be defined at the beginning of the context window. Input tokens are the main driver of inference costs and a lot of use cases aren't economical without prompt caching.
Hopefully in future LLMs are trained so you can add tool definitions anywhere in the context window. Lots of use cases benefit from this, e.g. in ecommerce there's really no point providing a "clear cart" tool to the LLM upfront, it'd be nice if you could dynamically provide it after item(s) are first added.
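A simplified Python sketch of why this hurts caching, assuming the cache works purely on shared prefixes (real providers cache at the token level with their own rules and granularity). The segment strings and the `reusable_prefix` helper are made up for illustration.

```python
def reusable_prefix(previous, current):
    """Count the leading segments shared by two requests.

    Simplified model: a prefix cache can only reuse the part of the new
    request that is identical to the start of the previous one.
    """
    shared = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        shared += 1
    return shared


# Hypothetical request layouts, one string per segment.
first_turn = ["tools:v1", "system", "user:add item", "assistant:added"]

# Changing the tool list at the top invalidates everything after it.
tools_changed_at_top = [
    "tools:v1+clear_cart", "system", "user:add item",
    "assistant:added", "user:clear it",
]

# Appending the new tool definition keeps the earlier segments intact.
tool_appended_later = first_turn + ["tool:clear_cart", "user:clear it"]
```

Comparing `reusable_prefix(first_turn, tools_changed_at_top)` with `reusable_prefix(first_turn, tool_appended_later)` shows the difference: in this model, only the appended layout keeps the earlier turns cacheable, which is why being able to add tool definitions later in the context would help.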
> The main problem with this approach at the moment is it busts your prompt cache, because LLMs expect all tool definitions to be defined at the beginning of the context window.
TBH I'm not really sure how it works in Amp (I never actually inspected how it alters the prompts that are sent to Anthropic), but does it really matter for the LLMs to have the tool definitions at the beginning of the context window in contrast to the bottom before my next new prompt?
I mean, skills also work the same way, right? (it gets appended at the bottom, when the LLM triggers the skill) Why not MCP tooling definitions? (They're basically the same thing, no?)
No, it really matters because of the impact it has on context tokens. Reading one GH issue with MCP burns 54k tokens just to load the spec. If you use several MCPs it adds up really fast.
The impact on context tokens would be more of a 'you're holding it wrong' problem, no?
The GH MCP burning tokens is an issue on the GH MCP server, not the protocol itself. (I would say that since the gh CLI would be strongly represented in the training dataset, it would be more beneficial to just use the CLI in this case though.)
I do think that we should adopt Amp's MCPs-on-skills model that I've mentioned in my original comment more (hence allowing on-demand context management).
MCP specs are verbose JSON objects and they have to go into the context before you can call them. So yes, it is an issue with the fundamental design of the protocol.
Even if the model doesn’t already know the cli commands it can interrogate them at a much lower token cost for just the commands needed.
Verbosity of the output seems orthogonal to the cli vs mcp distinction? When I made mcp tools and noticed a lot of tokens being used, I changed the default to output less and added options to expose different kinds of detailed info depending what the model wants. CLI can support similar behavior.
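Roughly what I mean, as a Python sketch. The `render_issue` function, the `detail` parameter, and the field names are hypothetical; the same idea maps onto an MCP tool argument or a CLI flag.

```python
import json

# Fields returned by default; everything else requires asking for more detail.
SUMMARY_FIELDS = ("number", "title", "state")


def render_issue(issue: dict, detail: str = "summary") -> str:
    """Return a JSON string for a tool result at the requested detail level."""
    if detail == "summary":
        data = {k: issue[k] for k in SUMMARY_FIELDS if k in issue}
    elif detail == "full":
        data = dict(issue)
    else:
        raise ValueError(f"unknown detail level: {detail!r}")
    return json.dumps(data)
```

The default stays terse, and the model opts into `detail="full"` only when it actually needs the extra fields.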
MCP needs to be supported during training and trained into the LLM, whereas CLI usage is already very common in the training set. Since MCP does not really provide any significant benefits, I think good CLI tools and their use by LLMs should be the way forward.
This is very developer-centric. While GitHub might have a good CLI, there's absolutely no point in having most services develop CLIs and have their non-technical users install those. Not only is it bad UX, but it's bad from a security perspective as well. This is like arguing that GitHub shouldn't have a GraphQL/REST API since everyone should use the CLI.
Yeah, I've gotta use skills more. I didn't quite get it until this last week when I used a skill that I made. I didn't know the skill got pulled into context ONLY for the single command being run with the skill; I thought the skill got pulled into context and stayed there once it was called.
That does seem very powerful now that I've had some time to think about it.
tldr; they wanted to run a Tauri app in browser for dev purposes.
To do so, they shimmed Tauri’s Rust communication bridge to use a WebSocket to communicate with the main app’s Rust implementation.
This is only used by dev, but if something like this is provided by Tauri/Electron it can probably enable a bunch of interesting use cases… (and probably a bunch of RCEs as well, though)
TBH I am sad that Anthropic is changing its stance, but in the current world, if you even care about LLM safety, I feel that this is the right choice: there are too many model providers and they probably don’t consider safety as high a priority as Anthropic does. (Yes that might change, they can get pressured by the govt, yada yada, but they literally created their own company because of AI safety, I do think they actually care for now)
If we need safety, we need Anthropic to be not too far behind (at least for now, before Anthropic possibly becomes evil), and that might mean releasing models that are safer and more steerable than others (even if, unfortunately, they are not 100% up to Anthropic’s goals)
Dogmatism, while great, has its time and place, and with a thousand bad actors in the LLM space, pragmatism wins out.
I'm genuinely curious why they are so holy to you, when to me they look like just another tech company trying to make cash
Edit: Reading some of the linked articles, I can see how Anthropic CEO is refusing to allow their product for warfare (killing humans), which is probably a good thing that resonates with supporting them
How is it a good thing to refuse to provide our warfighters with the tools that they need? I mean if we're going to have a military at all then we owe it to them to give them the best possible weapons systems that minimize friendly casualties. And let's not have any specious claims that LLMs are somehow special or uniquely dangerous: the US military has deployed operational fully autonomous weapons systems since the 1970s.
This is the US military we’re talking about so 95% of what they do is attacking people for oil. They don’t “need” more of anything, they’re funded to the tune of a trillion dollars a year, almost as much as every other military in the world combined. What holy mission do you think they’re going to carry out with the assistance of LLMs?
That's a total non sequitur. If you think the military is being tasked with the wrong missions, or too many missions, then take that up with the civilian political leadership. But it's not a valid reason to deny the warfighters the best possible weapons systems.
Personally I favor a less interventionist foreign policy. But that change can only come about through the political process, not by unaccountable corporate employees making arbitrary decisions about how certain products can be used.
> But it's not a valid reason to deny the warfighters the best possible weapons systems.
Of course it is.
Think about it this way: if you could guarantee that the military suffers no human losses when attacking a foreign country, do you think that's going to lead to more or fewer foreign interventions?
The tools available to the military influence policy, these things are linked.
US military is already overwhelmingly powerful, there's 0 reason to make it even more powerful.
That's so delusional. The US military is currently preparing for a potential conflict with China to stop an invasion of Taiwan. They don't have anything near "overwhelming force" for that mission: recent simulations put it about even at best. People who believe they don't need any improved autonomous weapons are simply uninformed.
Don't presume to put words in my mouth. I flagged your comment for lying about my claims.
Individual Americans aren't slaves. They can do as they please and are under no obligation to help build weapons for warfighters. But I think it's ridiculous and offensive for a US corporation to presume to take on a role as moral arbiters by placing arbitrary limits on US government use of certain products. There are larger issues here that need to be addressed through the political process, not through commercial software license agreements.
Sure, it wasn't fair for me to claim you said that, so I apologize. It was rude of me to frame my position in that manner, and it wasn't intended maliciously.
I meant to suggest that corps being unable to take those positions results in such a world for Americans at those corps
> I think it's ridiculous and offensive for a US corporation to presume to take on a role as moral arbiters
A corporation is just a group of people. Anthropic isn't even public, and therefore its directors aren't subject to any sort of fiduciary duty enshrined in law. They can collectively act as they wish.
> If you think the military is being tasked with the wrong missions, or too many missions, then take that up with the civilian political leadership. But it's not a valid reason to deny the warfighters the best possible weapons systems.
It is an ethical dilemma: believing an armed force will act unethically is in fact a valid reason to refuse to arm them. You are taking a nationalistic view regarding the worth of life.
And if you believe it is unethical to arm them, it is rational to use whatever leverage you have available to you - such as refusing to sell your company's product.
Furthermore, one of the two points at issue was regarding surveilling civilians.
Why are you asking this question? You know what the answer is, you've just arbitrarily decided that it's specious in an attempt to frame rebuttals as unreasonable.
1. You don't believe in the mission or direction of US warfighters
2. Supporting warfighters is developmentally distinct from what you want your corporate competences and direction to be.
3. You don't want the military to be safer and more capable.
> If we need safety, we need Anthropic to be not too far behind (at least for now, before Anthropic possibly becomes evil)
I don't think it's going to be as easy as you think to tell that they might be becoming evil before it's too late, if it doesn't raise any alarm bells for you that this is already their plan.
The world would be so much nicer if there were just fewer pragmatists shitting up the place for everyone. We might actually handle half our externalities.
I feel like declarative container-like dev environments (e.g. nix shell or guix shell, and so on) will become much more popular in the coming years with the rise of LLM agentic tools. It seems that the aforementioned tools provide much more value when they can get full access to the dev environment.
Sprites[0], exe.dev[1], and more services seem to be focusing on providing instant VMs for these use cases, but to me it seems like a waste for users to have to ssh into a separate cloud server (and feel the latency) just to get a clean dev environment. I feel that a similar tool where you can get a clean-slate dev environment from a declarative description locally, without all of the overhead and weight of Docker or VMs, would be very welcome.
(Note: I am not trying to inject AI-hype on a Guix-related post, I do realize that the audience of LLM tools and Guix would be quite different, this is just an observation)
As a Guix lover and LLM tooling enthusiast, I completely agree. Administering my system via Claude Code is so much easier. LLMs work better on a system that's hackable via text.
This is very interesting, I haven’t touched macOS development for quite a while but it’s good to know that libraries are still being written for both AppKit and SwiftUI on macOS.
I do feel that this library would benefit from an explanation on why this was needed. AFAIR AppKit already provides a native tabbing API where you can “just” (that “just” is doing a lot of heavy lifting) implement a few delegate methods and you get tabbing behavior for free, especially on document-based apps. (Sorry, I do not remember the specifics, it might have been a tad more difficult)
I’m not up to date on the SwiftUI equivalent, but I would imagine that a similar API exists, much like the APIs for multiple windows or multiple documents.
I think everyone would benefit from a “why” explanation (which I definitely think exists, since I’ve suffered through too many AppKit APIs), and also some screenshots of a demo app (so that we know what to expect and how much the look and feel would deviate from the native counterparts).
I've tried the native tab support several times, and my impression is that it's good for very little.
It may be OK for certain types of document-oriented apps, but there's a reason most apps (Chrome, iTerm, even Safari uses its own native tabs, I believe) don't use it. It's underbaked and awkward to fit into a model where your "tab data model" doesn't neatly fit the document data model that the framework wants.
I recently made an app where I wanted tabs, and I just ended up abandoning tab support for this reason, and adding a todo item to use an off-the-shelf tab UI library in the future.
Yeah I realized that only now, for some reason when I was on mobile and I was looking into this the demo video was not loading at all. I would love to retract my comment :(
I hadn't even realized that while I was reading the article, but it is amusing!
Though one explanation is that, for the other stuff that the writer doesn't explain, one can just guess and be half right, and even if the reader guesses wrong, it isn't critical to understanding the bug, whereas sockets and capabilities are the concepts that are required to understand the post.
It still is amusing and I wouldn't have even realized that until you pointed that out.
I'm genuinely curious about how well this is working. Is there an independent Java test suite covering the major Java 5/6 features that can verify that the JOPA compiler works per the spec? I.e. I see that Claude has written a few tests in its commits, but it would be wonderful if there were a non-Clauded independent test suite (probably from other Java implementations?) that tracks progress.
I do feel that that is pretty much needed to claim that Claude is adding features to match the Java spec.
Well, it's complicated. The original JDK compliance tests are notoriously hard to deal with. Currently I parse nearly 100% of the positive test cases from the JDK 7 test suite (in one of the Java 7 branches), but I only have several dozen true end-to-end tests (build .java with jopa, validate classfile with javap, run classfile with java).
So, I can't tell how good it actually is but it definitely handles reasonably complex source files with generics (something the original compiler was unable to do).
The actual goal of the project is to be able to build at least ANT to simplify clean bootstrap of OpenJDK.
Like when I ask AIs to port sed to java, and it writes test cases ... running sed on a CLI and doesn't implement the full lang spec no matter how much prompting I give it.
I think the criticisms are too often dismissed as moving the goalposts or ignorant of potential, but short of recreating the active open bugs in Java, you've created a different thing whose differences have to be managed and it is unclear how helpful that may be despite the working implementations of subsets.
If I (or someone else) can use it as a starting point in the bootstrap process, that's fine with me. This is not supposed to be a top-tier compiler. Essentially, it needs to be able to build ANT.
It is beyond annoying that the article is totally generated by AI. I appreciate the author (hopefully) spending effort in trying to figure out the AI systems, but the obviously-LLM non-edited content makes me not trust the article.
What makes you believe that anything in the article is real?
The author seems to not exist and it's unclear where the data underlying the claims is even coming from since you can't just go and capture network traffic wherever you like.
I knew for a fact that a Linux desktop was a viable option when you have a separate macOS/Windows laptop (which is my main computer). Recently (frustrated with macOS updates), I decided to be Linux-only for a week[0], replacing my MBP with an MBA that runs Asahi Linux.
Unfortunately it turns out that I depend on too many desktop apps that run on the major desktop OSes but not on Linux (or on Wine, for that matter).
* KakaoTalk, the major South Korean IM app ran on Wine for a week, but the updater doesn't work and freshly reinstalling the app broke Wine for some reason. (I tried removing the whole ~/.wine prefix, but it doesn't work.) Now I'm stuck without KakaoTalk.
* Discord is only provided as a x86_64 Deb file and a .tar.gz file. I tried using it from Firefox, and it works fine but audio sharing during screen sharing doesn't work.
* Disconnecting from my Bluetooth AirPods somehow does not stop my music. I'm not sure if this is an AirPods limitation or a Linux limitation (since I've never used AirPods with Windows), but it annoyed me endlessly.
* USB-C DP mode and the fingerprint sensor don't work. This is an Asahi Linux limitation, but I've seen various parts of the hardware not working when using other Linux distributions on laptops as well. I feel this is a common occurrence.
Not to mention the lack of the text editing shortcuts that macOS has, which is a big deal to me (but I tried, as that is a macOS-ism).
I carried my MBA for 4 days before I gave up. I brought my MBP with me today.
> * Disconnecting from my Bluetooth AirPods somehow does not stop my music. I'm not sure if this is an AirPods limitation or a Linux limitation (since I've never used AirPods with Windows), but it annoyed me endlessly.
I think this is by design, not a limitation. On Android, changing the sound device stops music playback. On Windows and Linux, changing the sound device doesn't stop the sound. I tried it with wired headphones; maybe expectations for BT are different, but I think that comes from smartphones.
>* USB-C DP mode and the fingerprint sensor doesn't work. This is an Asahi Linux limitation, but I've seen various parts of the hardware not working when using other Linux distributions on laptops as well. I feel this is a common occurrence.
This really is a special case, they've had to write new drivers for everything in the Apple Silicon Macs and they haven't gotten that working yet. I have in fact been waiting on this feature for a few years now as I want to use a MBP with the lid closed and two monitors plugged in, but currently only the HDMI works and not most USB-C functionality. This is not at all the norm in x86_64 land where more normal hardware is used. I'm still using a ThinkPad T440p and thinking about getting a T14 gen 5 due to the MBP I got a few years ago not being satisfying/fun to use, comparatively.
As for Discord and AirPods and such, the more proprietary stuff you need, the worse time you'll have. Though I just saw something in the news that might help with the AirPods. Check out LibrePods.
For Discord I just use Discord Canary. It's a wrapper and works perfectly. But I'm also on Fedora.
I would suggest trying something other than Asahi Linux! I know that their support for Mac systems is near unbeatable. But it does still tend to have some hiccups, especially with M3+ systems.
I know that "try a different distro" is an often (user-biased) and, imo, bad answer. But in the case of Asahi, as awesome as their work is, they are climbing a different mountain compared to the rest of Linux development.
>Discord is only provided as a x86_64 Deb file and a .tar.gz file. I tried using it from Firefox, and it works fine but audio sharing during screen sharing doesn't work.
I got it working with the unofficial client Vesktop. Functioning screensharing on wayland is actually advertised as one of their main features.