LLMs running locally
2025-08-30 22:28:42 +01:00 by Mark Smith
I finally got the thing I've been configuring for the past few days working: a collection of LLMs of various sizes, running on my local machine, all accessible via the terminal and through a chat interface in VSCode. I fell down all sorts of strange rabbit holes, but it's working. I now have a much better appreciation of how all these AI models work, though there's still lots to learn.
This was the first time I've worked with Claude. I found Claude to be very efficient, most of the time refreshingly concise, and, dare I say it, surprisingly honest about the current state of AI tooling. There were still some strange things, though.
When I asked each of these newly configured models separately what its name was, they all initially insisted that they were an AI language model created by OpenAI called GPT-4. Of course, since I downloaded, installed and configured them myself, I know that none of them is in fact GPT-4 created by OpenAI. Interestingly, when I disconnected from the internet, they all started insisting that they were Claude, made by Anthropic. None of them is actually Claude made by Anthropic either.
The VSCode extension I was using to connect to the models has a website with a chat interface, and I chatted quite a bit with that AI, which also claimed to be Claude.
So yes, it's working, but I'm not exactly super confident it's working particularly well. One of the models was initially very sure that 2 + 2 = 3. Based on the conversations I had with Claude, it's quite apparent that most developers using AI aren't exactly overly worried about security. It seemed like I was in the minority in trying to configure things to run in devcontainers.
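For anyone curious about the devcontainer angle, a config along these lines is roughly the shape of what I mean: the dev tooling runs inside a container, which can still reach a model server running on the host. This is a hedged sketch rather than my exact setup; the base image, the extension ID (Continue, as one example of a local-model chat extension), the port, and the `OPENAI_API_BASE` variable are all placeholders that depend on which tools you actually use.

```json
{
  // Sketch of a devcontainer.json for sandboxed AI tooling (placeholder values)
  "name": "llm-sandbox",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",

  // Let the container reach a model server running on the host machine
  "runArgs": ["--add-host=host.docker.internal:host-gateway"],

  "customizations": {
    "vscode": {
      // Whichever AI chat extension you use goes here; Continue is one example
      "extensions": ["continue.continue"]
    }
  },

  // Point OpenAI-compatible tools at the host's local endpoint
  // (11434 is a placeholder port; adjust to your model server)
  "remoteEnv": {
    "OPENAI_API_BASE": "http://host.docker.internal:11434/v1"
  }
}
```

The appeal is that the extension and anything it shells out to only sees what's mounted into the container, which is the point of doing local models in the first place.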
The big thing here is that I wanted to be set up so that, if needed, I could work with AIs on coding projects without uploading any code to third-party servers. That's the downside of cloud-based AIs: they are very quick and sometimes quite clever, but they require you to upload all your code to their servers, and I know that isn't always an option for some folks.
I'm hoping to work on some coding features over the next few days. I'm very interested to see how these models perform compared to Gemini.