TIL how to set up a local LLM as an assistant in VSCode
I am still trying to get used to the fact that the product of state-of-the-art AI companies is essentially a GB-sized CSV file plus some instructions about the network architecture needed to perform inference. Recently released “open source” models of medium size show quite decent performance on benchmarks, so I was curious how easy it would be to set one up locally to help me with coding tasks in VSCode. Short answer: very easy.
The bad news first: these local small-to-medium-sized models won’t reach the performance of remotely hosted GPT models, and local inference will likely take a toll on your machine’s battery life. Still, local LLMs come with some benefits:
- free, local inference from “open-source” models (no company lock-in for inference)
- works without internet access, which can also stand in for a quick Google search when you are somewhere with bad reception
- no sharing of sensitive information with 3rd parties
- customizability: you can pick a model fine-tuned for your specific use case
Hardware requirements
I am using a 2018 MacBook Pro with macOS and VSCode. For 7B-parameter models you need 8GB+ of RAM; for 13B-parameter models you need 16GB+. Smaller models also respond quicker: so far the 7B-parameter model gives me a response within 3-5s.
Downloading local models
To manage models and handle the inference interface, we use Ollama as our model provider:
brew install ollama
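Homebrew installs both the Ollama CLI and the local server. If the server is not already running, you can start it in the foreground or, assuming the Homebrew formula ships a service definition (which it currently does), let Homebrew run it in the background — one of the two is enough:
ollama serve                 # run the server in the foreground
brew services start ollama   # or: run it as a managed background service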
Then we use the ollama pull [model_name] command to download one of the models from Ollama’s model library.
I am using Meta’s open-source Llama2 model fine-tuned for coding tasks, at a “moderate” size of 7B parameters. (Unfortunately, in 2023 “open source” in the context of LLMs means, for most people with small budgets: open to use, not open to reproduce.) To get the familiar chat-like experience, use an instruct model that was trained to output human-like answers to questions. More details about the different model variants and their intended use cases can be found here.
In the CLI run:
ollama pull codellama:7b-instruct
This will download the 3.8GB model from the internet and save the model parameters to your hard drive.
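To double-check that the download worked, you can list the models Ollama knows about; the model should show up together with its tag and size:
ollama list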
Prompting the model from CLI
After downloading the model, we can prompt it from the CLI with
ollama run codellama:7b-instruct
and then type in our prompt.
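As a rough illustration (the prompt below is just an example, not verbatim output), a session looks something like this; typing /bye exits the interactive prompt:
ollama run codellama:7b-instruct
>>> Write a Python function that checks whether a string is a palindrome.
(the model streams its answer here)
>>> /bye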
Using the model as a coding assistant inside VS Code
To prompt the model from within VSCode I am using the extension Continue. By default Continue uses GPT-4 in trial mode. However, you can point Continue to any local or remote model by specifying local paths, API keys, etc. in the config.json. In our case we want to point Continue to our codellama model, so we add the following to the file:
// config.json
"models": [
  {
    "title": "GPT-4",
    "provider": "openai-free-trial",
    "model": "gpt-4"
  },
  {
    "title": "GPT-3.5-Turbo",
    "provider": "openai-free-trial",
    "model": "gpt-3.5-turbo"
  },
  // add the local model here
  {
    "title": "codellama:7b-instruct",
    "model": "codellama:7b-instruct",
    "contextLength": 4096,
    "provider": "ollama"
  }
],
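Before (or after) wiring up Continue, you can sanity-check that Ollama is actually serving the model by querying its local HTTP API directly. Ollama listens on port 11434 by default; the prompt below is just a placeholder:
curl http://localhost:11434/api/generate -d '{
  "model": "codellama:7b-instruct",
  "prompt": "Write a hello world program in Python",
  "stream": false
}'
If this returns a single JSON response, Continue should be able to reach the model as well.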
The nice thing about Continue is that it lets you easily jump between local and remote models. This is useful if you are unhappy with a response and want to try a different model.
If you have any thoughts, questions, or feedback about this post, I would love to hear it. Please reach out to me via email.