TIL how to set up a local LLM as an assistant in VSCode
I am still trying to get used to the fact that the product of state-of-the-art AI companies is essentially a GB-sized CSV file plus some instructions about the network architecture needed to perform inference. Recently released “open source” models of medium size show quite decent performance on benchmarks, so I was curious how easy it would be to set one up locally to help me with coding tasks in VSCode. Short answer: very easy.
The bad news first: these local small-to-medium-sized models won’t reach the performance of remotely hosted GPT models, and local inference will likely take a toll on your machine’s battery life. Still, local LLMs come with some benefits:
- free, local inference from “open-source” models (no company lock-in for inference)
- works without internet access, which can also stand in for a quick Google search when you are somewhere with bad reception
- no sharing of sensitive information with 3rd parties
- customizability: you can pick a model fine-tuned for your specific use case
Hardware requirements
I am using a 2018 MacBook Pro with macOS and VSCode. For 7B-parameter models you need 8GB+ of RAM; for 13B-parameter models you need 16GB+. Smaller models also respond quicker: so far the 7B-parameter model gives me a response within 3-5s.
Downloading local models
To manage models and handle the inference interface, we use Ollama as our model provider:
brew install ollama
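Homebrew installs both the Ollama CLI and the local server. If the server is not already running, you can start it in the foreground or, assuming the Homebrew formula ships a service definition (which it currently does), let Homebrew run it in the background — one of the two is enough:
ollama serve                 # run the server in the foreground
brew services start ollama   # or: run it as a managed background service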
Then we use the ollama pull [model_name] command to download one of the models from Ollama’s model library.
I am using Meta’s open-source Llama2 model fine-tuned for coding tasks, at a “moderate” size of 7B parameters. (Unfortunately, in 2023 “open source” in the context of LLMs means, for most people with small budgets: open to use, not open to reproduce.) To get the familiar chat-like experience, use an instruct model that was trained to output human-like answers to questions. More details about the different model variants and their intended use cases can be found here.
In the CLI run:
ollama pull codellama:7b-instruct
This will download the 3.8GB model from the internet and save the model parameters to your hard drive.
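To double-check that the download worked, you can list the models Ollama knows about; the model should show up together with its tag and size:
ollama list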
Prompting the model from CLI
After downloading the model, we can prompt it from the CLI with
ollama run codellama:7b-instruct
and then type in our prompt.
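As a rough illustration (the prompt below is just an example, not verbatim output), a session looks something like this; typing /bye exits the interactive prompt:
ollama run codellama:7b-instruct
>>> Write a Python function that checks whether a string is a palindrome.
(the model streams its answer here)
>>> /bye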
Using the model as a coding assistant inside VS Code
To prompt the model from within VSCode I am using the extension Continue. By default Continue uses GPT-4 in trial mode. However, you can point Continue to any local or remote model by specifying local paths, API keys, etc. in the config.json. In our case we want to point Continue to our codellama model, so we add the following to the file:
// config.json
"models": [
  {
    "title": "GPT-4",
    "provider": "openai-free-trial",
    "model": "gpt-4"
  },
  {
    "title": "GPT-3.5-Turbo",
    "provider": "openai-free-trial",
    "model": "gpt-3.5-turbo"
  },
  // add the local model here
  {
    "title": "codellama:7b-instruct",
    "model": "codellama:7b-instruct",
    "contextLength": 4096,
    "provider": "ollama"
  }
],
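Before (or after) wiring up Continue, you can sanity-check that Ollama is actually serving the model by querying its local HTTP API directly. Ollama listens on port 11434 by default; the prompt below is just a placeholder:
curl http://localhost:11434/api/generate -d '{
  "model": "codellama:7b-instruct",
  "prompt": "Write a hello world program in Python",
  "stream": false
}'
If this returns a single JSON response, Continue should be able to reach the model as well.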
The nice thing about Continue is that it lets you easily jump between local and remote models. This is useful if you are unhappy with a response and want to try a different model.
If you have any thoughts, questions, or feedback about this post, I would love to hear it. Please reach out to me via email.