Introduction

I’ve been tinkering with using a locally hosted code-capable LLM to speed up my development workflow—everything from autocompleting boilerplate to generating documentation and even suggesting refactors. Instead of relying on cloud-based code assistants (which can be pricey for heavy use), I’m running a small model locally. Here’s the pipeline I use to hook it into my IDE and CI.

Why I Went Local for Code Generation

  • Security: No proprietary code ever leaves my machine or CI servers.
  • Cost Savings: I call the model as often as I like without per-token billing.
  • Customization: I can fine-tune or prompt-engineer the model to follow my team’s style guides.

Pipeline Overview

  1. Choose or Fine-Tune a Code Model – Grab a code-specialized LLM (e.g., CodeGen, StarCoder) and optionally fine-tune on your codebase.
  2. Set Up a Local API Service – Wrap the model in a simple REST or gRPC server.
  3. IDE Integration – Connect VSCode or JetBrains via plugin or LSP.
  4. CI Integration – Use the service in pre-commit hooks or CI jobs to enforce docs coverage or style.

I’ll break down each step with config snippets and handy tips.


1. Choosing & Fine-Tuning the Code Model

I started with StarCoder since it’s open and supports Python, JavaScript, and more. If you want to fine-tune it on your own code, I recommend using LoRA to keep the process lightweight.

pip install transformers peft accelerate datasets
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

base = "bigcode/starcoder"
tokenizer = AutoTokenizer.from_pretrained(base)
model     = AutoModelForCausalLM.from_pretrained(base)

# LoRA setup: low-rank adapters on the attention projection
peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=4, lora_alpha=16, lora_dropout=0.1,
    target_modules=["c_attn"]
)
model = get_peft_model(model, peft_config)

# Load your code dataset (it should expose a text column named 'code')
ds = load_dataset('path/to/your/code', split='train')

def tokenize_fn(examples):
    return tokenizer(examples['code'], truncation=True, max_length=1024)

# Drop the raw text columns so the Trainer only sees token IDs
ts = ds.map(tokenize_fn, batched=True, remove_columns=ds.column_names)

args = TrainingArguments(
    output_dir='starcoder-finetuned',
    per_device_train_batch_size=2,
    num_train_epochs=2,
    logging_steps=100
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ts,
    # mlm=False makes the collator copy input_ids into labels for the causal-LM loss
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
trainer.train()

# Merge the LoRA weights into the base model so it loads later with plain
# AutoModelForCausalLM, then save both model and tokenizer
model = model.merge_and_unload()
model.save_pretrained('starcoder-finetuned')
tokenizer.save_pretrained('starcoder-finetuned')

Now I have a model that has absorbed my codebase’s patterns and naming conventions.
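
To sanity-check the result before wiring it into anything, I run a quick one-off completion against the saved model. A minimal sketch, assuming the merged weights and tokenizer were saved to starcoder-finetuned as above (the prompt is just an example):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('starcoder-finetuned')
model = AutoModelForCausalLM.from_pretrained('starcoder-finetuned')

prompt = "def parse_config(path):"
inputs = tokenizer(prompt, return_tensors='pt')
out = model.generate(**inputs, max_new_tokens=64, do_sample=False,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0], skip_special_tokens=True))

If the completion looks like my codebase rather than generic tutorial code, the fine-tune did its job.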


2. Setting Up a Local API Service

I wrap the model in a FastAPI server so IDEs and CI can hit it via HTTP.

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

app = FastAPI()

tokenizer = AutoTokenizer.from_pretrained('starcoder-finetuned')
model     = AutoModelForCausalLM.from_pretrained('starcoder-finetuned')
# device=0 targets the first GPU; use device=-1 to run on CPU
predictor = pipeline('text-generation', model=model, tokenizer=tokenizer, device=0)

class CodeRequest(BaseModel):
    prompt: str
    max_length: int = 128

@app.post('/generate')
def generate_code(req: CodeRequest):
    out = predictor(req.prompt, max_length=req.max_length, do_sample=False)
    return {'code': out[0]['generated_text']}

Running uvicorn service:app --host 0.0.0.0 --port 8000 spins up the API. I secure it behind my company’s VPN.
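
For a quick smoke test, I hit the endpoint from Python before touching any IDE plumbing (assuming the service is reachable on localhost:8000, as in the uvicorn command above):

import requests

resp = requests.post(
    'http://localhost:8000/generate',
    json={'prompt': 'def fibonacci(n):', 'max_length': 64},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()['code'])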


3. IDE Integration

VSCode Example: I used the REST Client extension to test the endpoint, then switched to writing a small VSCode extension that calls my service on Ctrl+Space. In package.json:

"contributes": {
  "commands": [{
    "command": "extension.generateCode",
    "title": "Generate Code from LLM"
  }],
  "menus": {
    "editor/context": [{
      "command": "extension.generateCode",
      "when": "editorTextFocus"
    }]
  }
}

In extension.js, I fetch from http://localhost:8000/generate and insert the returned code at the cursor.


4. CI Integration

Pre-commit Hook: I added a hook that fills in missing docstrings. If a function has no docstring, it calls /generate with a prompt like "Write a docstring for this function: <code>" and inserts the result.

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: code-doc-gen
        name: Generate Docstrings via LLM
        entry: python hooks/doc_gen.py
        language: python
In doc_gen.py, I parse files for functions missing docstrings and call my API to generate them.
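
Here’s a minimal sketch of what doc_gen.py can look like. The helper names and the choice to only print suggested stubs and fail the hook (instead of rewriting files in place) are my own simplifications:

import ast
import sys
import requests

API = 'http://localhost:8000/generate'

def missing_docstrings(path):
    """Yield (name, source) for functions in `path` that lack a docstring."""
    source = open(path).read()
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if ast.get_docstring(node) is None:
                yield node.name, ast.get_source_segment(source, node)

def suggest_docstring(func_source):
    prompt = f"Write a docstring for this function:\n{func_source}\n"
    resp = requests.post(API, json={'prompt': prompt, 'max_length': 256}, timeout=120)
    resp.raise_for_status()
    return resp.json()['code']

if __name__ == '__main__':
    failed = False
    for path in sys.argv[1:]:  # pre-commit passes the staged filenames
        for name, src in missing_docstrings(path):
            failed = True
            print(f"{path}: `{name}` has no docstring. Suggested stub:")
            print(suggest_docstring(src))
    sys.exit(1 if failed else 0)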


Wrapping Up

With this pipeline, I’ve got a private, customizable code assistant that lives on my machine. No more worrying about sensitive code in the cloud, and I can fine-tune on new patterns as my code evolves. Next up: deploying this as a Docker container so I can run it anywhere.