Domain-Specific Q&A Assistants with Local LLMs
Introduction
I wanted a Q&A bot that knows my field—computer science, scientific articles, or internal engineering docs—without sending queries to a general-purpose cloud model. By hosting a local LLM, I get on-premise privacy and sub-second responses. Here’s how I set up a domain-specific Q&A assistant tailored to my needs.
Why Build a Domain-Specific Assistant
- Privacy & Compliance: Sensitive docs never leave my network.
- Accuracy in Context: The model focuses on domain data rather than general web knowledge.
- Cost Efficiency: No per-query API fees when I’m debugging or prototyping.
High-Level Steps
- Collect & Clean Domain Data – Gather relevant docs (PDFs, wikis, spreadsheets).
- Option A: Fine-Tuning – Continue training the base LLM on in-domain text.
- Option B: Prompt Engineering – Use few-shot context injection without retraining.
- Deploy & Query – Serve via an API or simple CLI for quick lookups.
I’ll walk through both fine-tuning (Option A) and prompt-based (Option B) approaches so I can compare what works best.
1. Collect & Clean Domain Data
I dumped all my domain docs into a data/ folder. For PDFs I used pdfplumber, for Word docs python-docx, and for CSVs I cleaned them up with pandas. Then I sanitized the text: removed headers and footers, and manually redacted PII.
import os
import pandas as pd
import pdfplumber
from docx import Document

def extract_text(path):
    if path.endswith('.pdf'):
        with pdfplumber.open(path) as pdf:
            # extract_text() can return None for image-only pages
            return '\n'.join([p.extract_text() or '' for p in pdf.pages])
    elif path.endswith('.docx'):
        return '\n'.join([p.text for p in Document(path).paragraphs])
    elif path.endswith('.csv'):
        # My CSVs keep the free-text content in a 'notes' column
        df = pd.read_csv(path)
        return '\n'.join(df['notes'].astype(str).tolist())
    return ''

docs = []
for f in os.listdir('data/'):
    docs.append(extract_text(os.path.join('data/', f)))
print(f"Loaded {len(docs)} domain documents.")
A quick regex pass helped remove common headers or sensitive tokens.
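For reference, that cleanup pass looked roughly like this; the header pattern and the email regex are just placeholders for whatever actually shows up in your documents:

import re

def clean_text(text):
    # Drop boilerplate header/footer lines (pattern is a hypothetical example)
    text = re.sub(r'^(CONFIDENTIAL|Page \d+ of \d+).*$', '', text, flags=re.MULTILINE)
    # Mask email addresses as one common kind of PII
    text = re.sub(r'[\w.+-]+@[\w-]+\.[\w.]+', '[REDACTED]', text)
    return text

docs = [clean_text(d) for d in docs]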
2. Option A: Fine-Tuning the LLM
Why: If my domain data is large enough (at least a few hundred MB of text), fine-tuning can embed domain knowledge directly into the model’s weights.
# Using the Hugging Face Trainer with LoRA for parameter-efficient fine-tuning
# pip install transformers datasets peft

from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import get_peft_model, LoraConfig, TaskType

model_name = "gpt2-medium"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Set up LoRA so only small adapter matrices are trained
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8, lora_alpha=16, lora_dropout=0.05
)
model = get_peft_model(model, peft_config)

# Prepare dataset
from datasets import Dataset
encodings = tokenizer(docs, truncation=True, padding=True)
dataset = Dataset.from_dict(encodings)

# The collator copies input_ids into labels for causal-LM training
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Training arguments
t_args = TrainingArguments(
    output_dir='fine_tuned',
    num_train_epochs=3,
    per_device_train_batch_size=4,
    logging_steps=50,
    save_steps=200
)

trainer = Trainer(
    model=model,
    args=t_args,
    train_dataset=dataset,
    data_collator=collator
)
trainer.train()
model.save_pretrained('domain-qa-model')
This takes some GPU time, but the resulting model feels way more knowledgeable on my docs.
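To sanity-check the result, here is a minimal generation sketch; it assumes the LoRA adapter was saved to domain-qa-model as above and reuses the gpt2-medium base:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
base = AutoModelForCausalLM.from_pretrained("gpt2-medium")
# Load the LoRA adapter on top of the base weights
tuned = PeftModel.from_pretrained(base, "domain-qa-model")

prompt = "Q: What is the policy on data retention?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")
output = tuned.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(output[0], skip_special_tokens=True))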
3. Option B: Prompt Engineering
Why: If I don’t want to fine-tune or my data is small, I can just feed relevant context in the prompt.
from transformers import pipeline

# gpt2-medium is a causal LM, so I use text-generation and inject the
# retrieved context directly into the prompt
generator = pipeline('text-generation', model='gpt2-medium')

def answer_question(question, context):
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    out = generator(prompt, max_new_tokens=100, return_full_text=False)
    return out[0]['generated_text'].strip()

# Retrieve relevant docs or chunks (using our earlier semantic search)
question = 'What is the policy on data retention?'
ctx_chunks = retrieve(question, k=3)
full_context = '\n\n'.join(ctx_chunks)
print(answer_question(question, full_context))
There's less upfront cost, but I might have to tweak prompt length or chunk selection to get precise answers.
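The retrieve() helper comes from my earlier semantic-search setup; if you don't have that handy, a rough stand-in using sentence-transformers (the model choice and paragraph chunking are my assumptions) looks like this:

from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer('all-MiniLM-L6-v2')
# Split each document into paragraph-sized chunks and embed them once
chunks = [c for d in docs for c in d.split('\n\n') if c.strip()]
chunk_embeddings = embedder.encode(chunks, convert_to_tensor=True)

def retrieve(query, k=3):
    # Return the k chunks most similar to the query by cosine similarity
    query_embedding = embedder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, chunk_embeddings, top_k=k)[0]
    return [chunks[h['corpus_id']] for h in hits]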
4. Deploy & Query
I wrapped both approaches behind a small FastAPI app so I can hit POST /ask with JSON:
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Toggle between Option A (fine-tuned model) and Option B (prompt-based)
use_finetuned = False

class Query(BaseModel):
    question: str

@app.post('/ask')
def ask(q: Query):
    if use_finetuned:
        ans = model.generate(...)  # generation call elided; see Option A above
    else:
        ctx = retrieve(q.question)
        ans = answer_question(q.question, '\n\n'.join(ctx))
    return {'answer': ans}
Now I have a local Q&A service that respects my data’s privacy and works offline.
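For a quick smoke test, I start the app with uvicorn and post a question from Python; the port and module name here are assumptions:

# assuming the app above lives in app.py and was started with: uvicorn app:app
import requests

resp = requests.post('http://localhost:8000/ask',
                     json={'question': 'What is the policy on data retention?'})
print(resp.json()['answer'])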
Wrapping Up
This setup lets me query my own domain knowledge quickly and privately. Next up: integrating this into a Slack bot so I can ask questions right from my chat client.