
Fine-Tuning LLMs: A Practical Guide for Custom AI Models

LearnClub AI
February 27, 2026
6 min read


Fine-tuning allows you to adapt powerful pre-trained language models to your specific needs. Whether you’re building a customer service bot, a medical assistant, or a code generator, fine-tuning can dramatically improve performance.

When to Fine-Tune

Fine-Tuning is Right For:

  • Domain-specific tasks: Legal, medical, financial language
  • Style adaptation: Matching your brand voice
  • Consistent formatting: Structured outputs like JSON
  • Proprietary knowledge: Company-specific information

Prompt Engineering is Better For:

  • Simple tasks: Quick experiments, one-off queries
  • Rapid prototyping: Testing ideas before investing
  • General knowledge: Tasks within model’s training data
  • Budget constraints: No training infrastructure needed

Fine-Tuning Methods

1. Full Fine-Tuning

Update all model parameters. Most comprehensive but expensive.

Pros: Best performance
Cons: Requires significant compute; risk of catastrophic forgetting

2. Parameter-Efficient Fine-Tuning (PEFT)

Update only a small subset of parameters.

LoRA (Low-Rank Adaptation)

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,  # rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)

Pros: Orders of magnitude fewer trainable parameters (often ~1,000x), faster training, smaller checkpoints
Cons: Slightly lower performance than full fine-tuning
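The parameter savings are easy to check with back-of-the-envelope arithmetic. The sketch below assumes Llama-2-7B-like dimensions (32 layers, hidden size 4096) and LoRA on q_proj and v_proj only; the exact ratio depends on which modules you target and the rank you choose.

```python
# Rough trainable-parameter count: LoRA vs. full fine-tuning.
# Assumed dimensions: Llama-2-7B-like (32 layers, hidden size 4096),
# LoRA applied to q_proj and v_proj only, rank r=16.
hidden = 4096
layers = 32
r = 16

# Each targeted matrix gets two low-rank factors: A (r x in) and B (out x r).
lora_params_per_matrix = r * hidden + hidden * r   # 131,072
targeted_matrices = 2 * layers                     # q_proj + v_proj per layer
lora_params = targeted_matrices * lora_params_per_matrix

full_params = 7_000_000_000  # every weight is updated in full fine-tuning
print(f"LoRA trainable params: {lora_params:,}")              # 8,388,608
print(f"Reduction factor: {full_params / lora_params:.0f}x")  # ~834x
```

Targeting more modules (as in Step 4 below) raises the count, but the reduction stays in the hundreds-to-thousands range.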

QLoRA

Quantized LoRA for even lower memory usage.

import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)

Step-by-Step Fine-Tuning Tutorial

Step 1: Environment Setup

pip install transformers datasets accelerate peft bitsandbytes

Step 2: Prepare Your Dataset

from datasets import Dataset

# Example: Customer support conversations
data = [
    {
        "instruction": "How do I reset my password?",
        "input": "",
        "output": "To reset your password, click 'Forgot Password' on the login page. Enter your email address and check your inbox for a reset link. The link expires in 24 hours."
    },
    # Add more examples...
]

dataset = Dataset.from_list(data)

# Format for training
def format_prompt(example):
    if example["input"]:
        prompt = f"### Instruction:\n{example['instruction']}\n\n### Input:\n{example['input']}\n\n### Response:\n{example['output']}"
    else:
        prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['output']}"
    return {"text": prompt}

dataset = dataset.map(format_prompt)

Step 3: Load Base Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # or "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,  # For QLoRA
    device_map="auto"
)

Step 4: Configure LoRA

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)

Step 5: Set Up Training

from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    save_steps=100,
    logging_steps=10,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=True,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant"
)

from transformers import DataCollatorForLanguageModeling

# Tokenize the formatted prompts before training
tokenized_dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=dataset.column_names
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

Step 6: Train

model.config.use_cache = False  # Disable KV cache during training (re-enable for inference)
trainer.train()

Step 7: Save and Export

# Save LoRA adapter
model.save_pretrained("./lora-adapter")

# Merge the adapter into the base model (optional)
# Note: with a 4-bit QLoRA base, reload the adapter onto an fp16 copy before merging
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged-model")
tokenizer.save_pretrained("./merged-model")

Dataset Best Practices

Data Quantity

Model Size    Minimum Examples    Recommended
7B            100-500             1,000-5,000
13B           200-1,000           2,000-10,000
70B           500-2,000           5,000-20,000

Data Quality Guidelines

  1. Format consistency: Use the same template for all examples
  2. Output quality: Examples should represent your desired output
  3. Diversity: Cover edge cases and variations
  4. Length variety: Mix short and long responses
  5. Clean data: Remove duplicates and errors
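Guideline 5 in particular is easy to automate. A minimal sketch, assuming examples in the instruction/input/output format used above:

```python
def clean_examples(examples):
    """Strip whitespace, drop empty outputs, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for ex in examples:
        ex = {k: v.strip() for k, v in ex.items()}
        if not ex["output"]:      # drop examples with no target response
            continue
        key = (ex["instruction"], ex["input"], ex["output"])
        if key in seen:           # drop exact duplicates
            continue
        seen.add(key)
        cleaned.append(ex)
    return cleaned

raw = [
    {"instruction": "How do I reset my password?", "input": "", "output": "Click 'Forgot Password'."},
    {"instruction": "How do I reset my password?", "input": "", "output": "Click 'Forgot Password'."},
    {"instruction": "Broken example", "input": "", "output": "  "},
]
print(len(clean_examples(raw)))  # 1
```

Near-duplicate detection (e.g. by embedding similarity) goes further, but exact-match dedup alone catches most scraped-data problems.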

Data Format Example

{
  "instruction": "Summarize the following article",
  "input": "[Article text here...]",
  "output": "[High-quality summary...]"
}

Hyperparameter Tuning

Key Parameters

Parameter        Description            Typical Range
learning_rate    Step size              1e-5 to 5e-4
batch_size       Samples per update     4-32
epochs           Training iterations    1-10
LoRA rank (r)    Adapter complexity     8-64
alpha            Scaling factor         2x to 4x r
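Note that the batch size that matters here is the effective batch size, not the per-device value. With the Step 5 settings (per_device_train_batch_size=4, gradient_accumulation_steps=4) on a single GPU:

```python
per_device_batch = 4
grad_accum_steps = 4
num_gpus = 1

# Gradients accumulate over grad_accum_steps before each optimizer step,
# so the effective batch size is the product of all three factors.
effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # 16
```

If you change gradient accumulation to fit memory, the learning rate often needs adjusting in the same direction.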

Learning Rate Recommendations

  • 7B models: 2e-4
  • 13B models: 1e-4
  • 70B models: 5e-5

Training Infrastructure

Hardware Requirements

Model    Method    GPU Memory    Hardware
7B       LoRA      8-10 GB       RTX 3090
7B       QLoRA     6-8 GB        RTX 3070
13B      LoRA      16-20 GB      A100 40GB
70B      QLoRA     40-48 GB      A100 80GB

Cloud Options

  • Google Colab: Free tier with T4 (limited)
  • Lambda Cloud: $0.60/hour for A10
  • RunPod: $0.44/hour for RTX 4090
  • AWS SageMaker: Enterprise-grade, higher cost

Evaluation

Quantitative Metrics

from evaluate import load

# Perplexity
perplexity = load("perplexity")
results = perplexity.compute(model_id=model_name, predictions=predictions)

# BLEU (for translation-style tasks)
bleu = load("bleu")
results = bleu.compute(predictions=predictions, references=references)

Qualitative Evaluation

Test your model:

  1. Hold-out test set: 10-20% of data
  2. Edge cases: Unusual inputs
  3. Adversarial tests: Attempts to break the model
  4. Human evaluation: Expert review of outputs
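For point 1, a shuffled split is all you need. Here is a sketch in plain Python; with Hugging Face datasets, dataset.train_test_split(test_size=0.1) does the same thing:

```python
import random

def holdout_split(examples, test_fraction=0.1, seed=42):
    """Shuffle and split examples into train and held-out test sets."""
    rng = random.Random(seed)
    shuffled = examples[:]   # copy so the original order is untouched
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

examples = [{"instruction": f"q{i}", "output": f"a{i}"} for i in range(100)]
train, test = holdout_split(examples, test_fraction=0.1)
print(len(train), len(test))  # 90 10
```

Split before any augmentation or dedup across sets, so no test example leaks into training.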

Common Issues and Solutions

Catastrophic Forgetting

Problem: Model loses general knowledge

Solutions:

  • Include diverse training data
  • Use lower learning rate
  • Shorter training (fewer epochs)
  • Mix with general instruction data
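The last point can be as simple as interleaving a slice of a general instruction dataset with your domain data. A minimal sketch, assuming both sets share the instruction/output format; the 25% mix ratio is illustrative, not a recommendation:

```python
import random

def mix_datasets(domain, general, general_fraction=0.25, seed=0):
    """Blend domain examples with general instruction data to reduce forgetting.

    general_fraction is the share of the final mix drawn from general data.
    """
    n_general = int(len(domain) * general_fraction / (1 - general_fraction))
    rng = random.Random(seed)
    mixed = domain + rng.sample(general, min(n_general, len(general)))
    rng.shuffle(mixed)
    return mixed

domain = [{"instruction": f"d{i}", "output": "..."} for i in range(300)]
general = [{"instruction": f"g{i}", "output": "..."} for i in range(1000)]
mixed = mix_datasets(domain, general)
print(len(mixed))  # 400
```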

Overfitting

Symptoms: Perfect training loss, poor generalization

Solutions:

  • Add regularization (dropout, weight decay)
  • More training data
  • Early stopping
  • Reduce model complexity (lower rank)
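Early stopping is worth spelling out: transformers ships an EarlyStoppingCallback you can pass to the Trainer (with an eval dataset and load_best_model_at_end=True). The underlying logic is just a patience counter over validation loss, sketched here in plain Python:

```python
def should_stop(val_losses, patience=3, min_delta=0.0):
    """Stop when validation loss has not improved for `patience` evaluations."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta

# Loss keeps improving: keep training.
print(should_stop([2.1, 1.8, 1.6, 1.5, 1.4]))    # False
# No improvement over the last 3 evals: stop.
print(should_stop([2.1, 1.5, 1.6, 1.62, 1.61]))  # True
```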

Training Instability

Symptoms: Loss spikes, NaN values

Solutions:

  • Lower learning rate
  • Gradient clipping
  • Check data quality
  • Use mixed precision carefully
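Gradient clipping is one flag away: TrainingArguments has a max_grad_norm parameter (default 1.0). The operation itself rescales the whole gradient vector whenever its global norm exceeds the threshold, shown here on a toy gradient:

```python
import math

def clip_grad_norm(grads, max_norm=1.0):
    """Scale gradients down so their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return grads
    scale = max_norm / total_norm
    return [g * scale for g in grads]

spiky = [3.0, 4.0]  # norm 5.0, well above the threshold
clipped = clip_grad_norm(spiky, max_norm=1.0)
print([round(g, 6) for g in clipped])  # [0.6, 0.8] -- norm is now 1.0
```

Clipping caps the size of any single update, which is usually enough to tame loss spikes without changing well-behaved steps at all.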

Deployment

Option 1: Hugging Face Inference API

from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="./merged-model",
    repo_id="yourusername/your-model",
    repo_type="model"
)

Option 2: Local Deployment

from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="./merged-model",
    tokenizer=tokenizer
)

output = generator("### Instruction:\nSummarize this article\n\n### Response:", max_new_tokens=200)

Option 3: vLLM for Production

from vllm import LLM, SamplingParams

llm = LLM(model="./merged-model")
sampling_params = SamplingParams(temperature=0.7, max_tokens=200)

outputs = llm.generate(prompts, sampling_params)

Cost Analysis

Training Costs (Approximate)

Model        Duration      Cloud Cost
7B LoRA      1-2 hours     $1-3
13B LoRA     2-4 hours     $5-15
70B QLoRA    6-12 hours    $50-150

Inference Costs

Fine-tuned models cost the same to run as base models—you only pay for the additional storage (~10-100MB for LoRA adapters).
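The adapter-size figure checks out with quick arithmetic. Assuming the Step 4 config (rank 16, four target modules across 32 layers of a Llama-2-7B-like model) and fp16 weights:

```python
hidden = 4096
layers = 32
r = 16
target_modules = 4                  # q_proj, k_proj, v_proj, o_proj (Step 4 config)

params_per_matrix = 2 * r * hidden  # low-rank factors A and B
adapter_params = layers * target_modules * params_per_matrix
adapter_bytes = adapter_params * 2  # fp16: 2 bytes per parameter

print(f"{adapter_params:,} params, ~{adapter_bytes / 1e6:.0f} MB")  # 16,777,216 params, ~34 MB
```

That is why you can keep many task-specific adapters per base model and swap them at load time.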

Next Steps

After mastering fine-tuning:

  1. RLHF: Reinforcement learning from human feedback
  2. DPO: Direct preference optimization
  3. Multi-task training: Single model for multiple tasks
  4. Continual learning: Update models with new data

Learn more AI development techniques in our guides section and explore AI tools.
