Fine-Tuning Methods
Smart Studio supports two approaches: LoRA and Full-Parameter Fine-Tuning. This guide explains how each approach works and helps you decide which one is right for your use case.
LoRA
What is LoRA?
LoRA (Low-Rank Adaptation) is a fine-tuning technique that avoids the cost of updating an entire model: it keeps the original model weights frozen and injects small, trainable components into the model. Instead of updating billions of parameters, LoRA typically trains less than 1% of them — while achieving results comparable to full fine-tuning.
How Does LoRA Work?
- Freezing the original model — The pre-trained weights remain unchanged, preserving the model's general knowledge.
- Adding small adapter matrices — LoRA inserts pairs of small matrices into specific layers of the model. These matrices are much smaller than the original weight matrices.
- Training only the adapters — During fine-tuning, only these small matrices are updated, so the total number of trainable parameters drops dramatically.
- Merging at inference — The adapter outputs are combined with the original model's outputs, producing a model that behaves as if it were fully fine-tuned.
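The mechanics above can be sketched in a few lines. This is a toy illustration in pure Python with made-up values and tiny dimensions, not Smart Studio's actual implementation: the frozen weight matrix `W` stays fixed, while two small matrices `A` and `B` (the only trainable pieces) add a low-rank correction to `W`'s output.

```python
# Toy LoRA forward pass: y = W x + (alpha / r) * B (A x)
# W is frozen; only A (r x d_in) and B (d_out x r) would be trained.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

d_in, d_out, r, alpha = 4, 4, 2, 4

# Frozen pre-trained weight (identity here, purely for illustration).
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]

# Small trainable adapter matrices. B starts at zero, so at
# initialization the adapter contributes nothing and the model
# behaves exactly like the base model.
A = [[0.1] * d_in for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]

def lora_forward(x):
    base = matvec(W, x)                      # frozen path
    adapter = matvec(B, matvec(A, x))        # trainable low-rank path
    scale = alpha / r
    return [b + scale * a for b, a in zip(base, adapter)]

x = [1.0, 2.0, 3.0, 4.0]
print(lora_forward(x))  # equals W x at init, since B is zero
```

Note the init convention: because `B` starts at zero, training begins from exactly the base model's behavior and the adapter learns only the delta.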
Key Concepts
Rank
Rank controls the size of the adapter matrices — think of it as the learning capacity of the adapter.
- Low rank (e.g., 4, 8): Fewer trainable parameters, faster and cheaper, but limited capacity for complex tasks.
- High rank (e.g., 32, 64): More trainable parameters, capable of learning more complex patterns, but increases memory usage.
A good starting point is 8 or 16 for most tasks.
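To make the rank trade-off concrete, here is a back-of-the-envelope count for a single hypothetical 4096×4096 weight matrix (the dimension is an assumption for illustration): a rank-`r` adapter adds `r × (d_in + d_out)` trainable parameters.

```python
# Trainable parameters added by one LoRA adapter of rank r on a
# d_out x d_in weight matrix: A is r x d_in, B is d_out x r.
d_in = d_out = 4096  # hypothetical hidden size

def lora_params(r):
    return r * d_in + d_out * r

full = d_in * d_out  # parameters in the original matrix
for r in (4, 8, 16, 32, 64):
    pct = 100 * lora_params(r) / full
    print(f"rank {r:2d}: {lora_params(r):,} trainable params ({pct:.2f}% of the matrix)")
```

Even at rank 64, the adapter is well under 4% of the original matrix, which is why doubling the rank is cheap in absolute terms.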
Target Modules
Target modules determine which layers of the model receive LoRA adapters. Not every layer needs to be adapted.
Common target layers include the attention mechanism components: q_proj, k_proj, v_proj, and o_proj. Adapting more layers allows for more comprehensive fine-tuning but increases the number of trainable parameters.
For most use cases, the platform's default target modules provide a good balance.
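A configuration along these lines might look as follows. The field names below are hypothetical, modeled on common conventions (such as the PEFT library's `LoraConfig`), not Smart Studio's actual schema — consult the platform's settings for the real names.

```python
# Illustrative LoRA configuration (hypothetical field names;
# your platform's actual schema may differ).
lora_config = {
    "r": 8,                    # rank of the adapter matrices
    "lora_alpha": 16,          # scaling factor applied to the adapter output
    "target_modules": [        # attention projections that receive adapters
        "q_proj", "k_proj", "v_proj", "o_proj",
    ],
    "lora_dropout": 0.05,      # regularization on the adapter path
}
print(lora_config["target_modules"])
```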
When to Use LoRA
LoRA is the right choice when:
- You want to fine-tune a model without high GPU costs
- Your dataset is small to medium-sized
- You need fast iteration — train, evaluate, adjust, repeat
- You want to create multiple task-specific adapters from the same base model
Full-Parameter Fine-Tuning
What is Full-Parameter Fine-Tuning?
Full-parameter fine-tuning updates every parameter in a pre-trained model during training. Unlike LoRA, which trains less than 1% of parameters, full-parameter fine-tuning modifies the entire model, giving it the highest capacity to adapt to new tasks.
How Does Full-Parameter Fine-Tuning Work?
- Loading the full model — All pre-trained weights are loaded into memory and made available for updates.
- Training on your dataset — The model trains on your domain-specific data. All parameters are updated during each training step.
- Producing a new model — The output is a fully updated model, independent from the original base model.
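The contrast with LoRA can be shown with a toy gradient step (pure Python, made-up numbers, not a real training loop): here every weight in the model receives an update, rather than only a small adapter.

```python
# Toy full-parameter SGD step: every weight in the model is updated.
learning_rate = 0.1

# A tiny "model": two layers of weights (made-up values).
model = {
    "layer1": [0.5, -0.3, 0.8],
    "layer2": [1.2, 0.0],
}

# Pretend gradients from one training step (also made up).
grads = {
    "layer1": [0.1, -0.2, 0.05],
    "layer2": [0.3, -0.1],
}

# Full-parameter fine-tuning: no weight is frozen.
for name, weights in model.items():
    model[name] = [w - learning_rate * g for w, g in zip(weights, grads[name])]

print(model["layer1"])  # approximately [0.49, -0.28, 0.795]
```

Because nothing is frozen, the optimizer must also keep state (e.g., momentum) for every parameter, which is the main source of the high memory cost noted below.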
Key Concepts
Epochs
Epochs set the number of times the model trains over your entire dataset. Full-parameter fine-tuning typically requires fewer epochs than LoRA because every parameter is updated at each step.
- Too few: The model may underfit and not adapt sufficiently to your dataset.
- Too many: The model may overfit and lose generalization ability.
A good starting point is 3 epochs for most tasks.
Learning Rate
Learning rate controls how much the model updates its parameters at each training step. Full-parameter fine-tuning is more sensitive to learning rate than LoRA, as all parameters are updated simultaneously.
- Too high: The model may overwrite existing knowledge and produce unstable results.
- Too low: Training converges slowly or fails to adapt the model effectively.
A good starting point for full-parameter fine-tuning is 0.00002. For most tasks, keep the value between 0.00001 and 0.00005.
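The sensitivity to learning rate is easy to demonstrate on a one-parameter toy problem (a quadratic loss with its minimum at w = 3; all numbers here are illustrative, not real training values):

```python
# Gradient descent on loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# The optimum is w = 3; each run starts from w = 0.

def train(lr, steps=20, w=0.0):
    for _ in range(steps):
        w -= lr * 2 * (w - 3)
    return w

print(train(0.1))    # converges toward the optimum at w = 3
print(train(1.5))    # overshoots further each step and diverges
print(train(0.001))  # moves so slowly it barely adapts
```

The same qualitative behavior — stable convergence, unstable divergence, or stalled progress — is what the too-high/too-low guidance above describes at model scale.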
Batch Size
Batch size sets the number of training samples processed in each training step. Larger batch sizes require more GPU memory but produce more stable gradient updates.
- Small batch size (e.g., 4, 8): Lower memory usage, but noisier gradient updates.
- Large batch size (e.g., 32, 64): More stable training, but requires significantly more GPU memory.
A good starting point is 16 for most tasks.
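Together, epochs and batch size determine how many optimizer steps a run takes. A quick calculation, using a hypothetical dataset of 10,000 samples and the starting values suggested in this guide:

```python
import math

# Hypothetical run: 10,000 training samples, defaults from this guide.
dataset_size = 10_000
batch_size = 16
epochs = 3

steps_per_epoch = math.ceil(dataset_size / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 625 steps per epoch, 1875 in total
```

Halving the batch size doubles the number of steps per epoch (and roughly the wall-clock time per epoch), which is the other side of its lower memory footprint.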
When to Use Full-Parameter Fine-Tuning
Full-parameter fine-tuning is the right choice when:
- Your task requires the model to significantly shift its existing knowledge or behavior
- You have a large, high-quality dataset
- Maximum performance is the priority over cost and speed
- You have access to sufficient GPU resources
LoRA vs. Full-Parameter Fine-Tuning
| | LoRA | Full-Parameter Fine-Tuning |
|---|---|---|
| What gets trained | Small adapter matrices (~0.1–1% of parameters) | All model parameters (100%) |
| GPU memory | Low | Very high |
| Training speed | Fast | Slow |
| Training cost | Low | High |
| Risk of overfitting | Lower (fewer trainable parameters) | Higher |
| Performance | Comparable for most tasks | Slightly better ceiling for highly complex tasks |
| Output | Lightweight adapter file | Full model copy |
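The difference in output size is easy to estimate. The numbers below are all illustrative assumptions: a 7B-parameter model stored in 16-bit precision, with rank-8 adapters on the four attention projections of 32 layers at hidden size 4096.

```python
# Rough storage comparison: full model copy vs. LoRA adapter file.
params_full = 7_000_000_000  # hypothetical 7B-parameter model
bytes_per_param = 2          # 16-bit precision

layers, hidden, r, n_proj = 32, 4096, 8, 4   # illustrative architecture
# Each adapted projection contributes A (r x hidden) and B (hidden x r).
params_adapter = layers * n_proj * (r * hidden + hidden * r)

full_gb = params_full * bytes_per_param / 1e9
adapter_mb = params_adapter * bytes_per_param / 1e6
print(f"full model: ~{full_gb:.0f} GB, adapter: ~{adapter_mb:.0f} MB")
```

An adapter measured in megabytes against a model measured in gigabytes is what makes it practical to keep many task-specific adapters per base model.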
Next Steps
- Train models on instruction-response pairs to improve performance on specific tasks.
- Align model outputs with human preferences using chosen and rejected response pairs.
- Transfer knowledge from a larger teacher model to a smaller student model for efficient deployment.