Fine-Tuning Methods
Smart Studio supports two approaches: LoRA and Full-Parameter Fine-Tuning. This guide explains how each approach works and helps you decide which one is right for your use case.
LoRA
What is LoRA?
LoRA (Low-Rank Adaptation) is a fine-tuning technique that avoids the cost of updating an entire model: it keeps the original model weights frozen and injects small, trainable components into the model. Instead of updating billions of parameters, LoRA typically trains less than 1% of them — while achieving results comparable to full fine-tuning.
How Does LoRA Work?
- Freezing the original model — The pre-trained weights remain unchanged, preserving the model's general knowledge.
- Adding small adapter matrices — LoRA inserts pairs of small matrices into specific layers of the model. These matrices are much smaller than the original weight matrices.
- Training only the adapters — During fine-tuning, only these small matrices are updated, so the total number of trainable parameters drops dramatically.
- Merging at inference — The adapter outputs are combined with the original model's outputs, producing a model that behaves as if it were fully fine-tuned.
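The mechanics above can be sketched in a few lines. This is a toy illustration in pure Python with made-up values and tiny dimensions, not Smart Studio's actual implementation: the frozen weight matrix `W` stays fixed, while two small matrices `A` and `B` (the only trainable pieces) add a low-rank correction to `W`'s output.

```python
# Toy LoRA forward pass: y = W x + (alpha / r) * B (A x)
# W is frozen; only A (r x d_in) and B (d_out x r) would be trained.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

d_in, d_out, r, alpha = 4, 4, 2, 4

# Frozen pre-trained weight (identity here, purely for illustration).
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]

# Small trainable adapter matrices. B starts at zero, so at
# initialization the adapter contributes nothing and the model
# behaves exactly like the base model.
A = [[0.1] * d_in for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]

def lora_forward(x):
    base = matvec(W, x)                      # frozen path
    adapter = matvec(B, matvec(A, x))        # trainable low-rank path
    scale = alpha / r
    return [b + scale * a for b, a in zip(base, adapter)]

x = [1.0, 2.0, 3.0, 4.0]
print(lora_forward(x))  # equals W x at init, since B is zero
```

Note the init convention: because `B` starts at zero, training begins from exactly the base model's behavior and the adapter learns only the delta.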
Key Concepts
Rank
Rank controls the size of the adapter matrices — think of it as the learning capacity of the adapter.
- Low rank (e.g., 4, 8): Fewer trainable parameters, faster and cheaper, but limited capacity for complex tasks.
- High rank (e.g., 32, 64): More trainable parameters, capable of learning more complex patterns, but increases memory usage.
A good starting point is 8 or 16 for most tasks.
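To make the rank trade-off concrete, here is a back-of-the-envelope count for a single hypothetical 4096×4096 weight matrix (the dimension is an assumption for illustration): a rank-`r` adapter adds `r × (d_in + d_out)` trainable parameters.

```python
# Trainable parameters added by one LoRA adapter of rank r on a
# d_out x d_in weight matrix: A is r x d_in, B is d_out x r.
d_in = d_out = 4096  # hypothetical hidden size

def lora_params(r):
    return r * d_in + d_out * r

full = d_in * d_out  # parameters in the original matrix
for r in (4, 8, 16, 32, 64):
    pct = 100 * lora_params(r) / full
    print(f"rank {r:2d}: {lora_params(r):,} trainable params ({pct:.2f}% of the matrix)")
```

Even at rank 64, the adapter is well under 4% of the original matrix, which is why doubling the rank is cheap in absolute terms.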
Target Modules
Target modules determine which layers of the model receive LoRA adapters. Not every layer needs to be adapted.
Common target layers include the attention mechanism components: q_proj, k_proj, v_proj, and o_proj. Adapting more layers allows for more comprehensive fine-tuning but increases the number of trainable parameters.
For most use cases, the platform's default target modules provide a good balance.
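A configuration along these lines might look as follows. The field names below are hypothetical, modeled on common conventions (such as the PEFT library's `LoraConfig`), not Smart Studio's actual schema — consult the platform's settings for the real names.

```python
# Illustrative LoRA configuration (hypothetical field names;
# your platform's actual schema may differ).
lora_config = {
    "r": 8,                    # rank of the adapter matrices
    "lora_alpha": 16,          # scaling factor applied to the adapter output
    "target_modules": [        # attention projections that receive adapters
        "q_proj", "k_proj", "v_proj", "o_proj",
    ],
    "lora_dropout": 0.05,      # regularization on the adapter path
}
print(lora_config["target_modules"])
```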
When to Use LoRA
LoRA is the right choice when:
- You want to fine-tune a model without high GPU costs
- Your dataset is small to medium-sized
- You need fast iteration — train, evaluate, adjust, repeat
- You want to create multiple task-specific adapters from the same base model
Full-Parameter Fine-Tuning
What is Full-Parameter Fine-Tuning?
Full-parameter fine-tuning updates every parameter in a pre-trained model during training. Unlike LoRA, which trains less than 1% of parameters, full-parameter fine-tuning modifies the entire model, giving it the highest capacity to adapt to new tasks.
How Does Full-Parameter Fine-Tuning Work?
- Loading the full model — All pre-trained weights are loaded into memory and made available for updates.
- Training on your dataset — The model trains on your domain-specific data. All parameters are updated during each training step.
- Producing a new model — The output is a fully updated model, independent from the original base model.
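The contrast with LoRA can be shown with a toy gradient step (pure Python, made-up numbers, not a real training loop): here every weight in the model receives an update, rather than only a small adapter.

```python
# Toy full-parameter SGD step: every weight in the model is updated.
learning_rate = 0.1

# A tiny "model": two layers of weights (made-up values).
model = {
    "layer1": [0.5, -0.3, 0.8],
    "layer2": [1.2, 0.0],
}

# Pretend gradients from one training step (also made up).
grads = {
    "layer1": [0.1, -0.2, 0.05],
    "layer2": [0.3, -0.1],
}

# Full-parameter fine-tuning: no weight is frozen.
for name, weights in model.items():
    model[name] = [w - learning_rate * g for w, g in zip(weights, grads[name])]

print(model["layer1"])  # approximately [0.49, -0.28, 0.795]
```

Because nothing is frozen, the optimizer must also keep state (e.g., momentum) for every parameter, which is the main source of the high memory cost noted below.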
Key Concepts
Epochs
Epochs set the number of times the model trains over your entire dataset. Full-parameter fine-tuning typically requires fewer epochs than LoRA because every parameter is updated at each step.
- Too few: The model may underfit and not adapt sufficiently to your dataset.
- Too many: The model may overfit and lose generalization ability.
A good starting point is 3 epochs for most tasks.
Learning Rate
Learning rate controls how much the model updates its parameters at each training step. Full-parameter fine-tuning is more sensitive to learning rate than LoRA, as all parameters are updated simultaneously.
- Too high: The model may overwrite existing knowledge and produce unstable results.
- Too low: Training converges slowly or fails to adapt the model effectively.
A good starting point for full-parameter fine-tuning is 0.00002. For most tasks, keep the value between 0.00001 and 0.00005.
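The sensitivity to learning rate is easy to demonstrate on a one-parameter toy problem (a quadratic loss with its minimum at w = 3; all numbers here are illustrative, not real training values):

```python
# Gradient descent on loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# The optimum is w = 3; each run starts from w = 0.

def train(lr, steps=20, w=0.0):
    for _ in range(steps):
        w -= lr * 2 * (w - 3)
    return w

print(train(0.1))    # converges toward the optimum at w = 3
print(train(1.5))    # overshoots further each step and diverges
print(train(0.001))  # moves so slowly it barely adapts
```

The same qualitative behavior — stable convergence, unstable divergence, or stalled progress — is what the too-high/too-low guidance above describes at model scale.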
Batch Size
Batch size sets the number of training samples processed in each training step. Larger batch sizes require more GPU memory but produce more stable gradient updates.
- Small batch size (e.g., 4, 8): Lower memory usage, but noisier gradient updates.
- Large batch size (e.g., 32, 64): More stable training, but requires significantly more GPU memory.
A good starting point is 16 for most tasks.
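Together, epochs and batch size determine how many optimizer steps a run takes. A quick calculation, using a hypothetical dataset of 10,000 samples and the starting values suggested in this guide:

```python
import math

# Hypothetical run: 10,000 training samples, defaults from this guide.
dataset_size = 10_000
batch_size = 16
epochs = 3

steps_per_epoch = math.ceil(dataset_size / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 625 steps per epoch, 1875 in total
```

Halving the batch size doubles the number of steps per epoch (and roughly the wall-clock time per epoch), which is the other side of its lower memory footprint.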
When to Use Full-Parameter Fine-Tuning
Full-parameter fine-tuning is the right choice when:
- Your task requires the model to significantly shift its existing knowledge or behavior
- You have a large, high-quality dataset
- Maximum performance is the priority over cost and speed
- You have access to sufficient GPU resources
LoRA vs. Full-Parameter Fine-Tuning
| | LoRA | Full-Parameter Fine-Tuning |
|---|---|---|
| What gets trained | Small adapter matrices (~0.1–1% of parameters) | All model parameters (100%) |
| GPU memory | Low | Very high |
| Training speed | Fast | Slow |
| Training cost | Low | High |
| Risk of overfitting | Lower (fewer trainable parameters) | Higher |
| Performance | Comparable for most tasks | Slightly better ceiling for highly complex tasks |
| Output | Lightweight adapter file | Full model copy |
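The difference in output size is easy to estimate. The numbers below are all illustrative assumptions: a 7B-parameter model stored in 16-bit precision, with rank-8 adapters on the four attention projections of 32 layers at hidden size 4096.

```python
# Rough storage comparison: full model copy vs. LoRA adapter file.
params_full = 7_000_000_000  # hypothetical 7B-parameter model
bytes_per_param = 2          # 16-bit precision

layers, hidden, r, n_proj = 32, 4096, 8, 4   # illustrative architecture
# Each adapted projection contributes A (r x hidden) and B (hidden x r).
params_adapter = layers * n_proj * (r * hidden + hidden * r)

full_gb = params_full * bytes_per_param / 1e9
adapter_mb = params_adapter * bytes_per_param / 1e6
print(f"full model: ~{full_gb:.0f} GB, adapter: ~{adapter_mb:.0f} MB")
```

An adapter measured in megabytes against a model measured in gigabytes is what makes it practical to keep many task-specific adapters per base model.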
Next Steps
- Train models on instruction-response pairs to improve performance on specific tasks.
- Align model outputs with human preferences using chosen and rejected response pairs.
- Transfer knowledge from a larger teacher model to a smaller student model for efficient deployment.