
Supervised Fine-Tuning - LLM

Learn how to fine-tune large language models for text-based tasks using supervised learning with custom datasets.

Purpose and Overview

Supervised Fine-Tuning (SFT) for text adapts pre-trained language models to specific use cases and domains. SFT customizes model behavior, improves domain-specific performance, and enhances contextual understanding.

Step 1: Model & Training Selection

Select a training method, fine-tuning method, model type, and base model for the SFT training task.


[Screenshot: Supervised Fine-Tuning]


1. Choose a Training Method

Select SFT (Supervised) as the training method.

Fine-Tuning Options

SFT for LLM provides the following optional settings.

Fine-Tuning Method

  • LoRA (Default): Trains small adapter modules on top of frozen model weights. LoRA reduces computational cost and memory usage. Recommended for most use cases.

  • Full-Parameter Fine-Tuning: Updates all model parameters during training. Provides maximum model capacity but requires more computational resources than LoRA.
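The practical difference between the two methods comes down to trainable-parameter count. As a rough illustration (the hidden size and rank below are assumptions, not platform defaults), a LoRA adapter of rank r on a d_out × d_in weight matrix trains r × (d_out + d_in) values instead of d_out × d_in:

```python
# Trainable weights for one d_out x d_in matrix: full fine-tuning updates
# the whole matrix, while LoRA trains two low-rank factors B (d_out x r)
# and A (r x d_in) on top of the frozen weights.
def full_params(d_out, d_in):
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    return rank * (d_out + d_in)

d = 4096  # illustrative hidden size, not a platform default
print(full_params(d, d))          # 16777216
print(lora_params(d, d, rank=8))  # 65536, roughly 0.4% of the full count
```

The same ratio applies per adapted matrix, which is why LoRA's memory savings scale with model size.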

Preserve General Skills (SFT-Plus)

Select Preserve General Skills to strengthen domain-specific capabilities and reduce the risk of catastrophic forgetting during training.

What is SFT-Plus?

Standard fine-tuning improves a model's performance on specific tasks, but can reduce its general capabilities. SFT-Plus addresses this by training your model on specialized data while preserving its core abilities, such as reasoning and language comprehension.

2. Select a Model Type

  • LLM: For text-only tasks such as text generation, translation, and question answering.
  • VLM: For multimodal tasks involving text and images. See Supervised Fine-Tuning - VLM.

3. Choose a Base Model

Select a base model as the foundation for fine-tuning. The choice of base model impacts the final performance and capabilities of the fine-tuned model. For detailed model comparisons and selection criteria, see How to Choose Models.

Selection Tips
  • Start with Instruct models (e.g., Qwen3-4B-Instruct-2507) for most conversational applications.
  • Choose Thinking models (e.g., Qwen3-4B-Thinking-2507) when your task requires step-by-step reasoning.
  • Use Base (dense) models (e.g., Qwen3-4B) when you need maximum customization flexibility.
  • Consider MoE models (e.g., Qwen3-30B-A3B) for production deployments requiring both high performance and efficiency.

After completing all selections, click Continue.

Step 2: Dataset & Evaluation

Upload a training dataset and configure evaluation settings to monitor training progress and model performance.

Smart Studio provides multiple ways to prepare datasets:

  • Upload a dataset directly. For instructions, see Create Datasets.
  • Use AI Dataset Preparation to automate the dataset creation process.
  • Provide the Object Storage Service (OSS) address of the data instead of uploading the file to the platform.

[Screenshot: Dataset & Evaluation]


Dataset Requirements

File Format

Format your dataset file as JSONL. Each line must be a valid JSON object that represents one training example.

Dataset Size

Recommended size: 100–100,000 examples. Start with smaller datasets for initial experiments and scale up based on performance needs.

Data Quality Tips
  • Ensure diverse examples covering different scenarios and edge cases
  • Maintain consistent response quality and style throughout the dataset
  • Include both positive and negative examples where applicable
  • Validate that all examples follow the required JSON schema

Required Data Format

{
  "messages": [
    {"role": "system", "content": "<system>"},
    {"role": "user", "content": "<query1>"},
    {"role": "assistant", "content": "<response1>"},
    {"role": "user", "content": "<query2>"},
    {"role": "assistant", "content": "<response2>"}
  ]
}

Format Explanation

  • system: Optional system prompt defining model behavior and context

  • user: User input or query for the model to respond to

  • assistant: Expected model response for the given user input

Each line in the JSONL file must contain one complete conversation example.
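To produce lines in this format, a standard JSON serializer is enough. The sketch below is illustrative (the file name and conversation content are assumptions); the key point is that json.dumps escapes embedded newlines, keeping each example on a single line:

```python
import json

example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "List two colors"},
        {"role": "assistant", "content": "1. Red\n2. Blue"},
    ]
}

# json.dumps escapes the embedded newline as \n, so the whole
# conversation serializes to a single line.
line = json.dumps(example, ensure_ascii=False)
assert "\n" not in line  # safe to write as one JSONL record
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(line + "\n")
```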

Example Data Formats

{"messages": [
{"role": "system", "content": "You are a useful and harmless assistant"},
{"role": "user", "content": "Tell me the weather tomorrow"},
{"role": "assistant", "content": "Sunny tomorrow"}
]}
{"messages": [
{"role": "user", "content": "What is the capital of France?"},
{"role": "assistant", "content": "The capital of France is Paris."}
]}
{"messages": [
{"role": "system", "content": "You are a technical support assistant"},
{"role": "user", "content": "How do I reset my password?"},
{"role": "assistant", "content": "To reset your password, follow these steps:
1. Go to the login page
2. Click 'Forgot Password'
3. Enter your email address
4. Check your email for reset instructions"
}
]}
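Before uploading, it can help to sanity-check each line against the schema above. The following is a minimal, illustrative validator, not the platform's actual validation logic:

```python
import json

VALID_ROLES = {"system", "user", "assistant"}

def validate_line(line):
    """Return a list of problems with one JSONL line (empty if valid)."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    messages = obj.get("messages") if isinstance(obj, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    errors = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            errors.append(f"message {i}: must be a JSON object")
            continue
        if msg.get("role") not in VALID_ROLES:
            errors.append(f"message {i}: unknown role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            errors.append(f"message {i}: 'content' must be a string")
    if isinstance(messages[-1], dict) and messages[-1].get("role") != "assistant":
        errors.append("last message should be an assistant response")
    return errors

good = ('{"messages": [{"role": "user", "content": "Hi"}, '
        '{"role": "assistant", "content": "Hello!"}]}')
print(validate_line(good))  # []
```

Run it over every line of the file and fix any reported problems before starting a training job.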

Evaluation Options

Auto-Carveout

Automatically splits the training data into training and validation sets, reserving 10–20% of the data for evaluation. Auto-carveout enables training progress monitoring without separate evaluation data.
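The auto-split behavior can be pictured as a simple shuffled holdout. The sketch below uses an illustrative 10% ratio and a fixed seed; it is not the platform's actual procedure:

```python
import random

def split_dataset(lines, eval_ratio=0.1, seed=42):
    """Shuffle and hold out a fraction of examples for validation."""
    rng = random.Random(seed)
    shuffled = list(lines)
    rng.shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_ratio))
    return shuffled[n_eval:], shuffled[:n_eval]  # (train, validation)

examples = [f'{{"id": {i}}}' for i in range(100)]  # stand-ins for JSONL lines
train, val = split_dataset(examples)
print(len(train), len(val))  # 90 10
```

Shuffling before the split matters: if the file is sorted by topic, a tail-end holdout would measure only one topic.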

No Evaluation

Skips evaluation during training. No evaluation reduces training time but provides no insights into model performance. Recommended for experienced users only.

Step 3: Settings & Options

Configure training parameters and model settings. Start with the default values for your first training job, then adjust based on your specific requirements and results.
[Screenshot: Settings & Options]


Basic Configuration

Custom Model Name

Display name in My Models for management purposes. Choose a descriptive name that identifies the model's purpose and version.

Example: "Customer-Service-Bot-v2" or "Technical-Support-Assistant"

Task Display Name

Display name for the fine-tuning task. The name appears in the Fine-tuning task list and helps track training progress and history.

Example: "Q1-2025-SFT-Customer-Service" or "Domain-Adaptation-Jan"

GPU Type

Select the GPU type for training. If the required GPU type is not available in the list, contact us to request access.

Training Parameters

The following parameters apply to all fine-tuning methods unless otherwise noted. LoRA is the default fine-tuning method.

  • epoch — The number of complete passes through the training dataset.
    Increase: More learning opportunities, but the model may perform well on training data while producing poor results on new inputs.
    Decrease: Trains faster, but the model may not learn enough to perform well.

  • batch_size — The number of training examples processed in a single group. The model learns from each group before moving to the next.
    Increase: Produces more consistent training updates, but uses significantly more GPU memory.
    Decrease: Reduces GPU memory usage, but training updates may become less consistent.

  • learning_rate — Controls the size of each adjustment the model makes during training.
    Increase: The model learns faster, but training may become unstable and fail to reach a good solution.
    Decrease: Training becomes more stable and precise, but takes longer and may settle on a suboptimal solution.

  • lora_rank — Sets the learning capacity of the LoRA adapters. Available when Full-Parameter Fine-Tuning is not selected.
    Increase (e.g., 16, 32): Improves the model's ability to learn complex tasks, but uses more GPU memory.
    Decrease (e.g., 4, 8): Reduces GPU memory usage, but the model may struggle with complex tasks.

  • lambda — Sets the trade-off between adapting to the new data and retaining the model's original abilities. Available only when Preserve General Skills is selected.
    Increase: Better preserves the model's existing knowledge, but it may adapt less effectively to your new data.
    Decrease: Helps the model adapt more to your new data, but it may lose some of its existing knowledge.

  • temperature — Controls how closely training follows the original model's behavior. Available only when Preserve General Skills is selected.
    Increase: Makes the original model's guidance more flexible, giving more room to learn new patterns, but reduces its overall influence.
    Decrease: Stays closer to the original model's behavior, but limits new learning.
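One plausible reading of lambda and temperature is a distillation-style objective: the total loss mixes task cross-entropy with a temperature-softened divergence from the frozen base model. This is a sketch of that interpretation, not the platform's documented implementation:

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sft_plus_loss(student_logits, base_logits, target_idx,
                  lam=0.5, temperature=2.0):
    # Task term: cross-entropy against the ground-truth next token.
    ce = -math.log(softmax(student_logits)[target_idx])
    # Preservation term: KL(base || student) over temperature-softened
    # distributions; a higher temperature softens the base model's guidance.
    p_base = softmax(base_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pb * math.log(pb / ps) for pb, ps in zip(p_base, p_student))
    # A higher lam preserves more of the base model's behavior;
    # a lower lam favors fitting the new data.
    return (1 - lam) * ce + lam * kl
```

Under this reading, lam = 0 reduces to plain SFT, and identical student and base logits drive the preservation term to zero.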

Advanced Parameters

These parameters work well with default values for most use cases. Adjust only when needed.

  • max_context_length — Sets the maximum token limit per example. Texts exceeding this limit are truncated.
    Increase to learn from longer texts, at the cost of significantly higher GPU memory usage.

  • warmup_ratio — The fraction of the training process used for a "warm-up" phase, during which the learning rate slowly increases to prevent early training instability.
    A small value (0.03–0.1) is generally recommended. This is primarily a stability mechanism, not a performance tuning parameter.

  • gradient_accumulation_steps — The number of small batches to process before the model performs a single learning update. This simulates a larger batch size to save memory.
    Increase to achieve more stable training at the cost of slower speed. A value of 1 disables this feature.

  • target_modules — The specific internal components (layers) of the model that LoRA modifies. Available when Full-Parameter Fine-Tuning is not selected.
    Adding more modules allows more comprehensive adaptation but increases the number of trainable parameters.
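The gradient_accumulation_steps mechanic can be sketched as follows; the toy below uses a single scalar parameter and illustrative values rather than real tensors:

```python
# Toy: one scalar "parameter" and one gradient per micro-batch. Gradients
# are summed across accumulation_steps micro-batches, then averaged and
# applied as a single update, simulating batch_size * accumulation_steps.
def train_step(param, micro_batch_grads, lr=0.5, accumulation_steps=4):
    grad_sum = 0.0
    for i, grad in enumerate(micro_batch_grads, start=1):
        grad_sum += grad
        if i % accumulation_steps == 0:
            param -= lr * (grad_sum / accumulation_steps)  # one update
            grad_sum = 0.0
    return param

print(train_step(1.0, [1.0, 2.0, 3.0, 2.0]))  # 0.0
```

Only one micro-batch is in GPU memory at a time, which is why accumulation trades speed for memory.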

After reviewing the configuration, click Create Task to begin the training process.

Step 4: Monitor Training Progress

During and after training, check key training metrics at any time.


[Screenshot: Training metrics]


The Model Loss chart displays two metrics:

  • Training Loss: Measures how well the model learns from your training data.
  • Validation Loss: Measures how well the model generalizes to unseen data.

Interpret training metrics

  • If both losses decrease steadily, your model is learning well. Continue training.
  • If training loss decreases but validation loss increases, your model may be overfitting. Stop training and deploy the current model.
  • If both losses remain high or increase, your training data or configuration may need adjustment. Review your dataset and parameters.
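The overfitting check above can be automated with a simple patience rule; the patience value below is illustrative, not a platform setting:

```python
# Early-stopping sketch: stop when none of the last `patience` validation
# evaluations improved on the best validation loss seen before them.
def should_stop(val_losses, patience=3):
    if len(val_losses) <= patience:
        return False
    best = min(val_losses[:-patience])
    return all(v >= best for v in val_losses[-patience:])

print(should_stop([2.0, 1.5, 1.2, 1.3, 1.4, 1.5]))  # True
print(should_stop([2.0, 1.5, 1.2, 1.1]))            # False
```

When the rule fires, the checkpoint with the lowest validation loss, not the latest one, is usually the model worth deploying.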

Parameter Tuning Guidelines

  • Start with defaults: Default values work well for most use cases.
  • Increase LoRA rank: Increase to 16, 32, 64, or 128 for complex tasks.
  • Adjust learning rate: Lower values for stable training, higher values for faster convergence.
  • Monitor validation loss: Stop training once validation loss stops decreasing or begins to rise.

Next Steps

Deploy the Fine-Tuned Model

Deploy the fine-tuned model to a production endpoint for real-world usage.

Test in Model Lab

Test the fine-tuned model's performance and compare against base models in the interactive testing environment.