Overview

Explore training methods, fine-tuning methods, and supported models for fine-tuning AI models on Smart Studio.

Fine-tuning adapts pre-trained models to specific tasks and datasets to improve domain-specific performance. Smart Studio supports three training methods (SFT, DPO, and Distillation) and multiple fine-tuning methods (LoRA, Full-Parameter Fine-Tuning) to make fine-tuning both efficient and effective.

Training Methods

Smart Studio supports the following training methods for different use cases and data types.

SFT: Supervised Fine-Tuning

Supervised Fine-Tuning trains models on labeled input-output pairs to learn specific tasks or adapt to new domains.

Supervised Fine-Tuning - LLM

Optimize language models for text-based tasks such as:

  • Conversational AI and chatbots
  • Text classification and sentiment analysis
  • Content generation and summarization
  • Domain-specific language understanding
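
For any of these tasks, SFT data is a set of labeled input-output pairs. As a rough sketch (the field names below follow a common chat-style convention and are not necessarily Smart Studio's exact schema; check the platform's data-format docs), a JSONL training file might be produced like this:

```python
import json

# Hypothetical chat-style SFT examples. The "messages"/"role"/"content"
# field names are a common convention, not Smart Studio's confirmed schema.
examples = [
    {
        "messages": [
            {"role": "user", "content": "Summarize: The quarterly report shows..."},
            {"role": "assistant", "content": "Revenue grew 12% quarter over quarter..."},
        ]
    },
    {
        "messages": [
            {"role": "user", "content": "Classify the sentiment: 'Great battery life!'"},
            {"role": "assistant", "content": "positive"},
        ]
    },
]

# Write one JSON object per line (JSONL)
with open("sft_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```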

Supervised Fine-Tuning - VLM

Adapt vision-language models for multimodal tasks including:

  • Image captioning and description
  • Visual question answering
  • Document understanding and OCR
  • Medical image analysis

DPO: Direct Preference Optimization

Direct Preference Optimization is an advanced training method that optimizes models based on human preferences and comparative feedback.

Direct Preference Optimization - LLM

Ideal for scenarios requiring human-aligned outputs:

  • Improving response quality and safety
  • Aligning model behavior with human values
  • Reducing harmful or biased outputs
  • Fine-tuning based on preference rankings
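
For background on what DPO actually optimizes (independent of any Smart Studio API), the sketch below computes the standard DPO objective from per-sequence log-probabilities of a trainable policy and a frozen reference model; all tensor names are illustrative:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective: push the policy to prefer the chosen
    response over the rejected one, relative to a frozen reference model.

    Each argument is a tensor of per-sequence log-probabilities,
    shape (batch,). `beta` controls how far the policy may drift
    from the reference.
    """
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin); minimized when chosen >> rejected
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with made-up log-probabilities
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss.item())
```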

Direct Preference Optimization - VLM

Align vision-language model outputs with human preferences for multimodal scenarios such as:

  • Image description quality improvement
  • Visual reasoning accuracy alignment
  • Multimodal response safety enhancement

Distillation

Distillation transfers knowledge from a larger teacher model to a smaller student model, achieving similar performance with lower inference costs.

White-Box Distillation

Knowledge Distillation - The student model not only mimics the teacher's answers but also learns the teacher's problem-solving approach and reasoning process through its internal states (logits and hidden-layer activations).
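
To illustrate the idea with generic knowledge distillation (not necessarily Smart Studio's exact implementation), the student can be trained to match the teacher's temperature-softened output distribution:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic white-box distillation loss: KL divergence between the
    teacher's and student's temperature-softened token distributions.
    Logits have shape (batch, vocab); the teacher's are treated as fixed.
    """
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits.detach() / t, dim=-1)
    # Scale by t^2 so gradient magnitude stays consistent across temperatures
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t
```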

Black-Box Distillation

Data Distillation - Available for API-based teacher models only. The student model can only access the teacher's outputs (correct answers), so effective learning requires large amounts of input-output pairs.
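
A typical first step in a black-box pipeline is bulk-generating input-output pairs from the teacher. In the sketch below, the endpoint URL and response field are placeholders for whatever API-based teacher you use, not a real Smart Studio or vendor interface:

```python
import json
import requests  # third-party; pip install requests

# Placeholder endpoint and schema -- substitute your teacher model's real API.
TEACHER_API_URL = "https://example.com/v1/generate"

prompts = [
    "Explain HTTP caching in two sentences.",
    "Translate to French: 'The meeting is at noon.'",
]

with open("distill_pairs.jsonl", "w") as f:
    for prompt in prompts:
        resp = requests.post(TEACHER_API_URL, json={"prompt": prompt}, timeout=30)
        resp.raise_for_status()
        answer = resp.json()["output"]  # field name is an assumption
        # Black-box distillation: we only keep the teacher's final text,
        # never its logits or hidden states.
        f.write(json.dumps({"input": prompt, "output": answer}) + "\n")
```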

Choosing the Right Method

  • Use SFT-LLM for language tasks with clear input-output examples.
  • Use SFT-VLM for multimodal applications involving images and text.
  • Use DPO when you have preference data or need to align model behavior with human values.
  • Use Data Distillation when you want small models to leverage the capabilities of top API-based models.

Fine-Tuning Methods

Smart Studio supports the following fine-tuning methods to control how model parameters are updated during training. LoRA is used by default for all training methods.

LoRA (Default)

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method: it trains small adapter modules while keeping the original weights frozen.

Best for:

  • Most use cases
  • Cost-efficient training
  • Quick experiments
  • Limited GPU resources

Availability: All training methods
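
For intuition about why LoRA is cheap (a generic sketch, not Smart Studio's internals): instead of updating a frozen weight matrix, LoRA learns a low-rank update, so only a small fraction of parameters receive gradients:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA adapter around a frozen linear layer (generic sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # original weights stay frozen
        # Low-rank factors: only these r*(in + out) parameters are trained.
        # A starts small and random, B starts at zero, so the initial update is zero.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen path plus scaled low-rank update
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(1024, 1024))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 16384 trainable params vs ~1M for the full layer
```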

Full-Parameter Fine-Tuning

Full-Parameter Fine-Tuning updates all model parameters during training, providing maximum model capacity and flexibility.

Best for:

  • Maximum customization needs
  • When LoRA performance is insufficient
  • Large high-quality datasets

Availability: SFT-LLM, SFT-VLM

SFT-Plus

SFT-Plus prevents catastrophic forgetting during fine-tuning: the model learns new tasks while retaining its general capabilities.

Best for:

  • Domain-specific tasks that still require general abilities
  • Production models handling both specialized and general queries
  • Cases where previous fine-tuning degraded baseline performance

Availability: SFT-LLM
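
This page does not document SFT-Plus's internal mechanism. As one common anti-forgetting recipe (illustrative only, and not confirmed to be what SFT-Plus does), general-purpose examples can be mixed back into the domain training set so the model keeps rehearsing its broad capabilities:

```python
import json
import random

def load_jsonl(path):
    with open(path) as f:
        return [json.loads(line) for line in f]

def mix_datasets(domain_path, general_path, out_path, general_ratio=0.2):
    """Rehearsal-style data mixing: blend a slice of general-purpose data
    into the domain set so roughly `general_ratio` of the output is general.
    Illustrative technique only, not SFT-Plus's confirmed mechanism.
    """
    domain = load_jsonl(domain_path)
    general = load_jsonl(general_path)
    k = min(len(general), int(len(domain) * general_ratio / (1 - general_ratio)))
    mixed = domain + random.sample(general, k)
    random.shuffle(mixed)
    with open(out_path, "w") as f:
        for ex in mixed:
            f.write(json.dumps(ex) + "\n")
```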

Supported Models

Smart Studio supports fine-tuning for a wide range of state-of-the-art models across different providers and model types.

| Model Name | Model Type | Training Method | Model Size |
| --- | --- | --- | --- |
| QwQ-32B | LLM | SFT, DPO | 32B |
| QwQ-32B-Preview | LLM | SFT, DPO | 32B |
| Qwen2.5-0.5B-Instruct | LLM | SFT, DPO, Distill | 0.5B |
| Qwen2.5-1.5B-Instruct | LLM | SFT, DPO, Distill | 1.5B |
| Qwen2.5-14B-Instruct | LLM | SFT, DPO | 14B |
| Qwen2.5-32B-Instruct | LLM | SFT, DPO | 32B |
| Qwen2.5-3B-Instruct | LLM | SFT, DPO, Distill | 3B |
| Qwen2.5-72B-Instruct | LLM | SFT, DPO | 72B |
| Qwen2.5-7B-Instruct | LLM | SFT, DPO, Distill | 7B |
| Qwen2.5-Coder-14B-Instruct | LLM | SFT, DPO | 14B |
| Qwen2.5-Coder-32B-Instruct | LLM | SFT | 32B |
| Qwen2.5-Coder-7B-Instruct | LLM | SFT, DPO | 7B |
| Qwen2.5-Math-1.5B-Instruct | LLM | SFT, DPO | 1.5B |
| Qwen2.5-Math-72B-Instruct | LLM | SFT, DPO | 72B |
| Qwen2.5-Math-7B-Instruct | LLM | SFT, DPO | 7B |
| Qwen2.5-VL-32B-Instruct | VLM | SFT, DPO | 32B |
| Qwen2.5-VL-3B-Instruct | VLM | SFT, DPO, Distill | 3B |
| Qwen2.5-VL-72B-Instruct | VLM | SFT | 72B |
| Qwen2.5-VL-7B-Instruct | VLM | SFT, DPO | 7B |
| Qwen3-0.6B | LLM | SFT, DPO, Distill | 0.6B |
| Qwen3-1.7B | LLM | SFT, DPO, Distill | 1.7B |
| Qwen3-14B | LLM | SFT, DPO | 14B |
| Qwen3-235B-A22B | LLM | SFT | 235B |
| Qwen3-235B-A22B-Instruct-2507 | LLM | SFT | 235B |
| Qwen3-235B-A22B-Thinking-2507 | LLM | SFT | 235B |
| Qwen3-30B-A3B | LLM | SFT, DPO | 30B |
| Qwen3-30B-A3B-Instruct-2507 | LLM | SFT, DPO | 30B |
| Qwen3-30B-A3B-Thinking-2507 | LLM | SFT, DPO | 30B |
| Qwen3-32B | LLM | SFT, DPO | 32B |
| Qwen3-4B | LLM | SFT, DPO, Distill | 4B |
| Qwen3-4B-Instruct-2507 | LLM | SFT, DPO | 4B |
| Qwen3-4B-Thinking-2507 | LLM | SFT, DPO, Distill | 4B |
| Qwen3-8B | LLM | SFT, DPO, Distill | 8B |
| Qwen3-Coder-30B-A3B-Instruct | LLM | SFT, DPO | 30B |
| Qwen3-VL-2B-Instruct | VLM | SFT, DPO, Distill | 2B |
| Qwen3-VL-2B-Thinking | VLM | SFT, DPO, Distill | 2B |
| Qwen3-VL-32B-Instruct | VLM | SFT, DPO | 32B |
| Qwen3-VL-32B-Thinking | VLM | SFT, DPO | 32B |
| Qwen3-VL-4B-Instruct | VLM | SFT, DPO, Distill | 4B |
| Qwen3-VL-4B-Thinking | VLM | SFT, DPO, Distill | 4B |
| Qwen3-VL-8B-Instruct | VLM | SFT, DPO | 8B |
| Qwen3-VL-8B-Thinking | VLM | SFT, DPO | 8B |
| DeepSeek-Prover-V2-7B | LLM | SFT, DPO | 7B |
| DeepSeek-R1-Distill-Qwen-1.5B | LLM | SFT, DPO | 1.5B |
| DeepSeek-R1-Distill-Qwen-14B | LLM | SFT, DPO | 14B |
| DeepSeek-R1-Distill-Qwen-32B | LLM | SFT, DPO | 32B |
| DeepSeek-R1-Distill-Qwen-7B | LLM | SFT, DPO | 7B |
| Kimi-Dev-72B | LLM | SFT | 72B |
| GLM-4.5-Air | LLM | SFT | 106B |
| GLM-4.5V | VLM | SFT | 106B |

Expanding Model Support

We continuously add support for new models. If you need a specific model that's not listed, please contact our support team or check our Model Gallery for the latest additions.

Next Steps

Ready to start fine-tuning? Choose your approach based on your specific requirements:

Text Fine-Tuning

Learn how to fine-tune language models for text-based applications.

Vision Fine-Tuning

Explore multimodal fine-tuning for vision-language tasks.

Preference Optimization

Implement DPO for human-aligned model behavior.