Overview
Explore training methods, fine-tuning methods, and supported models for fine-tuning AI models on Smart Studio.
Fine-tuning adapts pre-trained models to specific tasks and datasets to improve domain-specific performance. Smart Studio supports three training methods (SFT, DPO, and Distillation) and multiple fine-tuning methods (LoRA, Full-Parameter Fine-Tuning) to make fine-tuning both efficient and effective.
Training Methods
Smart Studio supports the following training methods for different use cases and data types.
SFT: Supervised Fine-Tuning
Supervised Fine-Tuning trains models on labeled input-output pairs to learn specific tasks or adapt to new domains.
SFT-LLM optimizes language models for text-based tasks such as:
- Conversational AI and chatbots
- Text classification and sentiment analysis
- Content generation and summarization
- Domain-specific language understanding
SFT-VLM adapts vision-language models for multimodal tasks including:
- Image captioning and description
- Visual question answering
- Document understanding and OCR
- Medical image analysis
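To make the input-output pair format concrete, here is a hypothetical example of a single chat-style SFT record. The schema (a `messages` list with `role` and `content` fields) is a common convention and an assumption here, not necessarily Smart Studio's exact dataset format; check the platform's dataset documentation for the required schema.

```python
import json

# Hypothetical SFT record in a common chat-style schema; the field names
# are illustrative, not Smart Studio's confirmed dataset format.
record = {
    "messages": [
        {"role": "user", "content": "Summarize the refund policy in one sentence."},
        {"role": "assistant", "content": "Items can be returned within 30 days for a full refund."},
    ]
}

# SFT datasets are commonly stored as JSON Lines: one record per line.
with open("sft_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```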
DPO: Direct Preference Optimization
Direct Preference Optimization is an advanced training method that optimizes models based on human preferences and comparative feedback.
Ideal for scenarios requiring human-aligned outputs:
- Improving response quality and safety
- Aligning model behavior with human values
- Reducing harmful or biased outputs
- Fine-tuning based on preference rankings
Align vision-language model outputs with human preferences for multimodal scenarios such as:
- Image description quality improvement
- Visual reasoning accuracy alignment
- Multimodal response safety enhancement
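Conceptually, DPO trains directly on preference pairs (a chosen and a rejected response to the same prompt) without fitting a separate reward model. The sketch below shows the standard DPO objective in PyTorch; `beta` and the log-probability arguments are generic to the technique, not Smart Studio parameters.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Each argument is the summed log-probability of a full response
    # under the trainable policy or the frozen reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Push the policy to rate the chosen response higher than the
    # rejected one, relative to the reference model.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```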
Distillation
Distillation transfers knowledge from a larger teacher model to a smaller student model, achieving similar performance with lower inference costs.
Knowledge Distillation - The student model not only mimics the teacher's answers but also learns the teacher's problem-solving approach and reasoning process through internal states (logits, hidden layers).
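For intuition, here is a minimal sketch of the classic knowledge-distillation loss in PyTorch: the student matches the teacher's temperature-softened logits (soft targets) while also fitting the ground-truth labels (hard targets). The temperature `T` and mixing weight `alpha` are illustrative hyperparameters, not Smart Studio settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 2.0, alpha: float = 0.5) -> torch.Tensor:
    # Soft targets: KL divergence between the student's and the teacher's
    # temperature-softened distributions (logits shaped [batch, classes]).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```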
Data Distillation - Available when the teacher is an API-based model. The student model can only access the teacher's outputs (correct answers), not its internal states, so it requires large amounts of input-output pairs for effective learning.
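Since the student only ever sees the teacher's final outputs, building a data-distillation set amounts to collecting input-output pairs at scale. A minimal sketch, where `query_teacher` is a hypothetical stand-in for a call to your API-based teacher model and the JSONL field names are illustrative:

```python
import json

def query_teacher(prompt: str) -> str:
    # Hypothetical placeholder for an API-based teacher model call;
    # a canned reply keeps the sketch self-contained and runnable.
    return "teacher answer for: " + prompt

prompts = [
    "Explain gradient clipping in one short paragraph.",
    "Classify the sentiment of: 'The update broke my workflow.'",
]

# The student learns only from these final outputs, so prompt coverage
# largely determines what it can pick up from the teacher.
with open("distill_train.jsonl", "w", encoding="utf-8") as f:
    for prompt in prompts:
        pair = {"input": prompt, "output": query_teacher(prompt)}
        f.write(json.dumps(pair, ensure_ascii=False) + "\n")
```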
When to choose each training method:
- Use SFT-LLM for language tasks with clear input-output examples.
- Use SFT-VLM for multimodal applications involving images and text.
- Use DPO when you have preference data or need to align model behavior with human values.
- Use Data Distillation when you want small models to leverage the capabilities of top API-based models.
Fine-Tuning Methods
Smart Studio supports the following fine-tuning methods to control how model parameters are updated during training. LoRA is used by default for all training methods.
LoRA (Low-Rank Adaptation)
Parameter-efficient fine-tuning that trains small adapter modules while keeping the original weights frozen.
Best for:
- Most use cases
- Cost-efficient training
- Quick experiments
- Limited GPU resources
Availability: All training methods
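For intuition, here is a minimal PyTorch sketch of the LoRA technique itself (not Smart Studio's internal implementation): the base weights stay frozen and only a small low-rank update is trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the original weights stay frozen
        # Only these two small matrices are trained.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no effect at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Because only `A` and `B` receive gradients, optimizer state and adapter checkpoints stay tiny compared to the base model, which is what makes LoRA cheap to train and store.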
Full-Parameter Fine-Tuning
Updates all model parameters during training, providing maximum model capacity and flexibility.
Best for:
- Maximum customization needs
- When LoRA performance is insufficient
- Large high-quality datasets
Availability: SFT-LLM, SFT-VLM
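By contrast with LoRA, full-parameter fine-tuning leaves every weight trainable, which is why gradients and optimizer state scale with the full model size. A trivial PyTorch sketch of the difference:

```python
import torch.nn as nn

def prepare_full_finetune(model: nn.Module) -> nn.Module:
    # Unlike LoRA, every parameter receives gradients, so memory for
    # gradients and optimizer state grows with the whole model.
    for p in model.parameters():
        p.requires_grad = True
    return model
```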
A third option prevents catastrophic forgetting during fine-tuning: the model learns new tasks while retaining its general capabilities.
Best for:
- Domain-specific tasks that still require general abilities
- Production models handling both specialized and general queries
- Cases where previous fine-tuning degraded baseline performance
Availability: SFT-LLM
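One common way to preserve general capabilities (shown here as an illustration of the general technique, not necessarily how Smart Studio implements it) is to replay a slice of general-purpose data alongside the new domain data:

```python
import random

def mix_datasets(domain_data: list, general_data: list,
                 general_ratio: float = 0.2, seed: int = 0) -> list:
    # Replaying general-purpose examples during fine-tuning helps the
    # model keep broad skills while it learns the new domain.
    rng = random.Random(seed)
    n_general = min(int(len(domain_data) * general_ratio), len(general_data))
    mixed = list(domain_data) + rng.sample(general_data, n_general)
    rng.shuffle(mixed)
    return mixed
```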
Supported Models
Smart Studio supports fine-tuning for a wide range of state-of-the-art models across different providers and model types.
| Model Name | Model Type | Training Method | Model Size |
|---|---|---|---|
| QwQ-32B | LLM | SFT, DPO | 32B |
| QwQ-32B-Preview | LLM | SFT, DPO | 32B |
| Qwen2.5-0.5B-Instruct | LLM | SFT, DPO, Distill | 0.5B |
| Qwen2.5-1.5B-Instruct | LLM | SFT, DPO, Distill | 1.5B |
| Qwen2.5-14B-Instruct | LLM | SFT, DPO | 14B |
| Qwen2.5-32B-Instruct | LLM | SFT, DPO | 32B |
| Qwen2.5-3B-Instruct | LLM | SFT, DPO, Distill | 3B |
| Qwen2.5-72B-Instruct | LLM | SFT, DPO | 72B |
| Qwen2.5-7B-Instruct | LLM | SFT, DPO, Distill | 7B |
| Qwen2.5-Coder-14B-Instruct | LLM | SFT, DPO | 14B |
| Qwen2.5-Coder-32B-Instruct | LLM | SFT | 32B |
| Qwen2.5-Coder-7B-Instruct | LLM | SFT, DPO | 7B |
| Qwen2.5-Math-1.5B-Instruct | LLM | SFT, DPO | 1.5B |
| Qwen2.5-Math-72B-Instruct | LLM | SFT, DPO | 72B |
| Qwen2.5-Math-7B-Instruct | LLM | SFT, DPO | 7B |
| Qwen2.5-VL-32B-Instruct | VLM | SFT, DPO | 32B |
| Qwen2.5-VL-3B-Instruct | VLM | SFT, DPO, Distill | 3B |
| Qwen2.5-VL-72B-Instruct | VLM | SFT | 72B |
| Qwen2.5-VL-7B-Instruct | VLM | SFT, DPO | 7B |
| Qwen3-0.6B | LLM | SFT, DPO, Distill | 0.6B |
| Qwen3-1.7B | LLM | SFT, DPO, Distill | 1.7B |
| Qwen3-14B | LLM | SFT, DPO | 14B |
| Qwen3-235B-A22B | LLM | SFT | 235B |
| Qwen3-235B-A22B-Instruct-2507 | LLM | SFT | 235B |
| Qwen3-235B-A22B-Thinking-2507 | LLM | SFT | 235B |
| Qwen3-30B-A3B | LLM | SFT, DPO | 30B |
| Qwen3-30B-A3B-Instruct-2507 | LLM | SFT, DPO | 30B |
| Qwen3-30B-A3B-Thinking-2507 | LLM | SFT, DPO | 30B |
| Qwen3-32B | LLM | SFT, DPO | 32B |
| Qwen3-4B | LLM | SFT, DPO, Distill | 4B |
| Qwen3-4B-Instruct-2507 | LLM | SFT, DPO | 4B |
| Qwen3-4B-Thinking-2507 | LLM | SFT, DPO, Distill | 4B |
| Qwen3-8B | LLM | SFT, DPO, Distill | 8B |
| Qwen3-Coder-30B-A3B-Instruct | LLM | SFT, DPO | 30B |
| Qwen3-VL-2B-Instruct | VLM | SFT, DPO, Distill | 2B |
| Qwen3-VL-2B-Thinking | VLM | SFT, DPO, Distill | 2B |
| Qwen3-VL-32B-Instruct | VLM | SFT, DPO | 32B |
| Qwen3-VL-32B-Thinking | VLM | SFT, DPO | 32B |
| Qwen3-VL-4B-Instruct | VLM | SFT, DPO, Distill | 4B |
| Qwen3-VL-4B-Thinking | VLM | SFT, DPO, Distill | 4B |
| Qwen3-VL-8B-Instruct | VLM | SFT, DPO | 8B |
| Qwen3-VL-8B-Thinking | VLM | SFT, DPO | 8B |
| DeepSeek-Prover-V2-7B | LLM | SFT, DPO | 7B |
| DeepSeek-R1-Distill-Qwen-1.5B | LLM | SFT, DPO | 1.5B |
| DeepSeek-R1-Distill-Qwen-14B | LLM | SFT, DPO | 14B |
| DeepSeek-R1-Distill-Qwen-32B | LLM | SFT, DPO | 32B |
| DeepSeek-R1-Distill-Qwen-7B | LLM | SFT, DPO | 7B |
| Kimi-Dev-72B | LLM | SFT | 72B |
| GLM-4.5-Air | LLM | SFT | 106B |
| GLM-4.5V | VLM | SFT | 106B |
We continuously add support for new models. If you need a specific model that's not listed, please contact our support team or check our Model Gallery for the latest additions.
Next Steps
Ready to start fine-tuning? Choose your approach based on your specific requirements: