Overview

Explore training methods, fine-tuning methods, and supported models for fine-tuning AI models on Smart Studio.

Fine-tuning adapts pre-trained models to specific tasks and datasets to improve domain-specific performance. Smart Studio supports three training methods (SFT, DPO, and Distillation) and multiple fine-tuning methods (LoRA, Full-Parameter Fine-Tuning) to make fine-tuning both efficient and effective.

Training Methods

Smart Studio supports the following training methods for different use cases and data types.

SFT: Supervised Fine-Tuning

Supervised Fine-Tuning trains models on labeled input-output pairs to learn specific tasks or adapt to new domains.

Supervised Fine-Tuning - LLM

Optimize language models for text-based tasks such as:

  • Conversational AI and chatbots
  • Text classification and sentiment analysis
  • Content generation and summarization
  • Domain-specific language understanding
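
For any of these tasks, SFT data is a set of labeled input-output pairs. As a rough sketch (the field names below follow a common chat-style convention and are not necessarily Smart Studio's exact schema; check the platform's data-format docs), a JSONL training file might be produced like this:

```python
import json

# Hypothetical chat-style SFT examples. The "messages"/"role"/"content"
# field names are a common convention, not Smart Studio's confirmed schema.
examples = [
    {
        "messages": [
            {"role": "user", "content": "Summarize: The quarterly report shows..."},
            {"role": "assistant", "content": "Revenue grew 12% quarter over quarter..."},
        ]
    },
    {
        "messages": [
            {"role": "user", "content": "Classify the sentiment: 'Great battery life!'"},
            {"role": "assistant", "content": "positive"},
        ]
    },
]

# Write one JSON object per line (JSONL)
with open("sft_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```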

Supervised Fine-Tuning - VLM

Adapt vision-language models for multimodal tasks including:

  • Image captioning and description
  • Visual question answering
  • Document understanding and OCR
  • Medical image analysis

DPO: Direct Preference Optimization

Direct Preference Optimization is an advanced training method that optimizes models based on human preferences and comparative feedback.

Direct Preference Optimization - LLM

Ideal for scenarios requiring human-aligned outputs:

  • Improving response quality and safety
  • Aligning model behavior with human values
  • Reducing harmful or biased outputs
  • Fine-tuning based on preference rankings
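
For background on what DPO actually optimizes (independent of any Smart Studio API), the sketch below computes the standard DPO objective from per-sequence log-probabilities of a trainable policy and a frozen reference model; all tensor names are illustrative:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective: push the policy to prefer the chosen
    response over the rejected one, relative to a frozen reference model.

    Each argument is a tensor of per-sequence log-probabilities,
    shape (batch,). `beta` controls how far the policy may drift
    from the reference.
    """
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin); minimized when chosen >> rejected
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with made-up log-probabilities
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss.item())
```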

Direct Preference Optimization - VLM

Align vision-language model outputs with human preferences for multimodal scenarios such as:

  • Image description quality improvement
  • Visual reasoning accuracy alignment
  • Multimodal response safety enhancement

Distillation

Distillation transfers knowledge from a larger teacher model to a smaller student model, achieving similar performance with lower inference costs.

White-Box Distillation

Knowledge Distillation - The student model not only mimics the teacher's answers but also learns the teacher's problem-solving approach and reasoning process through its internal states (logits and hidden-layer activations).
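
To illustrate the idea with generic knowledge distillation (not necessarily Smart Studio's exact implementation), the student can be trained to match the teacher's temperature-softened output distribution:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic white-box distillation loss: KL divergence between the
    teacher's and student's temperature-softened token distributions.
    Logits have shape (batch, vocab); the teacher's are treated as fixed.
    """
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits.detach() / t, dim=-1)
    # Scale by t^2 so gradient magnitude stays consistent across temperatures
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t
```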

Black-Box Distillation

Data Distillation - Available for API-based teacher models only. The student model can only access the teacher's outputs (correct answers), so effective learning requires large amounts of input-output pairs.
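
A typical first step in a black-box pipeline is bulk-generating input-output pairs from the teacher. In the sketch below, the endpoint URL and response field are placeholders for whatever API-based teacher you use, not a real Smart Studio or vendor interface:

```python
import json
import requests  # third-party; pip install requests

# Placeholder endpoint and schema -- substitute your teacher model's real API.
TEACHER_API_URL = "https://example.com/v1/generate"

prompts = [
    "Explain HTTP caching in two sentences.",
    "Translate to French: 'The meeting is at noon.'",
]

with open("distill_pairs.jsonl", "w") as f:
    for prompt in prompts:
        resp = requests.post(TEACHER_API_URL, json={"prompt": prompt}, timeout=30)
        resp.raise_for_status()
        answer = resp.json()["output"]  # field name is an assumption
        # Black-box distillation: we only keep the teacher's final text,
        # never its logits or hidden states.
        f.write(json.dumps({"input": prompt, "output": answer}) + "\n")
```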

Choosing the Right Method

  • Use SFT-LLM for language tasks with clear input-output examples.
  • Use SFT-VLM for multimodal applications involving images and text.
  • Use DPO when you have preference data or need to align model behavior with human values.
  • Use Data Distillation when you want small models to leverage the capabilities of top API-based models.

Fine-Tuning Methods

Smart Studio supports the following fine-tuning methods to control how model parameters are updated during training. LoRA is used by default for all training methods.

LoRA (Default)

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method: it trains small adapter modules while keeping the original weights frozen.

Best for:

  • Most use cases
  • Cost-efficient training
  • Quick experiments
  • Limited GPU resources

Availability: All training methods
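
For intuition about why LoRA is cheap (a generic sketch, not Smart Studio's internals): instead of updating a frozen weight matrix, LoRA learns a low-rank update, so only a small fraction of parameters receive gradients:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA adapter around a frozen linear layer (generic sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # original weights stay frozen
        # Low-rank factors: only these r*(in + out) parameters are trained.
        # A starts small and random, B starts at zero, so the initial update is zero.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen path plus scaled low-rank update
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(1024, 1024))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 16384 trainable params vs ~1M for the full layer
```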

Full-Parameter Fine-Tuning

Full-Parameter Fine-Tuning updates all model parameters during training, providing maximum model capacity and flexibility.

Best for:

  • Maximum customization needs
  • When LoRA performance is insufficient
  • Large high-quality datasets

Availability: SFT-LLM, SFT-VLM

SFT-Plus

SFT-Plus prevents catastrophic forgetting during fine-tuning: the model learns new tasks while retaining its general capabilities.

Best for:

  • Domain-specific tasks that still require general abilities
  • Production models handling both specialized and general queries
  • Cases where previous fine-tuning degraded baseline performance

Availability: SFT-LLM
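
This page does not document SFT-Plus's internal mechanism. As one common anti-forgetting recipe (illustrative only, and not confirmed to be what SFT-Plus does), general-purpose examples can be mixed back into the domain training set so the model keeps rehearsing its broad capabilities:

```python
import json
import random

def load_jsonl(path):
    with open(path) as f:
        return [json.loads(line) for line in f]

def mix_datasets(domain_path, general_path, out_path, general_ratio=0.2):
    """Rehearsal-style data mixing: blend a slice of general-purpose data
    into the domain set so roughly `general_ratio` of the output is general.
    Illustrative technique only, not SFT-Plus's confirmed mechanism.
    """
    domain = load_jsonl(domain_path)
    general = load_jsonl(general_path)
    k = min(len(general), int(len(domain) * general_ratio / (1 - general_ratio)))
    mixed = domain + random.sample(general, k)
    random.shuffle(mixed)
    with open(out_path, "w") as f:
        for ex in mixed:
            f.write(json.dumps(ex) + "\n")
```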

Supported Models

Smart Studio supports fine-tuning for a wide range of state-of-the-art models across different providers and model types.

| Model Name | Model Type | Training Method | Model Size |
| --- | --- | --- | --- |
| QwQ-32B | LLM | SFT, DPO | 32B |
| QwQ-32B-Preview | LLM | SFT, DPO | 32B |
| Qwen2.5-0.5B-Instruct | LLM | SFT, DPO, Distill | 0.5B |
| Qwen2.5-1.5B-Instruct | LLM | SFT, DPO, Distill | 1.5B |
| Qwen2.5-14B-Instruct | LLM | SFT, DPO | 14B |
| Qwen2.5-32B-Instruct | LLM | SFT, DPO | 32B |
| Qwen2.5-3B-Instruct | LLM | SFT, DPO, Distill | 3B |
| Qwen2.5-72B-Instruct | LLM | SFT, DPO | 72B |
| Qwen2.5-7B-Instruct | LLM | SFT, DPO, Distill | 7B |
| Qwen2.5-Coder-14B-Instruct | LLM | SFT, DPO | 14B |
| Qwen2.5-Coder-32B-Instruct | LLM | SFT | 32B |
| Qwen2.5-Coder-7B-Instruct | LLM | SFT, DPO | 7B |
| Qwen2.5-Math-1.5B-Instruct | LLM | SFT, DPO | 1.5B |
| Qwen2.5-Math-72B-Instruct | LLM | SFT, DPO | 72B |
| Qwen2.5-Math-7B-Instruct | LLM | SFT, DPO | 7B |
| Qwen2.5-VL-32B-Instruct | VLM | SFT, DPO | 32B |
| Qwen2.5-VL-3B-Instruct | VLM | SFT, DPO, Distill | 3B |
| Qwen2.5-VL-72B-Instruct | VLM | SFT | 72B |
| Qwen2.5-VL-7B-Instruct | VLM | SFT, DPO | 7B |
| Qwen3-0.6B | LLM | SFT, DPO, Distill | 0.6B |
| Qwen3-1.7B | LLM | SFT, DPO, Distill | 1.7B |
| Qwen3-14B | LLM | SFT, DPO | 14B |
| Qwen3-235B-A22B | LLM | SFT | 235B |
| Qwen3-235B-A22B-Instruct-2507 | LLM | SFT | 235B |
| Qwen3-235B-A22B-Thinking-2507 | LLM | SFT | 235B |
| Qwen3-30B-A3B | LLM | SFT, DPO | 30B |
| Qwen3-30B-A3B-Instruct-2507 | LLM | SFT, DPO | 30B |
| Qwen3-30B-A3B-Thinking-2507 | LLM | SFT, DPO | 30B |
| Qwen3-32B | LLM | SFT, DPO | 32B |
| Qwen3-4B | LLM | SFT, DPO, Distill | 4B |
| Qwen3-4B-Instruct-2507 | LLM | SFT, DPO | 4B |
| Qwen3-4B-Thinking-2507 | LLM | SFT, DPO, Distill | 4B |
| Qwen3-8B | LLM | SFT, DPO, Distill | 8B |
| Qwen3-Coder-30B-A3B-Instruct | LLM | SFT, DPO | 30B |
| Qwen3-VL-2B-Instruct | VLM | SFT, DPO, Distill | 2B |
| Qwen3-VL-2B-Thinking | VLM | SFT, DPO, Distill | 2B |
| Qwen3-VL-32B-Instruct | VLM | SFT, DPO | 32B |
| Qwen3-VL-32B-Thinking | VLM | SFT, DPO | 32B |
| Qwen3-VL-4B-Instruct | VLM | SFT, DPO, Distill | 4B |
| Qwen3-VL-4B-Thinking | VLM | SFT, DPO, Distill | 4B |
| Qwen3-VL-8B-Instruct | VLM | SFT, DPO | 8B |
| Qwen3-VL-8B-Thinking | VLM | SFT, DPO | 8B |
| DeepSeek-Prover-V2-7B | LLM | SFT, DPO | 7B |
| DeepSeek-R1-Distill-Qwen-1.5B | LLM | SFT, DPO | 1.5B |
| DeepSeek-R1-Distill-Qwen-14B | LLM | SFT, DPO | 14B |
| DeepSeek-R1-Distill-Qwen-32B | LLM | SFT, DPO | 32B |
| DeepSeek-R1-Distill-Qwen-7B | LLM | SFT, DPO | 7B |
| Kimi-Dev-72B | LLM | SFT | 72B |
| GLM-4.5-Air | LLM | SFT | 106B |
| GLM-4.5V | VLM | SFT | 106B |

Expanding Model Support

We continuously add support for new models. If you need a specific model that's not listed, please contact our support team or check our Model Gallery for the latest additions.

Next Steps

Ready to start fine-tuning? Choose your approach based on your specific requirements:

Text Fine-Tuning

Learn how to fine-tune language models for text-based applications.

Vision Fine-Tuning

Explore multimodal fine-tuning for vision-language tasks.

Preference Optimization

Implement DPO for human-aligned model behavior.