How to Choose Models
Selecting the right model for your use case is crucial for optimal performance. This guide recommends models by category, covering both open-source and API-based options, to help you find the best fit for your project.
Open Source vs. API-Based Models
Compare open-source and closed-source models to determine which type best fits your use case.
Open Source Models: Deploy and customize models with full transparency and control.
- Complete code access for customization and optimization
- No usage restrictions
- Community-driven development and support
- Full transparency for compliance and security audits
API-Based Models: Leverage optimized, production-ready models with professional support.
- Superior performance through proprietary optimization
- Managed infrastructure and automatic updates
- Dedicated technical support and SLAs
- Faster iteration and feature releases
- Select Open Source Models if you need fine-tuning capabilities or want full control over deployment
- Select API-Based Models if you prioritize stable performance and strong general-purpose capabilities
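In practice, both options are usually reached through an OpenAI-compatible chat API, so switching between an open-source deployment and an API-based model is largely a matter of changing the base URL and model name. Below is a minimal sketch; the endpoint URLs, API key, and model name are placeholders rather than guaranteed identifiers.

```python
from openai import OpenAI

# Self-hosted open-source model (e.g. served by vLLM or a deployment endpoint);
# the URL, key, and model name below are placeholders.
oss_client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Managed API-based model; base URL and credentials depend on your provider.
api_client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

def ask(client: OpenAI, model: str, prompt: str) -> str:
    """Send a single-turn chat request and return the reply text."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# The same helper works against either client; only the model identifier changes.
print(ask(oss_client, "Qwen3-32B-FAST", "Summarize the trade-offs of MoE models."))
```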
Understand Model Types
Learn about different model categories and variants to choose the right model for your needs.
Model Types
| Model Type | Description | Representative Models |
|---|---|---|
| Large Language Models (LLM) | Handle natural language tasks such as text generation, comprehension, summarization, and reasoning. | Qwen, GLM, DeepSeek, etc. |
| Vision-Language Models (VLM) | Integrate image and text modalities to support image captioning, visual question answering, and multimodal understanding. | Qwen3-VL, Qwen2.5-Omni, MedGemma, etc. |
| Image Models | Focus on image generation, editing, and recognition through computer vision techniques. | Qwen-Image, Wan, Flux, etc. |
| Video Models | Extend image modeling into temporal sequences, enabling video understanding and captioning. | Wan, CogVideo, HunyuanVideo, etc. |
Model Variants
| Model Variant | Description | Representative Examples |
|---|---|---|
| Base Models | General-purpose models trained on broad data without specialized fine-tuning. Serve as foundational backbones. | Qwen2.5-7B, gpt-oss-20b, etc. |
| Instruct Models | Fine-tuned with human instructions to better follow prompts and improve dialogue quality. | Qwen3-30B-A3B-Instruct-2507-FAST, Hunyuan-A13B-Instruct, etc. |
| Thinking Models | Designed for advanced reasoning, multi-step problem-solving, and complex cognitive tasks. | Qwen3-Next-80B-A3B-Thinking, DeepSeek-R1, etc. |
| Mixture of Experts (MoE) | Utilize dynamic routing across multiple expert subnetworks to balance model capacity and computational efficiency. | Qwen3.5, Qwen3-30B-A3B, etc. |
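The variant you choose also changes how you prompt the model: base models are driven by plain text completion, while instruct and thinking models expect their chat template to be applied first. A rough sketch with Hugging Face transformers, assuming an instruct checkpoint (the model name is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Instruct variant: wrap the conversation in the model's chat template first.
model_id = "Qwen/Qwen2.5-7B-Instruct"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain mixture-of-experts routing in two sentences."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

# A base variant (e.g. Qwen2.5-7B) would instead be given raw text and simply continue it.
```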
LLM
Large Language Models excel at natural language understanding, reasoning, and generation across diverse tasks and domains.
| Use Case | Description | Recommended Models |
|---|---|---|
| Code & Development | Generate code, debug, explain logic, assist programming | Open Source: Qwen3-Coder, GLM-4.7, DeepSeek-Coder, etc.<br>API-Based: Qwen3-MAX, Qwen3.5-Plus, etc. |
| Agent | Build autonomous agents for multi-step tasks, tool use | Open Source: Qwen3-235B, DeepSeek-V3.2, GPT-OSS-120B, etc.<br>API-Based: Qwen3-MAX, Qwen3.5-Plus, etc. |
| Reasoning | Perform multi-step logic, math reasoning, structured thinking | Open Source: Qwen3-235B, DeepSeek-R1-Distill-Llama-70B, GPT-OSS-120B, etc.<br>API-Based: Qwen3-MAX, Qwen3.5-Plus, etc. |
| Conversation | Enable multi-turn dialogue for chatbots and assistants | Open Source: Qwen3-32B-FAST, GPT-OSS-20B, GLM-4.7, etc.<br>API-Based: Qwen3.5-Flash, Qwen-Plus, etc. |
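For the Agent use case in particular, these models are commonly driven through the OpenAI-compatible tool-calling interface. The sketch below assumes a placeholder endpoint and a hypothetical `get_weather` tool; the model name is illustrative.

```python
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")  # placeholder endpoint

# Hypothetical tool definition used only for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen3-235B",  # illustrative model name
    messages=[{"role": "user", "content": "Do I need an umbrella in Beijing today?"}],
    tools=tools,
)

# If the model chose to call the tool, its arguments arrive as a JSON string.
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```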
VLM
VLMs process images and text together for multimodal understanding tasks.
| Use Case | Description | Recommended Models |
|---|---|---|
| Visual Question Answering (VQA) | Answer complex questions based on images or videos, combining visual perception and language understanding. | Open Source: Qwen3-VL-32B-Thinking-FP8, MedGemma-27b-it, Qwen2.5-VL-72B-Instruct, etc.<br>API-Based: Qwen3.5-Flash, Qwen3-VL-Plus, etc. |
| Image Captioning | Automatically generate descriptive captions for images to improve accessibility and content indexing. | Open Source: Qwen3-32B-FAST, Qwen3-VL-32B-Thinking-FP8, MedGemma-27b-it, etc.<br>API-Based: Qwen3.5-Flash, Qwen3-VL-Plus, etc. |
| Real-time Video Conversation | Engage in spoken conversations about live video streams, enabling interactive, real-time analysis. | Open Source: Qwen2.5-Omni, Qwen3-VL, etc.<br>API-Based: Qwen3.5-Flash, Qwen3-VL-Plus, etc. |
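A VQA request typically combines an image part and a text part in a single user message. A minimal sketch in the OpenAI-compatible multimodal format follows; the endpoint, image URL, and model name are placeholders.

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")  # placeholder endpoint

resp = client.chat.completions.create(
    model="Qwen2.5-VL-72B-Instruct",  # illustrative model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "text", "text": "What trend does this chart show?"},
        ],
    }],
)
print(resp.choices[0].message.content)
```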
Image Model
Generate and edit images for business and creative applications.
| Use Case | Description | Recommended Models |
|---|---|---|
| Text to Image (T2I) | Generate detailed images directly from text prompts, ideal for creative design and content creation. | Open Source: Qwen-Image-2512, FLUX.1-dev, etc.<br>API-Based: wan2.6-t2i, qwen-image-plus, etc. |
| Image to Image (I2I) | Modify existing images guided by textual input for style transfer or enhancement. | Open Source: Qwen-Image-Edit-2511, FLUX.1-Kontext-dev, etc.<br>API-Based: wan2.6-image, wan2.5-i2i-preview, etc. |
| E-commerce Product Images | Generate marketing and product display images for e-commerce platforms. | Open Source: Z-Image-Turbo, Qwen-Image-Edit, etc.<br>API-Based: wan2.6-image, wan2.5-i2i-preview, etc. |
| Social Media Content Creation | Create visual content for social media platforms and marketing campaigns. | Open Source: Qwen-Image, FLUX.1-Krea-dev, etc.<br>API-Based: qwen-image-plus, wan2.6-t2i, etc. |
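If you run an open-source image model locally, the diffusers library is a common entry point. A rough text-to-image sketch with FLUX.1-dev follows; it assumes a GPU with enough memory, and the sampling parameters are illustrative defaults rather than tuned values.

```python
import torch
from diffusers import FluxPipeline

# Load the open-weight FLUX.1-dev checkpoint; offload to CPU if GPU memory is tight.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A minimalist product photo of a ceramic mug on a wooden table, soft daylight",
    height=1024,
    width=1024,
    num_inference_steps=28,  # illustrative value
    guidance_scale=3.5,      # illustrative value
).images[0]

image.save("mug.png")
```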
Video Model
Create videos from text descriptions or static images.
| Use Case | Description | Recommended Models |
|---|---|---|
| Text to Video (T2V) | Create videos based on textual descriptions. | Open Source: Wan2.2-T2V-A14B-Diffusers, LTX-Video-0.9.7-dev, etc.<br>API-Based: wan2.6-t2v, wan2.5-t2v-preview, etc. |
| Image to Video (I2V) | Generate videos from static images with smooth movement synthesis. | Open Source: Wan2.2-I2V-A14B-Diffusers, Wan2.1-I2V-14B-720P-Diffusers, etc.<br>API-Based: wan2.6-i2v, wan2.2-i2v-plus, etc. |
| E-commerce Product Ads | Create product advertisement videos from text or images. | Open Source: Wan2.1-I2V-14B-720P-Diffusers, Wan2.1-T2V-14B-Diffusers, etc.<br>API-Based: wan2.2-i2v-plus, wan2.1-i2v-turbo, etc. |
| Social Media Snippets | Create short video content for social media platforms and marketing. | Open Source: Wan2.2-T2V-1.3B-Diffusers, Wan2.1-T2V-1.3B-Diffusers, etc.<br>API-Based: wan2.6-i2v-flash, wan2.2-kf2v-flash, etc. |
| Character Consistency | Generate animated avatars and digital humans for videos with consistent character representation. | Open Source: Wan2.2-TI2V-5B-Diffusers, Wan2.1-I2V-14B-720P-Diffusers, etc.<br>API-Based: wan2.6-r2v, wan2.6-r2v-flash, etc. |
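For local generation with the `*-Diffusers` checkpoints above, diffusers can usually resolve the matching pipeline class automatically. A hedged sketch, assuming the smaller Wan2.1-T2V-1.3B checkpoint (the repository path and sampling parameters are assumptions, not verified values):

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# DiffusionPipeline selects the pipeline class from the repository config.
pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",  # assumed repo path for the checkpoint listed above
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

frames = pipe(
    prompt="A short clip of waves rolling onto a beach at sunset",
    num_frames=49,           # illustrative value
    num_inference_steps=30,  # illustrative value
).frames[0]

export_to_video(frames, "waves.mp4", fps=16)
```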
For more models, refer to the Model Gallery.
Next Steps
- Create a deployment endpoint for your model to start serving inference requests in production.
- Use your model as a base for fine-tuning with your custom datasets to improve performance on specific tasks.
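As a rough illustration of the fine-tuning step, a LoRA adapter can be attached to one of the open-source base models with the peft library. The dataset path, base model, and hyperparameters below are placeholders, not recommendations.

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "Qwen/Qwen2.5-7B"  # illustrative base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Attach a small LoRA adapter instead of updating all weights.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]))

# Placeholder dataset: any plain-text corpus tokenized to fixed-length samples works here.
dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]
dataset = dataset.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512),
                      remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="qwen-lora", per_device_train_batch_size=1, num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("qwen-lora-adapter")  # saves adapter weights only
```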