Create Deployment

Deploy models from the Model Gallery, upload your own models, or use existing fine-tuned models, all with full configuration control.

Step 1: Select a Model

Choose a base model or a fine-tuned model to deploy from one of the sources below.

Model Selection Options

Browse Model Gallery

Explore our curated collection of pre-trained models from leading AI providers.

Available Models: Qwen, meta-llama, openai, deepseek-ai, zai-org, Alibaba-NLP, tencent, moonshotai, google, internlm, and more

Use Your Own Models
Deploy models you've uploaded or fine-tuned on the platform.

Configuration Options

Selecting a Model

Enable speculative decoding for faster inference

  • Base Models: Choose a base model that best suits your use case.
    • Consider the model's capabilities and limitations.
    • Review the model specifications (parameters, context window, etc.).
  • Fine-Tuned Models: Custom models trained on the platform from a base model.

Display name

A descriptive name that helps you identify the deployment on the dashboard (maximum 64 characters).
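
The length limit can be checked before you submit the form. The following is a minimal sketch, not platform code: the function name `validate_display_name` is hypothetical, and trimming whitespace and rejecting empty names are assumptions, since this page only states the 64-character limit.

```python
MAX_DISPLAY_NAME_LENGTH = 64  # limit stated on this page


def validate_display_name(name: str) -> str:
    """Return the trimmed name, or raise ValueError if it cannot be used."""
    # Assumption: surrounding whitespace is not meaningful and can be dropped.
    trimmed = name.strip()
    if not trimmed:
        # Assumption: an empty name is rejected; the page does not say so.
        raise ValueError("display name must not be empty")
    if len(trimmed) > MAX_DISPLAY_NAME_LENGTH:
        raise ValueError(
            f"display name is {len(trimmed)} characters; "
            f"the limit is {MAX_DISPLAY_NAME_LENGTH}"
        )
    return trimmed
```

Note that `len` counts Unicode code points; whether the platform counts characters or bytes is not stated here.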

Step 2: Configure Resources

Select the appropriate compute resources and deployment settings for your model.

Resource Configuration

Region Selection

Choose the deployment region based on your users' location for optimal latency.

Available Regions: Singapore, Japan, USA-East, Indonesia, Frankfurt, Hong Kong, Malaysia
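
One way to apply this is to measure round-trip latency from your users' location to each candidate region and choose the lowest. The sketch below assumes you have already collected those measurements yourself; `pick_region` is a hypothetical helper, and the numbers in the usage example are placeholders, not benchmarks.

```python
from typing import Mapping

# Region names as listed on this page.
AVAILABLE_REGIONS = [
    "Singapore", "Japan", "USA-East", "Indonesia",
    "Frankfurt", "Hong Kong", "Malaysia",
]


def pick_region(latency_ms: Mapping[str, float]) -> str:
    """Return the available region with the lowest measured latency."""
    # Ignore measurements for regions that are not offered.
    candidates = {
        region: ms for region, ms in latency_ms.items()
        if region in AVAILABLE_REGIONS
    }
    if not candidates:
        raise ValueError("no latency measurements for any available region")
    return min(candidates, key=candidates.get)


# Placeholder measurements in milliseconds.
measured = {"Singapore": 42.0, "Japan": 80.5, "Frankfurt": 190.0}
print(pick_region(measured))
```

Latency is only one input: GPU availability differs by region, so the lowest-latency region may not offer the GPU type you need.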

GPU Type Selection

Select a GPU type based on your model's requirements and performance needs.

Options: NVIDIA A10, L20, etc.

Accelerator Count

Number of accelerators to use per replica. A value is recommended automatically based on model size.

Replicas

Number of replicas to deploy for load balancing and high availability.

Step 3: Review and Deploy

Review your configuration and cost summary before creating the deployment.

Cost Summary

  • GPU Compute Cost
  • System Overhead (currently $0)

Note

Model download and storage costs are not included in the Cost Summary.
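
For a rough estimate ahead of the review screen, the two line items can be added up as follows. This is a sketch under a stated assumption: that GPU compute is billed per accelerator-hour, so the cost scales with accelerator count times replicas. The page does not document the billing formula, and the function name and the rate in the usage example are hypothetical.

```python
from decimal import Decimal


def estimate_cost(
    hourly_rate_per_gpu: Decimal,
    accelerator_count: int,
    replicas: int,
    hours: Decimal,
    system_overhead: Decimal = Decimal("0"),  # "currently $0" per this page
) -> Decimal:
    """Estimate GPU Compute Cost plus System Overhead for a deployment."""
    if accelerator_count < 1 or replicas < 1:
        raise ValueError("accelerator_count and replicas must be at least 1")
    # Accelerator Count is per replica, so the total is the product.
    total_accelerators = accelerator_count * replicas
    # Assumption: billing is per accelerator-hour.
    gpu_compute = hourly_rate_per_gpu * total_accelerators * hours
    return gpu_compute + system_overhead


# Hypothetical rate: 1.50 per GPU-hour, 2 GPUs per replica, 3 replicas, 24 hours.
print(estimate_cost(Decimal("1.50"), 2, 3, Decimal("24")))
```

As the note above says, model download and storage costs are separate and are not part of this estimate.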

Troubleshooting

1. Error 403: Sales of this resource are temporarily suspended

403: Sales of this resource are temporarily suspended.

  • Reason: The selected GPU type is temporarily unavailable in the current region due to high demand and insufficient resources.
  • Solution:
    1. Try deploying the model in a different region where resources may be available.
    2. If the issue persists, please contact our support team for assistance.
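
If you script deployments, the "try a different region" step can be automated. The sketch below is illustrative only: `ResourceUnavailableError` is a hypothetical stand-in for however your client surfaces the 403 response, and `deploy` is a callable you supply. No platform API is assumed.

```python
from typing import Callable, Iterable, List


class ResourceUnavailableError(Exception):
    """Hypothetical stand-in for the 403 'temporarily suspended' response."""


def deploy_with_region_fallback(
    deploy: Callable[[str], str],
    regions: Iterable[str],
) -> str:
    """Try each region in order; return the result of the first success."""
    tried: List[str] = []
    for region in regions:
        try:
            return deploy(region)
        except ResourceUnavailableError:
            # The GPU type is unavailable here; move on to the next region.
            tried.append(region)
    # Every region failed: this is the point to contact support.
    raise ResourceUnavailableError(
        "no capacity in any of: " + ", ".join(tried)
    )
```

Order the `regions` argument by preference, for example by latency to your users, so that the fallback picks the next-best option.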

2. Error: Account has an outstanding balance

Account has an outstanding balance.

  • Reason: Your account balance is insufficient to cover the cost of creating or running the deployment.
  • Solution: Navigate to the Billing section of your account dashboard and add funds to your balance.

3. Error: Your account information is incomplete

Your account information is incomplete.

  • Reason: Your account has not completed the required identity or information verification process.
  • Solution: Navigate to the Alibaba Cloud Account Settings page and complete the required identity verification.

Next Steps

Use Secrets for API Calls

Create API keys for programmatic access to your deployed models.

Try Model Lab

Test and experiment with your deployed models in our interactive playground.

Post-Training

Post-train your models with custom data for better performance.