
Usage

Monitor and analyze your AI platform usage across deployments, routers, and request history. The Usage dashboard provides detailed insight into resource consumption, performance metrics, and cost optimization opportunities.

Self-Deployed Models Usage

The Deployments section provides comprehensive monitoring and billing information for your AI model deployments.


[Screenshot: Deployment]


Key Metrics Overview

  • Total Requests: The cumulative number of API requests made to all your deployments within the selected time range.
  • Active Deployments: The current number of running deployments across your account.
  • Allocated GPU Hours: The total GPU compute hours allocated to your deployments.
  • Fine-Tunings: The number of fine-tuning jobs that have completed or are in progress (see the sketch below).
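
As a rough illustration of how these summary figures relate to individual deployments, the following sketch aggregates a few hypothetical records; all field names and values are assumptions made for the example, not the platform's API.

```python
from dataclasses import dataclass

# Hypothetical deployment records; field names and values are illustrative only.
@dataclass
class Deployment:
    name: str
    status: str             # e.g., "running" or "stopped"
    requests: int           # requests within the selected time range
    gpus: int               # GPUs allocated to this deployment
    allocated_hours: float  # hours the allocation has been held

deployments = [
    Deployment("chat-llm", "running", 12_400, 2, 36.0),
    Deployment("embedder", "running",  3_100, 1, 12.5),
    Deployment("old-demo", "stopped",    250, 1,  4.0),
]

total_requests      = sum(d.requests for d in deployments)
active_deployments  = sum(1 for d in deployments if d.status == "running")
# GPU hours scale with both the number of GPUs and how long they are allocated.
allocated_gpu_hours = sum(d.gpus * d.allocated_hours for d in deployments)

print(total_requests, active_deployments, allocated_gpu_hours)
```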

Deployment Details Table

  • Name: A unique identifier for each deployment.
  • Type: The category of the deployment (Deployment or Fine-Tuning).
  • Model: The specific AI model being deployed.
  • Requests: The total number of API calls processed by the deployment during the selected time period.
  • Duration: The total runtime of the deployment, displayed in hours and minutes.
  • Tokens (est.): The estimated total number of tokens consumed by the deployed models.

API-Based Models Usage (Router)

The Routers section provides insights into serverless AI model usage and routing performance.


[Screenshot: Router]


Router Metrics Overview

  • Total Requests: The total number of API requests processed by the router.
  • Total Tokens (est.): The estimated total number of tokens consumed by all router requests.
  • Used Models: The number of unique AI models accessed through the router during the selected time period.
  • Average Latency: The average response latency for router requests, measured in seconds (see the sketch below).
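
The sketch below shows one way these router-level figures could be derived from per-request records; the record layout is an assumption made for illustration, not the platform's internal schema.

```python
# Hypothetical per-request router records; keys are illustrative only.
router_requests = [
    {"model": "model-a", "tokens_est": 1_250, "latency_s": 0.84},
    {"model": "model-b", "tokens_est":   930, "latency_s": 1.12},
    {"model": "model-a", "tokens_est": 2_040, "latency_s": 0.95},
]

total_requests   = len(router_requests)
total_tokens_est = sum(r["tokens_est"] for r in router_requests)
used_models      = len({r["model"] for r in router_requests})
# Average latency is the mean per-request latency over the selected time range.
average_latency_s = sum(r["latency_s"] for r in router_requests) / total_requests

print(total_requests, total_tokens_est, used_models, round(average_latency_s, 2))
```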

Serverless Usage Details

  • Model Name: The specific AI model identifier used for serverless inference.
  • API Key: The API key used to make the request.
  • Requests: The number of API calls made to this specific model through the router.
  • Total Tokens (est.): The estimated token consumption for this model, including both input and output tokens.
  • Input: Detailed breakdown of input token usage by pricing category (e.g., prompt tokens, cached tokens, audio tokens). Each row displays the factor label, count, and unit.
  • Output: Detailed breakdown of output token usage by pricing category (e.g., completion tokens, reasoning tokens, image tokens). Each row displays the factor label, count, and unit.
  • Avg Latency (s): The average response latency for this specific model, measured in seconds.
  • Success Rate (%): The percentage of successful requests for this model.
  • Error Count: The total number of failed requests for this model during the selected time period.
  • Total Fee: The total cost incurred for this model during the selected time period, displayed in USD.
  • Throughput (QPS): The number of queries processed per second by this model.
  • Time: The time bucket for this usage record, displayed in hourly intervals (e.g., 2026-04-21 14:00).
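
Several of these columns are derived values. Assuming an hourly time bucket and simple per-model counters (all names and numbers below are hypothetical), they could be computed roughly as follows; in particular, reading Throughput (QPS) as requests divided by the bucket length is an assumption.

```python
# Hypothetical per-model counters for a single hourly bucket; values are made up.
bucket_seconds = 3_600        # one hourly bucket, e.g., 14:00 to 15:00
requests       = 1_800
error_count    = 27
total_fee_usd  = 4.32
total_tokens   = 912_000

success_rate_pct = (requests - error_count) / requests * 100  # Success Rate (%)
throughput_qps   = requests / bucket_seconds                  # Throughput (QPS)
fee_per_1k_tok   = total_fee_usd / (total_tokens / 1_000)     # derived unit cost

print(f"{success_rate_pct:.1f}%  {throughput_qps:.2f} QPS  ${fee_per_1k_tok:.4f}/1K tokens")
```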

Export Usage Data

Click the Export button in the upper-right corner to download your router usage data as an Excel file. The export runs asynchronously — a loading indicator appears while the file is generated, and the download starts automatically once ready.
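
If you want to post-process the export, the file can be loaded with standard spreadsheet tooling. Below is a minimal sketch using pandas; the file name and column labels are assumptions based on the table above, so adjust them to match your actual export.

```python
import pandas as pd  # reading .xlsx files also requires the openpyxl package

# Load the exported router usage workbook (file name is an assumption).
usage = pd.read_excel("router_usage_export.xlsx")

# Example: total estimated tokens and fees per model, assuming these columns exist.
summary = (
    usage.groupby("Model Name")[["Total Tokens (est.)", "Total Fee"]]
         .sum()
         .sort_values("Total Fee", ascending=False)
)
print(summary)
```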

History

The History section provides detailed, request-level tracking and auditing capabilities. Here, you can search, filter, and analyze individual API requests to understand usage patterns, troubleshoot issues, and optimize your AI applications.


[Screenshot: History]


Request Details Table

  • Request ID: The unique identifier for each API request.
  • Request Type: The category of the AI operation performed (e.g., "AI Datasets Preparation," "AI Model Recommend").
  • Model (Name/Version): The specific AI model and version used for the request.
  • Providers: The service provider that processed the request.
  • Technology: The AI technology category (e.g., LLM, VLM) used for the request.
  • Input Tokens: The number of input tokens sent to the model.
  • Output Tokens: The number of output tokens generated by the model.
  • Total Tokens: The total token consumption for the request, including both input and output tokens.
  • TTFT: Time to First Token, measured in milliseconds from the request until the first token is generated.
  • Gateway Latency (ms): The time taken for the request to pass through the gateway, measured in milliseconds.
  • Timestamp: The exact date and time when the request was processed.
  • Status Code: The HTTP status code indicating whether the request succeeded (e.g., 200) or failed.
  • Error Message: The detailed error information for a failed request.
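
Request-level records with these fields lend themselves to simple filtering and aggregation when troubleshooting. The sketch below is illustrative only; the field names mirror the columns above but are not an official export schema.

```python
# Hypothetical request-history records mirroring the columns above.
history = [
    {"request_id": "req_001", "model": "model-a/v1", "input_tokens": 520,
     "output_tokens": 310, "ttft_ms": 240, "status_code": 200, "error_message": None},
    {"request_id": "req_002", "model": "model-b/v2", "input_tokens": 410,
     "output_tokens": 0, "ttft_ms": 0, "status_code": 429, "error_message": "rate limited"},
]

# Total tokens for a request is simply input tokens plus output tokens.
for record in history:
    record["total_tokens"] = record["input_tokens"] + record["output_tokens"]

# Surface failed requests (non-2xx status codes) for troubleshooting.
failed = [r for r in history if not 200 <= r["status_code"] < 300]
for r in failed:
    print(r["request_id"], r["status_code"], r["error_message"])
```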

Usage Optimization Tips

  • Monitor utilization percentages to optimize resource allocation and reduce costs
  • Use the History section to identify peak usage times and plan capacity accordingly
  • Track error rates and latency to ensure optimal user experience
  • Compare token usage across different models to optimize cost-performance ratios (see the sketch below)
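
For the last tip, one straightforward comparison is cost per million tokens, derived from each model's Total Fee and Total Tokens (est.) in the serverless usage table; the figures below are made up for illustration.

```python
# Hypothetical per-model totals taken from the serverless usage table.
models = {
    "model-a": {"total_fee_usd": 12.80, "total_tokens": 4_200_000},
    "model-b": {"total_fee_usd":  3.10, "total_tokens":   650_000},
}

# Cost per million tokens makes models with very different volumes comparable.
for name, m in models.items():
    fee_per_mtok = m["total_fee_usd"] / (m["total_tokens"] / 1_000_000)
    print(f"{name}: ${fee_per_mtok:.2f} per 1M tokens")
```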

Next Steps

View Billing

Check your billing summary, payment methods, and cost breakdown by service category.

Manage Deployments

Review and manage your active model deployments to optimize resource usage.