Monitoring & Analytics
Overview Metrics
Model API
Shows how many model API endpoints you currently have running. Each model you've selected counts as one active endpoint. If you're using three different models, you'll see "3 Endpoints."
This metric helps you keep track of which models are consuming resources. If you see more endpoints than expected, you may have forgotten to disable a test model.
Tokens Used
Displays the total number of tokens processed across all your models, broken down into:
Input tokens – Text you sent to the models (prompts, context, documents)
Output tokens – Text the models generated in response
Token counts directly determine your costs. Monitoring this metric helps you understand:
Which applications are consuming the most tokens
Whether usage is increasing or decreasing over time
If you need to optimize prompts to reduce token consumption
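Because billing is driven by token counts, you can sanity-check your spend with a back-of-the-envelope estimate like the sketch below. The per-token rates shown are hypothetical placeholders, not NeevCloud pricing; substitute the actual rates for the model you are running.

```python
# Back-of-the-envelope cost estimate from token counts.
# The per-million-token rates below are hypothetical placeholders;
# substitute the actual prices for the model you are running.
INPUT_RATE_PER_MILLION = 0.50   # USD per 1M input tokens (example value)
OUTPUT_RATE_PER_MILLION = 1.50  # USD per 1M output tokens (example value)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Approximate spend in USD for the given token usage."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_MILLION \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_MILLION

# Example: 12M input tokens and 3M output tokens -> $10.50 at the rates above
print(f"${estimate_cost(12_000_000, 3_000_000):.2f}")
```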
Total Requests
The cumulative number of API calls made to all your models. This metric shows:
How actively your applications are using AI
Traffic patterns (peak times, quiet periods)
Growth trends in AI feature adoption
A sudden spike in requests might indicate a successful feature launch or potentially a bug causing excessive API calls.
Current Cost
Your spending based on usage so far in the current billing period. This updates in real-time as you make requests, showing exactly what you'll be billed for.
Watch this metric to:
Stay within budget constraints
Identify cost-intensive applications or features
Make informed decisions about model selection (switching to a smaller, cheaper model if costs are too high)
Projected Cost
An estimate of your total monthly spending if usage continues at the current rate. This projection helps with:
Budget planning and approval
Identifying potential overspend situations early
Capacity planning for scaling applications
For example, if you're running a pilot program and projected costs are higher than expected, you can optimize before full rollout.
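The projection is essentially a straight-line extrapolation of month-to-date spend. A minimal sketch of that arithmetic (the exact method NeevCloud uses may differ):

```python
import calendar
from datetime import date

def projected_monthly_cost(cost_so_far: float, today: date) -> float:
    """Linearly extrapolate month-to-date spend over the full month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return cost_so_far / today.day * days_in_month

# Example: $42 spent by June 10th -> $126 projected for the 30-day month
print(projected_monthly_cost(42.0, date(2025, 6, 10)))
```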
Average Uptime
The reliability metric showing what percentage of time your endpoints have been available and responsive. NeevCloud typically maintains 99.9%+ uptime, which works out to no more than about 43 minutes of downtime per 30-day month (see the short calculation after the list below).
Monitor this to:
Verify SLA compliance
Detect any reliability issues with specific models
Make informed decisions about production readiness
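For reference, the downtime implied by an uptime percentage is simple to compute:

```python
def allowed_downtime_minutes(uptime_pct: float, days: int = 30) -> float:
    """Minutes of downtime implied by an uptime percentage over a period."""
    return (1 - uptime_pct / 100) * days * 24 * 60

print(allowed_downtime_minutes(99.9))   # -> 43.2 minutes per 30-day month
print(allowed_downtime_minutes(99.99))  # -> 4.32 minutes per 30-day month
```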
Endpoint Column
The unique API URL for each model. You can copy this URL to:
Update application configurations
Share with team members who need to integrate the model
Verify you're using the correct endpoint in your code
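For example, a minimal call against an endpoint copied from this column might look like the sketch below. The request schema, authentication header, and model name are assumptions based on a typical OpenAI-compatible chat API and may differ for your deployment; the URL and key are placeholders.

```python
import requests

# Both values below are placeholders: copy the real URL from the Endpoint
# column and use your own API key.
ENDPOINT_URL = "https://your-endpoint.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "your-model-name",  # placeholder
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 128,
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```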
Requests Column
Shows how many API calls have been made to this specific model. Use this to:
Identify your most-used models
Find models that aren't being used (candidates for disabling)
Compare traffic across different models or applications
Tokens In / Out Columns
Detailed token usage for this specific model:
Tokens In – Total input tokens sent to the model
Tokens Out – Total output tokens generated by the model
Since output tokens typically cost more than input tokens, pay attention to this ratio. If you're seeing much higher output than input, consider:
Reducing the max_tokens parameter to limit response length
Using a model with lower output token pricing
Optimizing prompts to request more concise responses
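A quick way to spot the models worth tuning is to compute the output-to-input ratio from these two columns. A minimal sketch with made-up usage figures:

```python
# Illustrative usage figures only, not real data.
usage = {
    "model-a": {"tokens_in": 2_400_000, "tokens_out": 600_000},
    "model-b": {"tokens_in": 500_000, "tokens_out": 1_900_000},
}

for name, u in usage.items():
    ratio = u["tokens_out"] / u["tokens_in"]
    if ratio > 1.0:
        print(f"{name}: out/in ratio {ratio:.1f} - consider lowering max_tokens "
              "or asking for more concise responses")
```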
Cost (30d) Column
Total spending on this model over the last 30 days. This helps you:
Identify your most expensive models
Compare costs across different model sizes
Decide if upgrading to a larger model (better quality, higher cost) or downgrading to a smaller model (lower quality, lower cost) makes sense
Last Used Column
Timestamp showing when this model last processed a request. This helps you:
Identify abandoned or forgotten models still consuming resources
Verify that applications are actively using the models
Determine which models can be safely disabled
If a model shows "Last Used: 15 days ago," it might be safe to disable unless you're keeping it for emergency backup.
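If you track these timestamps yourself, flagging stale models is a few lines of code. The timestamps and the 14-day threshold below are illustrative choices, not platform defaults:

```python
from datetime import datetime, timedelta, timezone

# Illustrative "Last Used" timestamps; in practice, read them from the dashboard.
last_used = {
    "model-a": datetime.now(timezone.utc) - timedelta(days=2),
    "model-b": datetime.now(timezone.utc) - timedelta(days=21),
}

STALE_AFTER = timedelta(days=14)  # example threshold, tune to your needs
stale = [name for name, ts in last_used.items()
         if datetime.now(timezone.utc) - ts > STALE_AFTER]
print("Candidates for disabling:", stale)  # -> ['model-b']
```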
Using Monitoring Data for Optimization
Cost Optimization
If costs are higher than expected:
Identify your most expensive models in the table
Check if smaller models could handle those workloads
Look at tokens in/out ratios—high output token counts suggest response length optimization opportunities
Review request patterns—are there redundant or unnecessary API calls?
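For the first step, sorting models by their Cost (30d) values quickly surfaces optimization targets. A tiny sketch with illustrative figures:

```python
# Illustrative 30-day costs per model; replace with the values from the table.
cost_30d = {"model-a": 310.40, "model-b": 87.15, "model-c": 12.02}

for name, cost in sorted(cost_30d.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: ${cost:,.2f}")
```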
Performance Optimization
If response times are slow:
Check the "Status" column—models stuck in "Loading" state may indicate scaling issues
Review request volumes—very high concurrent requests might benefit from rate limiting
Consider model size—smaller models respond faster but with lower quality
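If rate limiting looks useful, even a simple client-side throttle can smooth out spikes. A minimal sketch; the rate ceiling is an illustrative value:

```python
import time

MAX_REQUESTS_PER_SECOND = 5  # illustrative ceiling, tune to your workload

def send_throttled(send_fn, payloads):
    """Issue calls no faster than MAX_REQUESTS_PER_SECOND (simple sketch)."""
    interval = 1.0 / MAX_REQUESTS_PER_SECOND
    results = []
    for payload in payloads:
        start = time.monotonic()
        results.append(send_fn(payload))
        elapsed = time.monotonic() - start
        if elapsed < interval:
            time.sleep(interval - elapsed)
    return results
```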
Reliability Monitoring
If you notice intermittent failures:
Check "Average Uptime" metric for overall platform health
Look for "Failed" status on specific models
Review "Last Used" timestamps—models that are rarely used may unload and take longer to respond when needed
This monitoring data gives you complete visibility into your AI infrastructure without requiring you to manage any monitoring tools or infrastructure yourself.