Monitoring & Analytics

Overview Metrics

Model API

Shows how many model APIs you currently have running. Each model you've selected counts as one active endpoint. If you're using three different models, you'll see "3 Endpoints."

This metric helps you keep track of which models are consuming resources. If you see more endpoints than expected, you may have forgotten to disable a test model.

Tokens Used

Displays the total number of tokens processed across all your models, broken down into:

  • Input tokens – Text you sent to the models (prompts, context, documents)

  • Output tokens – Text the models generated in response

Token counts directly determine your costs. Monitoring this metric helps you understand:

  • Which applications are consuming the most tokens

  • Whether usage is increasing or decreasing over time

  • If you need to optimize prompts to reduce token consumption
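As a rough illustration of how these counts translate into spend, here's a minimal sketch. The per-million-token rates are hypothetical placeholders, not NeevCloud pricing; substitute the rates listed for your model.

```python
# Rough cost estimate from the token counts shown on the dashboard.
# The rates below are hypothetical examples, not actual NeevCloud pricing;
# replace them with the per-million-token prices listed for your model.
INPUT_RATE_PER_M = 0.50    # USD per 1M input tokens (example value)
OUTPUT_RATE_PER_M = 1.50   # USD per 1M output tokens (example value)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for the given token counts."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# Example: 12M input tokens and 3M output tokens this month.
print(f"${estimate_cost(12_000_000, 3_000_000):.2f}")  # -> $10.50
```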

Total Requests

The cumulative number of API calls made to all your models. This metric shows:

  • How actively your applications are using AI

  • Traffic patterns (peak times, quiet periods)

  • Growth trends in AI feature adoption

A sudden spike in requests might indicate a successful feature launch or potentially a bug causing excessive API calls.
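If you log or export daily request counts, a simple comparison against the recent average is enough to flag unusual spikes. A minimal sketch; the counts below are made-up examples.

```python
# Flag a request spike by comparing today's count against the trailing average.
# The daily counts are made-up example values.
daily_requests = [1200, 1350, 1280, 1310, 1295, 1340, 4800]  # last value is "today"

*history, today = daily_requests
baseline = sum(history) / len(history)

if today > 2 * baseline:  # the 2x threshold is a judgment call; tune it for your traffic
    print(f"Possible spike: {today} requests vs. baseline ~{baseline:.0f}")
```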

Current Cost

Your spending based on usage so far in the current billing period. This updates in real time as you make requests, showing exactly what you'll be billed for.

Watch this metric to:

  • Stay within budget constraints

  • Identify cost-intensive applications or features

  • Make informed decisions about model selection (switching to a smaller, cheaper model if costs are too high)

Projected Cost

An estimate of your total monthly spending if usage continues at the current rate. This projection helps with:

  • Budget planning and approval

  • Identifying potential overspend situations early

  • Capacity planning for scaling applications

For example, if you're running a pilot program and projected costs are higher than expected, you can optimize before full rollout.
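The projection is essentially a run-rate calculation. Here's a minimal sketch of the idea, assuming spend accrues roughly evenly across the billing period (the dashboard's own figure may use a more detailed method).

```python
import calendar
from datetime import date

def projected_monthly_cost(current_cost: float, today: date) -> float:
    """Extrapolate month-end spend from the current run rate."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return current_cost / today.day * days_in_month

# Example: $42.50 spent by the 10th of a 30-day month -> ~$127.50 projected.
print(f"${projected_monthly_cost(42.50, date(2025, 6, 10)):.2f}")
```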

Average Uptime

The reliability metric showing what percentage of time your endpoints have been available and responsive. NeevCloud typically maintains 99.9%+ uptime, which works out to less than about 43 minutes of downtime in a 30-day month (0.1% of 43,200 minutes).

Monitor this to:

  • Verify SLA compliance

  • Detect any reliability issues with specific models

  • Make informed decisions about production readiness

Endpoint Column

The unique API URL for each model. You can copy this URL to:

  • Update application configurations

  • Share with team members who need to integrate the model

  • Verify you're using the correct endpoint in your code
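For example, if your endpoint exposes an OpenAI-compatible chat completions API (an assumption; check your endpoint's documentation for the exact request format), a call from Python might look roughly like this. The URL, model name, and environment variable below are placeholders.

```python
import os
import requests

# Placeholders: copy the real URL from the Endpoint column and use your own
# API key. The request body assumes an OpenAI-compatible chat completions
# API, which may differ from what your endpoint actually expects.
ENDPOINT_URL = "https://your-endpoint.example.com/v1/chat/completions"
API_KEY = os.environ["NEEVCLOUD_API_KEY"]

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "your-model-name",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```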

Requests Column

Shows how many API calls have been made to this specific model. Use this to:

  • Identify your most-used models

  • Find models that aren't being used (candidates for disabling)

  • Compare traffic across different models or applications

Tokens In / Out Columns

Detailed token usage for this specific model:

  • Tokens In – Total input tokens sent to the model

  • Tokens Out – Total output tokens generated by the model

Since output tokens typically cost more than input tokens, pay attention to this ratio. If output tokens consistently run much higher than input tokens, consider:

  • Reducing the max_tokens parameter to limit response length (see the example after this list)

  • Using a model with lower output token pricing

  • Optimizing prompts to request more concise responses
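As a quick sanity check, you can compute the ratio directly from these two columns (the token counts below are made-up examples). If the ratio looks high, capping response length is usually the first lever, assuming your endpoint accepts an OpenAI-style max_tokens parameter.

```python
# Made-up example values read from the Tokens In / Tokens Out columns.
tokens_in, tokens_out = 4_200_000, 9_800_000

ratio = tokens_out / tokens_in
print(f"Output/input ratio: {ratio:.2f}")  # -> 2.33

if ratio > 1.5:  # the threshold is a judgment call
    # Cap response length on future requests, e.g. in the request body:
    request_body = {
        "model": "your-model-name",
        "messages": [{"role": "user", "content": "Summarize this in two sentences."}],
        "max_tokens": 256,  # assumes an OpenAI-style max_tokens parameter
    }
    print("Consider capping responses at", request_body["max_tokens"], "tokens")
```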

Cost (30d) Column

Total spending on this model over the last 30 days. This helps you:

  • Identify your most expensive models

  • Compare costs across different model sizes

  • Decide if upgrading to a larger model (better quality, higher cost) or downgrading to a smaller model (lower quality, lower cost) makes sense

Last Used Column

Timestamp showing when this model last processed a request. This helps you:

  • Identify abandoned or forgotten models still consuming resources

  • Verify that applications are actively using the models

  • Determine which models can be safely disabled

If a model shows "Last Used: 15 days ago," it might be safe to disable unless you're keeping it for emergency backup.

Using Monitoring Data for Optimization

Cost Optimization

If costs are higher than expected:

  • Identify your most expensive models in the table

  • Check if smaller models could handle those workloads

  • Look at tokens in/out ratios—high output token counts suggest response length optimization opportunities

  • Review request patterns—are there redundant or unnecessary API calls?
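If you export the per-model figures from the table, a few lines of analysis can surface the models worth looking at first. A minimal sketch; the figures below are illustrative, not real data.

```python
# Illustrative per-model figures, as they might be read off the endpoints table.
models = [
    {"name": "model-a", "cost_30d": 310.0, "tokens_in": 9_000_000, "tokens_out": 2_000_000},
    {"name": "model-b", "cost_30d": 55.0,  "tokens_in": 1_000_000, "tokens_out": 3_500_000},
    {"name": "model-c", "cost_30d": 12.0,  "tokens_in": 400_000,   "tokens_out": 150_000},
]

# Most expensive models first: the top of this list is where optimization pays off most.
for m in sorted(models, key=lambda m: m["cost_30d"], reverse=True):
    ratio = m["tokens_out"] / m["tokens_in"]
    note = "  <- high output ratio; consider shorter responses" if ratio > 1.5 else ""
    print(f'{m["name"]}: ${m["cost_30d"]:.2f} over 30d, out/in ratio {ratio:.2f}{note}')
```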

Performance Optimization

If response times are slow:

  • Check the "Status" column—models stuck in "Loading" state may indicate scaling issues

  • Review request volumes—very high concurrent requests might benefit from rate limiting (see the sketch after this list)

  • Consider model size—smaller models respond faster but with lower quality
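Client-side rate limiting can be as simple as bounding how many requests are in flight at once. A minimal sketch using an asyncio semaphore; the call_model coroutine is a stand-in, so replace the sleep with your actual API call.

```python
import asyncio

MAX_CONCURRENT = 5  # tune this to what your endpoint handles comfortably

semaphore = asyncio.Semaphore(MAX_CONCURRENT)

async def call_model(prompt: str) -> str:
    """Stand-in for a real API call; replace the sleep with your HTTP request."""
    async with semaphore:            # at most MAX_CONCURRENT calls in flight
        await asyncio.sleep(0.1)     # simulated network latency
        return f"response to: {prompt}"

async def main() -> None:
    prompts = [f"prompt {i}" for i in range(20)]
    results = await asyncio.gather(*(call_model(p) for p in prompts))
    print(len(results), "responses received")

asyncio.run(main())
```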

Reliability Monitoring

If you notice intermittent failures:

  • Check the "Average Uptime" metric for overall platform health

  • Look for "Failed" status on specific models

  • Review "Last Used" timestamps—models that are rarely used may unload and take longer to respond when needed
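If the failures are transient (for example, a rarely used model taking a moment to load), retrying with exponential backoff on the client side usually smooths them over. A minimal sketch; the flaky function is a stand-in for your actual request.

```python
import time

def call_with_retries(make_request, attempts: int = 4, base_delay: float = 1.0):
    """Retry a flaky call with exponential backoff (1s, 2s, 4s, ... between tries)."""
    for attempt in range(attempts):
        try:
            return make_request()
        except Exception as exc:  # narrow this to your HTTP client's error types
            if attempt == attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)

# Example with a stand-in request that fails twice before succeeding.
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient error")
    return "ok"

print(call_with_retries(flaky))
```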

This monitoring data gives you complete visibility into your AI infrastructure without requiring you to set up or manage any monitoring tools yourself.
