Monitoring & Analytics
Overview Metrics
Model API
Shows how many model API endpoints you currently have running. Each model you've selected counts as one active endpoint. If you're using three different models, you'll see "3 Endpoints."
This metric helps you keep track of which models are consuming resources. If you see more endpoints than expected, you may have forgotten to disable a test model.
Tokens Used
Displays the total number of tokens processed across all your models, broken down into:
Input tokens – Text you sent to the models (prompts, context, documents)
Output tokens – Text the models generated in response
Token counts directly determine your costs. Monitoring this metric helps you understand:
Which applications are consuming the most tokens
Whether usage is increasing or decreasing over time
If you need to optimize prompts to reduce token consumption
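Because billing is driven by token counts, you can sanity-check your spend with a back-of-the-envelope estimate like the sketch below. The per-token rates shown are hypothetical placeholders, not NeevCloud pricing; substitute the actual rates for the model you are running.

```python
# Back-of-the-envelope cost estimate from token counts.
# The per-million-token rates below are hypothetical placeholders;
# substitute the actual prices for the model you are running.
INPUT_RATE_PER_MILLION = 0.50   # USD per 1M input tokens (example value)
OUTPUT_RATE_PER_MILLION = 1.50  # USD per 1M output tokens (example value)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Approximate spend in USD for the given token usage."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_MILLION \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_MILLION

# Example: 12M input tokens and 3M output tokens -> $10.50 at the rates above
print(f"${estimate_cost(12_000_000, 3_000_000):.2f}")
```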
Total Requests
The cumulative number of API calls made to all your models. This metric shows:
How actively your applications are using AI
Traffic patterns (peak times, quiet periods)
Growth trends in AI feature adoption
A sudden spike in requests might indicate a successful feature launch or potentially a bug causing excessive API calls.
Current Cost
Your spending based on usage so far in the current billing period. This updates in real-time as you make requests, showing exactly what you'll be billed for.
Watch this metric to:
Stay within budget constraints
Identify cost-intensive applications or features
Make informed decisions about model selection (switching to a smaller, cheaper model if costs are too high)
Projected Cost
An estimate of your total monthly spending if usage continues at the current rate. This projection helps with:
Budget planning and approval
Identifying potential overspend situations early
Capacity planning for scaling applications
For example, if you're running a pilot program and projected costs are higher than expected, you can optimize before full rollout.
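The projection is essentially a straight-line extrapolation of month-to-date spend. A minimal sketch of that arithmetic (the exact method NeevCloud uses may differ):

```python
import calendar
from datetime import date

def projected_monthly_cost(cost_so_far: float, today: date) -> float:
    """Linearly extrapolate month-to-date spend over the full month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return cost_so_far / today.day * days_in_month

# Example: $42 spent by June 10th -> $126 projected for the 30-day month
print(projected_monthly_cost(42.0, date(2025, 6, 10)))
```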
Average Uptime
The reliability metric showing what percentage of time your endpoints have been available and responsive. NeevCloud typically maintains 99.9%+ uptime, which works out to no more than about 43 minutes of downtime per 30-day month (see the short calculation after the list below).
Monitor this to:
Verify SLA compliance
Detect any reliability issues with specific models
Make informed decisions about production readiness
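For reference, the downtime implied by an uptime percentage is simple to compute:

```python
def allowed_downtime_minutes(uptime_pct: float, days: int = 30) -> float:
    """Minutes of downtime implied by an uptime percentage over a period."""
    return (1 - uptime_pct / 100) * days * 24 * 60

print(allowed_downtime_minutes(99.9))   # -> 43.2 minutes per 30-day month
print(allowed_downtime_minutes(99.99))  # -> 4.32 minutes per 30-day month
```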
Endpoint Column
The unique API URL for each model. You can copy this URL to:
Update application configurations
Share with team members who need to integrate the model
Verify you're using the correct endpoint in your code
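For example, a minimal call against an endpoint copied from this column might look like the sketch below. The request schema, authentication header, and model name are assumptions based on a typical OpenAI-compatible chat API and may differ for your deployment; the URL and key are placeholders.

```python
import requests

# Both values below are placeholders: copy the real URL from the Endpoint
# column and use your own API key.
ENDPOINT_URL = "https://your-endpoint.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "your-model-name",  # placeholder
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 128,
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```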
Requests Column
Shows how many API calls have been made to this specific model. Use this to:
Identify your most-used models
Find models that aren't being used (candidates for disabling)
Compare traffic across different models or applications
Tokens In / Out Columns
Detailed token usage for this specific model:
Tokens In – Total input tokens sent to the model
Tokens Out – Total output tokens generated by the model
Since output tokens typically cost more than input tokens, pay attention to this ratio. If you're seeing much higher output than input, consider:
Reducing the max_tokens parameter to limit response length
Using a model with lower output token pricing
Optimizing prompts to request more concise responses
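A quick way to spot the models worth tuning is to compute the output-to-input ratio from these two columns. A minimal sketch with made-up usage figures:

```python
# Illustrative usage figures only, not real data.
usage = {
    "model-a": {"tokens_in": 2_400_000, "tokens_out": 600_000},
    "model-b": {"tokens_in": 500_000, "tokens_out": 1_900_000},
}

for name, u in usage.items():
    ratio = u["tokens_out"] / u["tokens_in"]
    if ratio > 1.0:
        print(f"{name}: out/in ratio {ratio:.1f} - consider lowering max_tokens "
              "or asking for more concise responses")
```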
Cost (30d) Column
Total spending on this model over the last 30 days. This helps you:
Identify your most expensive models
Compare costs across different model sizes
Decide if upgrading to a larger model (better quality, higher cost) or downgrading to a smaller model (lower quality, lower cost) makes sense
Last Used Column
Timestamp showing when this model last processed a request. This helps you:
Identify abandoned or forgotten models still consuming resources
Verify that applications are actively using the models
Determine which models can be safely disabled
If a model shows "Last Used: 15 days ago," it might be safe to disable unless you're keeping it for emergency backup.
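If you track these timestamps yourself, flagging stale models is a few lines of code. The timestamps and the 14-day threshold below are illustrative choices, not platform defaults:

```python
from datetime import datetime, timedelta, timezone

# Illustrative "Last Used" timestamps; in practice, read them from the dashboard.
last_used = {
    "model-a": datetime.now(timezone.utc) - timedelta(days=2),
    "model-b": datetime.now(timezone.utc) - timedelta(days=21),
}

STALE_AFTER = timedelta(days=14)  # example threshold, tune to your needs
stale = [name for name, ts in last_used.items()
         if datetime.now(timezone.utc) - ts > STALE_AFTER]
print("Candidates for disabling:", stale)  # -> ['model-b']
```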
Using Monitoring Data for Optimization
Cost Optimization
If costs are higher than expected:
Identify your most expensive models in the table
Check if smaller models could handle those workloads
Look at tokens in/out ratios—high output token counts suggest response length optimization opportunities
Review request patterns—are there redundant or unnecessary API calls?
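For the first step, sorting models by their Cost (30d) values quickly surfaces optimization targets. A tiny sketch with illustrative figures:

```python
# Illustrative 30-day costs per model; replace with the values from the table.
cost_30d = {"model-a": 310.40, "model-b": 87.15, "model-c": 12.02}

for name, cost in sorted(cost_30d.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: ${cost:,.2f}")
```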
Performance Optimization
If response times are slow:
Check the "Status" column—models stuck in "Loading" state may indicate scaling issues
Review request volumes—very high concurrent requests might benefit from rate limiting
Consider model size—smaller models respond faster but with lower quality
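If rate limiting looks useful, even a simple client-side throttle can smooth out spikes. A minimal sketch; the rate ceiling is an illustrative value:

```python
import time

MAX_REQUESTS_PER_SECOND = 5  # illustrative ceiling, tune to your workload

def send_throttled(send_fn, payloads):
    """Issue calls no faster than MAX_REQUESTS_PER_SECOND (simple sketch)."""
    interval = 1.0 / MAX_REQUESTS_PER_SECOND
    results = []
    for payload in payloads:
        start = time.monotonic()
        results.append(send_fn(payload))
        elapsed = time.monotonic() - start
        if elapsed < interval:
            time.sleep(interval - elapsed)
    return results
```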
Reliability Monitoring
If you notice intermittent failures:
Check "Average Uptime" metric for overall platform health
Look for "Failed" status on specific models
Review "Last Used" timestamps—models that are rarely used may unload and take longer to respond when needed
This monitoring data gives you complete visibility into your AI infrastructure without requiring you to manage any monitoring tools or infrastructure yourself.