Pricing & Cost Optimization
Pricing Model
NeevCloud uses transparent, usage-based pricing that aligns costs with actual consumption. You're never paying for idle resources or overprovisioned capacity.
Pay-Per-Token Structure
How Token Billing Works
Tokens are the fundamental unit of text processing in language models. Roughly speaking:
1 token ≈ 4 characters of English text
1 token ≈ ¾ of a word
100 tokens ≈ 75 words
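These rules of thumb can be turned into a quick estimator. This is an approximation only; the model's actual tokenizer is authoritative, and counts vary by language and content:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters-per-token rule of thumb."""
    return max(1, round(len(text) / 4))

# "Hello, world!" is 13 characters, so roughly 3 tokens
print(estimate_tokens("Hello, world!"))  # 3
```

Use estimates like this for budgeting only; rely on the token counts returned in API responses for actual billing figures.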
When you make a request:
Your input (prompt, context, etc.) is counted as input tokens
The model's response is counted as output tokens
You're billed for both, with output tokens typically costing more
Example Calculation
Let's say a model charges:
$0.50 per million input tokens
$1.50 per million output tokens
If you send a 1,000-token prompt and receive a 500-token response:
Input cost: (1,000 / 1,000,000) × $0.50 = $0.0005
Output cost: (500 / 1,000,000) × $1.50 = $0.00075
Total cost per request: $0.00125
For 10,000 similar requests per day, your monthly cost would be approximately $375.
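The calculation above can be reproduced in a few lines (using the example prices of $0.50 and $1.50 per million tokens; substitute the actual rates for your chosen model):

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_m=0.50, output_price_per_m=1.50):
    """Cost of one request in dollars, given per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

per_request = request_cost(1_000, 500)   # $0.00125
monthly = per_request * 10_000 * 30      # 10,000 requests/day for 30 days
print(f"${per_request:.5f} per request, ~${monthly:.2f}/month")
# $0.00125 per request, ~$375.00/month
```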
Why Output Tokens Cost More
Generating text requires more computation than processing it. When you send input tokens, the model:
Reads and encodes the text
Builds internal representations
When the model generates output tokens, it:
Runs complex neural network calculations for each token
Samples from probability distributions
Maintains coherence across the entire response
Performs this iteratively for each output token
This is why output tokens typically cost 2-3x more than input tokens.
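The asymmetry comes from how generation works: a prompt can be processed in a single parallel pass, but each output token requires its own forward pass over the growing context. A toy sketch of that autoregressive loop (the `model_step` function here is a placeholder, not a real model):

```python
import random

def generate(model_step, prompt_tokens, max_new_tokens):
    """Autoregressive decoding sketch: every new token triggers another
    forward pass over the full context, which is why output costs more."""
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = model_step(context)  # one full forward pass per output token
        next_tok = random.choices(range(len(probs)), weights=probs)[0]
        context.append(next_tok)
    return context[len(prompt_tokens):]

# Hypothetical uniform "model" over a 4-token vocabulary
out = generate(lambda ctx: [0.25] * 4, [1, 2, 3], max_new_tokens=5)
print(len(out))  # 5 forward passes were needed for these 5 tokens
```

A 1,000-token prompt with a 500-token response therefore costs roughly 1 prompt pass plus 500 generation passes, not 1,500 equal units of work.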
No Hidden Fees
What's Included in Token Pricing
Your per-token cost covers:
GPU compute time for inference
Model loading and memory allocation
Load balancing and autoscaling
Redundancy and failover
Monitoring and logging infrastructure
API gateway and networking
Data transfer and bandwidth
You don't pay separately for any infrastructure components.
What You Don't Pay For
Idle time between requests
Model deployment or setup
Minimum usage commitments
Infrastructure maintenance
Failed requests (you're only charged for successful requests that return tokens)
No Minimum Commitments
Complete Flexibility
You can:
Start using a model with a single request
Scale up to millions of requests per day
Scale back down to zero requests
Enable or disable models at any time
Switch between models without penalties
There are no contracts, no reserved capacity requirements, and no charges when you're not making requests.
Ideal for Variable Workloads
This model works well when:
Testing multiple models to find the best fit
Running pilot programs with uncertain demand
Building applications with unpredictable usage patterns
Handling seasonal or event-driven traffic spikes
Real-Time Billing Visibility
Dashboard Transparency
Your NeevCloud dashboard shows:
Current costs updating as requests complete
Per-model cost breakdown
Token usage trends over time
Projected monthly costs based on current usage patterns
This visibility helps you:
Stay within budget constraints
Identify cost spikes immediately
Make data-driven decisions about model selection and optimization
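A monthly projection like the dashboard's can be sketched from recent daily spend. This uses a simple average; the dashboard's actual projection method may differ, and the daily figures below are illustrative:

```python
def project_monthly_cost(daily_costs, days_in_month=30):
    """Project a monthly bill by extrapolating average daily spend."""
    avg_daily = sum(daily_costs) / len(daily_costs)
    return avg_daily * days_in_month

recent = [11.80, 12.40, 13.10, 12.70]  # last four days' spend in dollars
print(round(project_monthly_cost(recent), 2))  # 375.0
```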
No Billing Surprises
You'll never receive an unexpected bill. The dashboard shows exactly what you'll be charged before the invoice is generated.
Cost-Efficient Scaling
Automatic Optimization
As your usage increases, the per-token efficiency improves because:
Models stay loaded in memory (no repeated loading costs)
Request batching optimizes GPU utilization
Infrastructure is fully utilized, reducing per-request overhead
Right-Sizing Your Usage
Use the monitoring tools to optimize costs:
If quality meets your needs with a smaller model, switch to save costs
If response quality is critical, upgrade to a larger model despite higher costs
If output length is excessive, add max_tokens limits to control costs
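Capping output length puts a hard ceiling on per-request output cost. The payload below is a sketch: the exact endpoint and field names depend on your SDK, but most OpenAI-compatible APIs accept a `max_tokens` parameter, and the model name is a placeholder:

```python
# Hypothetical request payload; field names follow common API conventions.
payload = {
    "model": "example-model",  # placeholder model name
    "prompt": "Summarize the quarterly report in two sentences.",
    "max_tokens": 100,         # hard cap on billable output tokens
}

# Worst-case output cost at the example rate of $1.50 per million tokens:
worst_case = payload["max_tokens"] / 1_000_000 * 1.50
print(f"${worst_case:.6f}")  # $0.000150 at most per request
```

With the cap in place, output spend per request can never exceed the cap times the per-token rate, regardless of how verbose the model would otherwise be.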
The pay-per-token model gives you complete control over the cost-quality-performance tradeoff.