Getting Started
What is the Model API?
NeevCloud Model API provides uniform access to state-of-the-art open-source AI models through a standardized, OpenAI-compatible API. Instead of managing complex infrastructure, handling GPU scaling, or optimizing model weights, you can simply make an API call to run inference on top-tier models like Llama 3, Mixtral, Qwen, and Stable Diffusion.
How It Works
The Model API abstracts away the underlying hardware complexity. When you send a request:
1. Authentication: Your request is verified using your API key.
2. Routing: The request is instantly routed to a warm, optimized GPU instance hosting your selected model.
3. Inference: The model processes the input prompt and generates the output.
4. Response: The result is streamed back to your application in real-time.
This approach allows you to scale from zero to millions of requests without managing a single GPU instance.
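The request flow above can be sketched in Python. Note that the endpoint URL and model name below are illustrative assumptions, not real values; copy the actual endpoint and model identifier from the model's detail page.

```python
import json

# Hypothetical endpoint and model name for illustration only -- replace
# these with the values shown on the model's detail page.
API_URL = "https://api.neevcloud.com/v1/chat/completions"
MODEL = "meta-llama/Llama-3-8B-Instruct"

def build_chat_request(prompt: str, api_key: str) -> tuple[dict, dict]:
    """Assemble headers and an OpenAI-compatible chat-completions payload."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # step 1: authentication
        "Content-Type": "application/json",
    }
    payload = {
        "model": MODEL,  # step 2: routed to an instance hosting this model
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # step 4: stream the result back in real time
    }
    return headers, payload

if __name__ == "__main__":
    headers, payload = build_chat_request("Hello!", "YOUR_API_KEY")
    print(json.dumps(payload, indent=2))
    # Sending it requires the `requests` package and a valid key:
    # import requests
    # resp = requests.post(API_URL, headers=headers, json=payload, stream=True)
```

Because the API is OpenAI-compatible, the same payload shape works with existing OpenAI client libraries pointed at the NeevCloud endpoint.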
Choosing the Right Model
Different models are optimized for different tasks. The task filter helps you narrow down models based on what you're building:
Chat – Conversational AI, customer support bots, virtual assistants
Coding – Code generation, code completion, debugging assistance
Image Generation – Creating images from text prompts, design automation
Vision – Image understanding, object detection, visual question answering
Audio – Speech recognition, audio transcription, voice synthesis
Embeddings – Semantic search, recommendation systems, clustering
Moderation – Content filtering, safety classification, toxicity detection
The "Order By" feature helps you discover models based on different criteria:
Most Recent – Shows newly added models first. Useful for staying current with the latest model releases and improvements.
Most Deployed – Displays popular models that other users are actively using. This can indicate reliability and community validation.
Price (Lowest to Highest) – Sorts by cost per token. Essential for budget-conscious projects or high-volume applications where token costs accumulate quickly.
Price (Highest to Lowest) – Shows premium models first. Often these are the largest, most capable models for tasks requiring maximum quality.
The real power comes from combining multiple filters. For example:
Filter by "Chat" task + "7B-13B" parameters + "Price Low to High" to find cost-effective conversational models
Filter by "Coding" task + "Most Deployed" + "Official" to find proven code generation models
Filter by "Vision" task + "Most Recent" to explore the latest image understanding capabilities
This approach helps you quickly narrow thousands of models down to a handful of suitable candidates.
Once you've identified a candidate model, click the Select button.
Technical Capabilities
This section describes what the model can do: supported input formats, maximum context length (how much text it can process at once), supported languages, and any special capabilities like function calling or structured output generation.
For example, you might see:
Context window: 128K tokens (can process roughly 96,000 words of input)
Supports JSON mode for structured outputs
Multilingual support for 50+ languages
Function calling capability for tool use
Understanding these capabilities helps you determine if the model fits your technical requirements.
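The word estimate above follows the common rule of thumb of roughly 0.75 English words per token (the exact ratio varies by tokenizer and language). A quick sketch of the conversion:

```python
def tokens_to_words(tokens: int, words_per_token: float = 0.75) -> int:
    """Rough English-word capacity for a token budget (rule of thumb only;
    the true ratio depends on the model's tokenizer and the language)."""
    return int(tokens * words_per_token)

print(tokens_to_words(128_000))  # → 96000
```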
Pricing Structure
Each model has specific pricing for input and output tokens. You'll see rates like:
Input tokens: $0.50 per million tokens
Output tokens: $1.50 per million tokens
Output tokens typically cost more because generating text is computationally more expensive than processing it. Use these rates to estimate your costs based on expected usage patterns.
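A cost estimate from these per-million-token rates is simple arithmetic. The sketch below uses the example rates above; substitute the real rates from the model's pricing section, and note that the traffic figures in the usage example are hypothetical.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float = 0.50, output_rate: float = 1.50) -> float:
    """Estimated USD cost. Rates are dollars per million tokens; the
    defaults are the example rates above, not any model's real pricing."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Hypothetical workload: 2,000 requests/day, ~500 input + ~300 output tokens each.
daily = estimate_cost(2_000 * 500, 2_000 * 300)
print(f"${daily:.2f}/day")  # → $1.40/day
```

Because output tokens cost three times as much here, trimming verbose responses (for example with a lower max tokens setting) often saves more than shortening prompts.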
Playground Access
Before writing code, you can test the model interactively in the Playground. This browser-based interface lets you:
Send prompts and see responses immediately
Adjust parameters like temperature and max tokens
Test different input formats
Validate that the model behaves as expected
The Playground is invaluable for quick experimentation without writing any code.
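The parameters you tune in the Playground map onto fields of the API request. A minimal sketch, assuming the conventional OpenAI-compatible field names (temperature, max_tokens, top_p); the model's own code samples are the authoritative reference:

```python
def sampling_params(temperature: float = 0.7, max_tokens: int = 256,
                    top_p: float = 1.0) -> dict:
    """Common OpenAI-compatible sampling fields tuned in the Playground."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature is conventionally in the range [0, 2]")
    return {"temperature": temperature, "max_tokens": max_tokens, "top_p": top_p}

# Lower temperature gives more deterministic output; raise it for creative tasks.
params = sampling_params(temperature=0.2, max_tokens=512)
```

Once you find settings that behave well in the Playground, carry the same values into your request payload.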
API Endpoint and Code Samples
Once you're satisfied with the model's performance, you can copy:
The unique API endpoint URL for this model
Ready-to-use code examples in cURL, Python, and JavaScript
Your API key (if already created)
These code samples are pre-configured with the correct model name and endpoint, so you can paste them into your application with minimal modification.