Getting Started

What is the Model API?

The NeevCloud Model API provides uniform access to state-of-the-art open-source AI models through a standardized, OpenAI-compatible API. Instead of managing complex infrastructure, handling GPU scaling, or optimizing model serving, you can simply make an API call to run inference on top-tier models like Llama 3, Mixtral, Qwen, and Stable Diffusion.

How It Works

The Model API abstracts away the underlying hardware complexity. When you send a request:

  1. Authentication: Your request is verified using your API key.

  2. Routing: The request is instantly routed to a warm, optimized GPU instance hosting your selected model.

  3. Inference: The model processes the input prompt and generates the output.

  4. Response: The result is streamed back to your application in real time.

This approach allows you to scale from zero to millions of requests without managing a single GPU instance.
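The streaming in step 4 follows the server-sent-events convention used by OpenAI-compatible chat endpoints: each chunk is a `data: {json}` line, terminated by `data: [DONE]`. A minimal sketch of assembling the streamed text, with the chunks simulated rather than read from a live connection:

```python
import json

# Simulated stream chunks; a real client would read these lines off the
# HTTP response as they arrive.
chunks = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

text = ""
for line in chunks:
    body = line.removeprefix("data: ")
    if body == "[DONE]":          # sentinel marking the end of the stream
        break
    delta = json.loads(body)["choices"][0]["delta"]
    text += delta.get("content", "")  # some chunks may omit "content"

print(text)  # → Hello!
```

Because each chunk carries only a small delta, your application can display output token by token instead of waiting for the full completion.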

Choosing the Right Model

Different models are optimized for different tasks. The task filter helps you narrow down models based on what you're building:

  • Chat – Conversational AI, customer support bots, virtual assistants

  • Coding – Code generation, code completion, debugging assistance

  • Image Generation – Creating images from text prompts, design automation

  • Vision – Image understanding, object detection, visual question answering

  • Audio – Speech recognition, audio transcription, voice synthesis

  • Embeddings – Semantic search, recommendation systems, clustering

  • Moderation – Content filtering, safety classification, toxicity detection

The "Order By" feature helps you discover models based on different criteria:

  • Most Recent – Shows newly added models first. Useful for staying current with the latest model releases and improvements.

  • Most Deployed – Displays popular models that other users are actively using. This can indicate reliability and community validation.

  • Price (Lowest to Highest) – Sorts by cost per token. Essential for budget-conscious projects or high-volume applications where token costs accumulate quickly.

  • Price (Highest to Lowest) – Shows premium models first. Often these are the largest, most capable models for tasks requiring maximum quality.

The real power comes from combining multiple filters. For example:

  • Filter by "Chat" task + "7B-13B" parameters + "Price Low to High" to find cost-effective conversational models

  • Filter by "Coding" task + "Most Deployed" + "Official" to find proven code generation models

  • Filter by "Vision" task + "Most Recent" to explore the latest image understanding capabilities

This approach helps you quickly narrow thousands of models down to a handful of suitable candidates.

Once you've identified a candidate model, click the Select button.

Technical Capabilities

The model detail page lists what the model can do: supported input formats, maximum context length (how much text it can process at once), supported languages, and any special capabilities like function calling or structured output generation.

For example, you might see:

  • Context window: 128K tokens (can process roughly 96,000 words of input)

  • Supports JSON mode for structured outputs

  • Multilingual support for 50+ languages

  • Function calling capability for tool use

Understanding these capabilities helps you determine if the model fits your technical requirements.
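The "96,000 words" figure above comes from the common rule of thumb of roughly 0.75 English words per token. A quick sanity check, keeping in mind the ratio is an approximation that varies by tokenizer and language:

```python
# Rough heuristic: 1 token ≈ 0.75 English words.
WORDS_PER_TOKEN = 0.75

def approx_word_capacity(context_tokens: int) -> int:
    """Estimate how many English words fit in a given context window."""
    return int(context_tokens * WORDS_PER_TOKEN)

print(approx_word_capacity(128_000))  # → 96000
```

Use this only for ballpark capacity planning; the exact token count for a given text depends on the model's tokenizer.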

Pricing Structure

Each model has specific pricing for input and output tokens. You'll see rates like:

  • Input tokens: $0.50 per million tokens

  • Output tokens: $1.50 per million tokens

Output tokens typically cost more because generating text is computationally more expensive than processing it. Use these rates to estimate your costs based on expected usage patterns.
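Using the example rates above, a back-of-the-envelope monthly estimate looks like this (the request volume and per-request token counts are made-up numbers for illustration):

```python
# Example rates from above, converted to dollars per token.
input_rate = 0.50 / 1_000_000   # $0.50 per million input tokens
output_rate = 1.50 / 1_000_000  # $1.50 per million output tokens

# Hypothetical workload: 10,000 requests/day for 30 days,
# averaging 800 input and 300 output tokens per request.
requests_per_month = 10_000 * 30
input_tokens = requests_per_month * 800
output_tokens = requests_per_month * 300

cost = input_tokens * input_rate + output_tokens * output_rate
print(f"${cost:,.2f}/month")  # → $255.00/month
```

Note how output tokens dominate the bill even at lower volume, which is why trimming `max_tokens` is often the quickest cost lever.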

Playground Access

Before writing code, you can test the model interactively in the Playground. This browser-based interface lets you:

  • Send prompts and see responses immediately

  • Adjust parameters like temperature and max tokens

  • Test different input formats

  • Validate that the model behaves as expected

The Playground is invaluable for quick experimentation without writing any code.

API Endpoint and Code Samples

Once you're satisfied with the model's performance, you can copy:

  • The unique API endpoint URL for this model

  • Ready-to-use code examples in cURL, Python, and JavaScript

  • Your API key (if already created)

These code samples are pre-configured with the correct model name and endpoint, so you can paste them into your application with minimal modification.
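As a sketch of what the Python variant of those samples typically looks like, assuming the standard OpenAI chat-completions request schema (the endpoint URL, model id, and key below are placeholders, not the real values from your console):

```python
import json
import urllib.request

# Placeholders -- substitute the endpoint URL, model id, and API key
# copied from the model page.
ENDPOINT = "https://api.neevcloud.com/v1/chat/completions"  # assumption
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "meta-llama/Llama-3-8B-Instruct",  # hypothetical model id
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# To send: response = urllib.request.urlopen(req); result = json.load(response)
```

The copied samples from the console will have the real endpoint and model name filled in, so in practice you only need to supply your API key.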
