Deploying a GPU

TL;DR

Step 1: Start GPU deployment

  • Log in to NeevCloud.

  • Click Deploy your first GPU.

Step 2: Choose GPU configuration

  • Select a GPU suitable for running vLLM inference. Recommended options:

    • A10: For small models.

    • A100 40GB or 80GB: For large models.

    • H100: For maximum throughput.

Step 3: Search and select the template

  • Click the Explore Templates button.

  • Use the search bar and type vLLM.

  • Select vLLM Inference AI Template.

Step 4: Deploy the instance

  • Make sure SSH access is enabled.

  • Click Deploy.

  • The instance will start with vLLM already available.
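
To confirm, SSH into the instance and check that the vLLM package is importable (a minimal check; the exact Python environment depends on the template):

    # verify vLLM is installed in the template's environment
    python -c "import vllm; print(vllm.__version__)"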

Configuration Steps

Step 1: Pod Name

Your instance receives an auto-generated name to ensure uniqueness across the platform. This name helps you identify your instance in the dashboard, especially when you're managing multiple deployments.

  • Pod naming format: Typically a pattern like gpu-pytorch-a8f9g, combining a descriptive prefix with a unique suffix.

Best practices:

  • Keep the auto-generated name for simplicity

  • Use descriptive tags or labels if managing multiple projects

  • Document your instance names in your project notes for reference

Step 2: GPU Template Selection

This is the most critical configuration step. Choose from over 20 pre-configured templates designed for AI/ML workloads.

Template selection process:

  1. Click the Change Template button in the deployment configuration section

  2. Browse templates by category (Framework, Type, Tool)

  3. Click on a template to view its detailed specifications

  4. Review included packages and versions

  5. Select the template that matches your requirements

Template details display: When you hover over or select a template, you'll see:

  • Operating system version

  • CUDA toolkit version

  • Framework version (PyTorch, TensorFlow, etc.)

  • Pre-installed libraries

  • Development environment (Jupyter, VS Code, etc.)

Why template choice matters: The template determines your entire software stack. Choosing correctly means:

  • Your code runs without dependency errors

  • GPU drivers match your framework requirements

  • Development tools are immediately available

  • No time wasted on environment setup

Common template scenarios:

Scenario 1: Fine-tuning a Hugging Face model

  • Choose: "PyTorch 2.1 + Transformers + PEFT"

  • Why: Includes PyTorch, Transformers, parameter-efficient fine-tuning libraries

Scenario 2: Training a computer vision model

  • Choose: "TensorFlow 2.14 + OpenCV"

  • Why: Includes TensorFlow, image processing libraries, visualization tools

Scenario 3: Generating images with Stable Diffusion

  • Choose: "Stable Diffusion + ComfyUI"

  • Why: Includes diffusion models, ComfyUI interface, necessary dependencies

Scenario 4: LLM inference serving

  • Choose: "vLLM + FastAPI"

  • Why: Optimized for inference, includes serving framework, model quantization
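
For scenario 4, a minimal sketch of serving and querying a model with vLLM's OpenAI-compatible server (the model name and port are illustrative, and the template may already start a server for you):

    # start an OpenAI-compatible server (illustrative model)
    python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m --port 8000

    # query it from another terminal
    curl http://localhost:8000/v1/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "facebook/opt-125m", "prompt": "Hello", "max_tokens": 16}'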

Step 3: SSH Key Configuration

Add your public SSH key to enable secure remote access to your instance. This key-based authentication is more secure than password authentication and is the standard method for accessing cloud GPU instances.

What is SSH key authentication? SSH keys work in pairs:

  • Private key: Stays on your local machine (never share this)

  • Public key: Uploaded to NeevCloud (safe to share)

When you connect, NeevCloud verifies your private key matches the public key on file.

How to find your SSH public key:

On Linux/Mac:
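
    cat ~/.ssh/id_rsa.pub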

On Windows (PowerShell):
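
    Get-Content $env:USERPROFILE\.ssh\id_rsa.pub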

If you don't have an SSH key yet: Generate one with this command:
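
    ssh-keygen -t rsa -b 4096 -C "your_email@example.com"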

Follow the prompts (you can press Enter to accept defaults). This creates:

  • Private key: ~/.ssh/id_rsa

  • Public key: ~/.ssh/id_rsa.pub

Adding your SSH key to NeevCloud:

  1. Copy your entire public key (it starts with ssh-rsa and ends with a comment, typically your email address)

  2. Paste it into the "SSH Key" field in the configuration panel

  3. Verify the key is complete (missing characters will prevent access)
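
To avoid missing characters, pipe the key straight to your clipboard (commands vary by platform; xclip may need to be installed first):

    # macOS
    pbcopy < ~/.ssh/id_rsa.pub
    # Linux (requires xclip)
    xclip -selection clipboard < ~/.ssh/id_rsa.pub
    # Windows PowerShell
    Get-Content $env:USERPROFILE\.ssh\id_rsa.pub | Set-Clipboard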

Multiple SSH keys: You can add multiple public keys if:

  • You access from different machines

  • Multiple team members need access

  • You have separate keys for different security contexts

Step 4: Review and Deploy

Before deploying, review your complete configuration:

Instance specifications:

  • GPU type and VRAM

  • Region and availability zone

  • Pricing model (on-demand or reserved)

Template selection:

  • Framework and version

  • CUDA compatibility

  • Included tools and libraries

Access configuration:

  • SSH key properly added

  • Pod name noted for reference

Estimated costs:

  • Hourly rate displayed

  • Projected monthly cost (if running 24/7)
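
For example, at an illustrative rate of $2.00/hour, an instance left running continuously costs about $2.00 × 24 × 30 = $1,440 per month.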

Click "Deploy GPU" when you're satisfied with your configuration.

Deployment Process

After clicking "Deploy GPU", NeevCloud:

  1. Provisions hardware: Allocates physical GPU resources

  2. Loads template: Deploys your selected software environment

  3. Configures networking: Sets up SSH access and ports

  4. Starts services: Launches Jupyter/ComfyUI if included in template

  5. Reports ready: Updates dashboard with access information

Total deployment time: 10-20 seconds on average.

Accessing Your Deployed Instance

Once deployment completes, navigate to Dashboard → Active Instances to access your new GPU instance.

Information displayed for each instance:

  • Instance name and ID

  • GPU type and template used

  • Running time and current costs

  • Public IP address and SSH port

  • Access buttons (SSH, Jupyter, ComfyUI)

  • Management options (Stop, Delete, Snapshot)

Immediate next steps:

  1. Test connectivity: Click "SSH" to verify you can connect

  2. Access Jupyter: If included in template, click "Jupyter" to open notebook

  3. Verify GPU: Run nvidia-smi to confirm GPU is available (see the snippet after this list)

  4. Start working: Upload data, clone repositories, begin training
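
For step 3, run this in the instance terminal:

    # lists the GPU model, driver version, and current VRAM usage
    nvidia-smi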

Pro Tips for Deployment Configuration

Tip 1: Match templates to your workflow exactly. Don't pick a generic template if a specialized one exists. The "PyTorch + Transformers + PEFT" template saves hours compared to installing these libraries manually on a base PyTorch image.

Tip 2: Verify CUDA compatibility before deploying. Check your existing code's CUDA requirements. If your code was developed with CUDA 11.8, choose a template with a matching CUDA version to avoid compatibility issues.

Tip 3: Use reserved instances for development environments. If you're working on a project for several weeks, reserve a smaller instance as your development environment, then use on-demand for larger training runs.

Tip 4: Keep SSH keys organized. Name your SSH keys descriptively, for example (hypothetical names):
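
    ssh-keygen -t rsa -b 4096 -f ~/.ssh/neevcloud-workstation -C "alice@workstation"
    ssh-keygen -t rsa -b 4096 -f ~/.ssh/neevcloud-laptop -C "alice@laptop"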

This helps when managing multiple instances across projects.

Tip 5: Deploy test instances first. For critical workloads, deploy a small test instance with your chosen template first. Verify your code runs correctly before launching expensive, long-running training jobs.

Managing Your GPU Instances

Accessing Your Instance Management Dashboard

Once you've deployed your GPU instances, you need a centralized location to manage them. Navigate to Dashboard → GPU Instances to view and control all your active instances.

Dashboard overview: Your dashboard displays:

  • Active instances: Currently running GPU instances

  • Instance details: Specs, runtime, costs, access information

  • Quick actions: Delete, SSH, etc.

Available Management Actions

From your dashboard, you can perform several key operations:

1. Delete Instances

Remove instances you no longer need. This action:

  • Stops billing for that instance immediately

  • Releases the GPU resources back to the pool

  • Removes all data stored on the instance (unless you've created snapshots)

  • Frees up your account quota for new instances

When to delete:

  • Completed training jobs with results already downloaded

  • Temporary instances used for one-time tasks

  • Replaced instances after upgrading configuration

  • Test instances no longer needed

Important warning: Always ensure you've saved important data before deleting. Data on the instance is NOT recoverable after deletion unless you've:

  • Created a snapshot of the instance

  • Copied data to external storage such as S3 or your local machine (see the example below)

  • Used persistent volumes
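
For example, a minimal sketch for pulling results to your local machine before deleting (the user, address, and paths here are hypothetical; your dashboard shows the real IP and port):

    # run from your local machine, not the instance
    scp -P 22 -r ubuntu@203.0.113.10:/workspace/results ./results-backup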

2. SSH into Existing Instances

Connect to your running instances for command-line access. This action:

  • Opens a secure shell connection to your GPU instance

  • Gives you full terminal access

  • Allows you to run commands, manage files, and monitor processes

  • Enables direct interaction with training jobs

We'll cover SSH in detail in the next section.

3. Access Web Interfaces

Many templates include web-based interfaces that you can access directly from your dashboard:

Jupyter Notebook/JupyterLab:

  • Click the "Jupyter" button on your instance

  • Opens in a new browser tab

  • Provides interactive development environment

  • Access files, notebooks, and terminal within the interface

ComfyUI (for Stable Diffusion templates):

  • Click the "ComfyUI" button on your instance

  • Opens the visual workflow interface

  • Build and run diffusion pipelines graphically

  • Generate images without writing code

VS Code Server (if included in template):

  • Full VS Code IDE in your browser

  • Edit code with familiar interface

  • Integrated terminal for running commands

  • Extensions and settings sync

4. Monitor Resource Usage

Track your instance's performance and utilization:

  • View GPU utilization percentage

  • Monitor VRAM usage

  • Check CPU and system memory

  • See network I/O and storage usage

This helps you:

  • Verify your code is using the GPU efficiently

  • Identify bottlenecks in your pipeline

  • Optimize resource allocation

  • Ensure you're not paying for underutilized resources
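
You can also watch utilization live from a terminal on the instance:

    # refresh GPU utilization and memory figures every second
    watch -n 1 nvidia-smi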

Automatic Environment Setup

When your GPU instance boots, NeevCloud automatically launches services based on your selected template. This automatic setup provides immediate benefits:

Jupyter Notebook auto-launch

For templates including Jupyter, the notebook server starts automatically on boot. This means:

  • No manual configuration needed

  • Immediate access via the dashboard "Jupyter" button

  • Pre-configured to work with the GPU

  • Common libraries already installed in the environment

You can start working immediately:
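
For example, in a fresh notebook cell (a minimal check, assuming a PyTorch-based template):

    # confirm the notebook kernel can see the GPU
    import torch
    print(torch.cuda.is_available())      # expect: True
    print(torch.cuda.get_device_name(0))  # the GPU model you selected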

ComfyUI auto-launch

For generative AI templates, ComfyUI starts automatically, providing:

  • Visual workflow editor for diffusion models

  • Pre-loaded model nodes and components

  • Immediate image generation capability

  • No command-line setup required

Development environment ready

All templates include:

  • GPU drivers loaded and initialized

  • CUDA environment variables set correctly

  • Framework-specific optimizations enabled

  • Common directories created (/workspace, /data, /models)
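
A quick sanity check from the instance terminal (directory names as listed above):

    nvcc --version                 # CUDA toolkit version
    ls /workspace /data /models    # template-created directories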

Best Practices for Instance Management

Monitor your active instances regularly

Check your dashboard daily or weekly to:

  • Identify instances you forgot to stop or delete

  • Verify expected workloads are running

  • Catch unexpected cost increases early

  • Spot performance issues or errors

Use meaningful naming and tagging

While NeevCloud auto-generates instance names, you can add notes or tags:

  • Project name: "gpt-finetuning-prod"

  • Purpose: "training-run-experiment-5"

  • Team: "research-team-nlp"

This helps when managing multiple instances across different projects.

Implement shutdown automation

For training jobs with a defined end point, use a script that stops the instance automatically when the job finishes, as in the sketch below:
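
    # minimal sketch: run training, then halt the instance OS when it exits
    # (replace train.py with your own entry point; confirm with NeevCloud
    # whether a halted instance still accrues charges, since deleting the
    # instance is the sure way to stop billing)
    python train.py && sudo shutdown -h now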

This prevents you from forgetting to stop instances and incurring unnecessary costs.

Create snapshots before major changes

Before upgrading packages, modifying configurations, or running experimental code:

  1. Create a snapshot of your current instance

  2. Make your changes

  3. If something breaks, deploy a new instance from the snapshot

  4. Delete the broken instance

This workflow prevents data loss and minimizes downtime.

Use persistent storage for critical data

Don't rely solely on instance storage for important data:

  • Use external storage (S3, cloud storage) for datasets

  • Regularly download model checkpoints

  • Version control your code in Git

  • Export results and logs periodically

Instance storage is ephemeral—it disappears when you delete the instance.

Optimize instance selection over time

As your project evolves:

  1. Start with larger instances for initial experiments

  2. Profile your actual resource usage

  3. Downsize to more cost-effective instances for production

  4. Use smaller instances for inference than for training

Right-sizing saves significant costs without sacrificing performance.

Managing Multiple Instances

When running several instances simultaneously:

Organize by project or purpose:

  • Development instances: Smaller, on-demand for coding and testing

  • Training instances: Larger, reserved for long-running jobs

  • Inference instances: Optimized for serving, potentially reserved

Implement cost allocation:

  • Tag instances by project for accounting

  • Track costs per team or department

  • Set budget limits per project

Coordinate resource usage:

  • Schedule training jobs to avoid peak pricing periods

  • Share instances among team members when appropriate

  • Consolidate workloads on fewer instances when possible
