Deploying a GPU

TL;DR

Step 1: Start GPU deployment

  • Log in to NeevCloud.

  • Click Deploy your first GPU.

Step 2: Choose GPU configuration

  • Select a GPU suitable for running vLLM inference. Recommended options:

    • A10: For small models.

    • A100 40GB or 80GB: For large models.

    • H100: For maximum throughput.

Step 3: Search and select the template

  • Click the Explore Templates button.

  • Use the search bar and type vLLM.

  • Select vLLM Inference AI Template.

Step 4: Deploy the instance

  • Make sure SSH access is enabled.

  • Click Deploy.

  • The instance will start with vLLM already available.
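
To confirm, SSH into the instance and check that the vLLM package is importable (a minimal check; the exact Python environment depends on the template):

    # verify vLLM is installed in the template's environment
    python -c "import vllm; print(vllm.__version__)"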

Configuration Steps

Step 1: Pod Name

Your instance receives an auto-generated name to ensure uniqueness across the platform. This name helps you identify your instance in the dashboard, especially when you're managing multiple deployments.

  • Pod naming format: Typically a pattern like gpu-pytorch-a8f9g, combining a descriptive prefix with a unique suffix.

Best practices:

  • Keep the auto-generated name for simplicity

  • Use descriptive tags or labels if managing multiple projects

  • Document your instance names in your project notes for reference

Step 2: GPU Template Selection

This is the most critical configuration step. Choose from over 20 pre-configured templates designed for AI/ML workloads.

Template selection process:

  1. Click the Change Template button in the deployment configuration section

  2. Browse templates by category (Framework, Type, Tool)

  3. Click on a template to view its detailed specifications

  4. Review included packages and versions

  5. Select the template that matches your requirements

Template details display: When you hover over or select a template, you'll see:

  • Operating system version

  • CUDA toolkit version

  • Framework version (PyTorch, TensorFlow, etc.)

  • Pre-installed libraries

  • Development environment (Jupyter, VS Code, etc.)

Why template choice matters: The template determines your entire software stack. Choosing correctly means:

  • Your code runs without dependency errors

  • GPU drivers match your framework requirements

  • Development tools are immediately available

  • No time wasted on environment setup

Common template scenarios:

Scenario 1: Fine-tuning a Hugging Face model

  • Choose: "PyTorch 2.1 + Transformers + PEFT"

  • Why: Includes PyTorch, Transformers, parameter-efficient fine-tuning libraries

Scenario 2: Training a computer vision model

  • Choose: "TensorFlow 2.14 + OpenCV"

  • Why: Includes TensorFlow, image processing libraries, visualization tools

Scenario 3: Generating images with Stable Diffusion

  • Choose: "Stable Diffusion + ComfyUI"

  • Why: Includes diffusion models, ComfyUI interface, necessary dependencies

Scenario 4: LLM inference serving

  • Choose: "vLLM + FastAPI"

  • Why: Optimized for inference, includes serving framework, model quantization
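
For scenario 4, a minimal sketch of serving and querying a model with vLLM's OpenAI-compatible server (the model name and port are illustrative, and the template may already start a server for you):

    # start an OpenAI-compatible server (illustrative model)
    python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m --port 8000

    # query it from another terminal
    curl http://localhost:8000/v1/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "facebook/opt-125m", "prompt": "Hello", "max_tokens": 16}'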

Step 3: SSH Key Configuration

Add your public SSH key to enable secure remote access to your instance. This key-based authentication is more secure than password authentication and is the standard method for accessing cloud GPU instances.

What is SSH key authentication? SSH keys work in pairs:

  • Private key: Stays on your local machine (never share this)

  • Public key: Uploaded to NeevCloud (safe to share)

When you connect, NeevCloud verifies your private key matches the public key on file.

How to find your SSH public key:

On Linux/Mac:
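
    cat ~/.ssh/id_rsa.pub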

On Windows (PowerShell):
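
    Get-Content $env:USERPROFILE\.ssh\id_rsa.pub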

If you don't have an SSH key yet: Generate one with this command:
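
    ssh-keygen -t rsa -b 4096 -C "your_email@example.com"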

Follow the prompts (you can press Enter to accept defaults). This creates:

  • Private key: ~/.ssh/id_rsa

  • Public key: ~/.ssh/id_rsa.pub

Adding your SSH key to NeevCloud:

  1. Copy your entire public key (it starts with ssh-rsa and ends with a comment, typically your email address)

  2. Paste it into the "SSH Key" field in the configuration panel

  3. Verify the key is complete (missing characters will prevent access)
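
To avoid missing characters, pipe the key straight to your clipboard (commands vary by platform; xclip may need to be installed first):

    # macOS
    pbcopy < ~/.ssh/id_rsa.pub
    # Linux (requires xclip)
    xclip -selection clipboard < ~/.ssh/id_rsa.pub
    # Windows PowerShell
    Get-Content $env:USERPROFILE\.ssh\id_rsa.pub | Set-Clipboard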

Multiple SSH keys: You can add multiple public keys if:

  • You access from different machines

  • Multiple team members need access

  • You have separate keys for different security contexts

Step 4: Review and Deploy

Before deploying, review your complete configuration:

Instance specifications:

  • GPU type and VRAM

  • Region and availability zone

  • Pricing model (on-demand or reserved)

Template selection:

  • Framework and version

  • CUDA compatibility

  • Included tools and libraries

Access configuration:

  • SSH key properly added

  • Pod name noted for reference

Estimated costs:

  • Hourly rate displayed

  • Projected monthly cost (if running 24/7)
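
For example, at an illustrative rate of $2.00/hour, an instance left running continuously costs about $2.00 × 24 × 30 = $1,440 per month.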

Click "Deploy GPU" when you're satisfied with your configuration.

Deployment Process

After clicking "Deploy GPU", NeevCloud:

  1. Provisions hardware: Allocates physical GPU resources

  2. Loads template: Deploys your selected software environment

  3. Configures networking: Sets up SSH access and ports

  4. Starts services: Launches Jupyter/ComfyUI if included in template

  5. Reports ready: Updates dashboard with access information

Total deployment time: 10-20 seconds on average.

Accessing Your Deployed Instance

Once deployment completes, navigate to Dashboard → Active Instances to access your new GPU instance.

Information displayed for each instance:

  • Instance name and ID

  • GPU type and template used

  • Running time and current costs

  • Public IP address and SSH port

  • Access buttons (SSH, Jupyter, ComfyUI)

  • Management options (Stop, Delete, Snapshot)

Immediate next steps:

  1. Test connectivity: Click "SSH" to verify you can connect

  2. Access Jupyter: If included in template, click "Jupyter" to open notebook

  3. Verify GPU: Run nvidia-smi to confirm GPU is available (see the snippet after this list)

  4. Start working: Upload data, clone repositories, begin training
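
For step 3, run this in the instance terminal:

    # lists the GPU model, driver version, and current VRAM usage
    nvidia-smi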

Pro Tips for Deployment Configuration

Tip 1: Match templates to your workflow exactly. Don't pick a generic template if a specialized one exists. The "PyTorch + Transformers + PEFT" template saves hours compared to installing these libraries manually on a base PyTorch image.

Tip 2: Verify CUDA compatibility before deploying. Check your existing code's CUDA requirements. If your code was developed with CUDA 11.8, choose a template with a matching CUDA version to avoid compatibility issues.

Tip 3: Use reserved instances for development environments. If you're working on a project for several weeks, reserve a smaller instance as your development environment, then use on-demand for larger training runs.

Tip 4: Keep SSH keys organized. Name your SSH keys descriptively, for example (hypothetical names):
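
    ssh-keygen -t rsa -b 4096 -f ~/.ssh/neevcloud-workstation -C "alice@workstation"
    ssh-keygen -t rsa -b 4096 -f ~/.ssh/neevcloud-laptop -C "alice@laptop"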

This helps when managing multiple instances across projects.

Tip 5: Deploy test instances first. For critical workloads, deploy a small test instance with your chosen template first. Verify your code runs correctly before launching expensive, long-running training jobs.

Managing Your GPU Instances

Accessing Your Instance Management Dashboard

Once you've deployed your GPU instances, you need a centralized location to manage them. Navigate to Dashboard → GPU Instances to view and control all your active instances.

Dashboard overview: Your dashboard displays:

  • Active instances: Currently running GPU instances

  • Instance details: Specs, runtime, costs, access information

  • Quick actions: Delete, SSH, etc.

Available Management Actions

From your dashboard, you can perform several key operations:

1. Delete Instances

Remove instances you no longer need. This action:

  • Stops billing for that instance immediately

  • Releases the GPU resources back to the pool

  • Removes all data stored on the instance (unless you've created snapshots)

  • Frees up your account quota for new instances

When to delete:

  • Completed training jobs with results already downloaded

  • Temporary instances used for one-time tasks

  • Replaced instances after upgrading configuration

  • Test instances no longer needed

Important warning: Always ensure you've saved important data before deleting. Data on the instance is NOT recoverable after deletion unless you've:

  • Created a snapshot of the instance

  • Copied data to external storage such as S3 or your local machine (see the example below)

  • Used persistent volumes
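
For example, a minimal sketch for pulling results to your local machine before deleting (the user, address, and paths here are hypothetical; your dashboard shows the real IP and port):

    # run from your local machine, not the instance
    scp -P 22 -r ubuntu@203.0.113.10:/workspace/results ./results-backup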

2. SSH into Existing Instances

Connect to your running instances for command-line access. This action:

  • Opens a secure shell connection to your GPU instance

  • Gives you full terminal access

  • Allows you to run commands, manage files, and monitor processes

  • Enables direct interaction with training jobs

We'll cover SSH in detail in the next section.

3. Access Web Interfaces

Many templates include web-based interfaces that you can access directly from your dashboard:

Jupyter Notebook/JupyterLab:

  • Click the "Jupyter" button on your instance

  • Opens in a new browser tab

  • Provides interactive development environment

  • Access files, notebooks, and terminal within the interface

ComfyUI (for Stable Diffusion templates):

  • Click the "ComfyUI" button on your instance

  • Opens the visual workflow interface

  • Build and run diffusion pipelines graphically

  • Generate images without writing code

VS Code Server (if included in template):

  • Full VS Code IDE in your browser

  • Edit code with familiar interface

  • Integrated terminal for running commands

  • Extensions and settings sync

4. Monitor Resource Usage

Track your instance's performance and utilization:

  • View GPU utilization percentage

  • Monitor VRAM usage

  • Check CPU and system memory

  • See network I/O and storage usage

This helps you:

  • Verify your code is using the GPU efficiently

  • Identify bottlenecks in your pipeline

  • Optimize resource allocation

  • Ensure you're not paying for underutilized resources
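
You can also watch utilization live from a terminal on the instance:

    # refresh GPU utilization and memory figures every second
    watch -n 1 nvidia-smi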

Automatic Environment Setup

When your GPU instance boots, NeevCloud automatically launches services based on your selected template. This automatic setup provides immediate benefits:

Jupyter Notebook auto-launch

For templates including Jupyter, the notebook server starts automatically on boot. This means:

  • No manual configuration needed

  • Immediate access via the dashboard "Jupyter" button

  • Pre-configured to work with the GPU

  • Common libraries already installed in the environment

You can start working immediately:
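
For example, in a fresh notebook cell (a minimal check, assuming a PyTorch-based template):

    # confirm the notebook kernel can see the GPU
    import torch
    print(torch.cuda.is_available())      # expect: True
    print(torch.cuda.get_device_name(0))  # the GPU model you selected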

ComfyUI auto-launch

For generative AI templates, ComfyUI starts automatically, providing:

  • Visual workflow editor for diffusion models

  • Pre-loaded model nodes and components

  • Immediate image generation capability

  • No command-line setup required

Development environment ready

All templates include:

  • GPU drivers loaded and initialized

  • CUDA environment variables set correctly

  • Framework-specific optimizations enabled

  • Common directories created (/workspace, /data, /models)
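
A quick sanity check from the instance terminal (directory names as listed above):

    nvcc --version                 # CUDA toolkit version
    ls /workspace /data /models    # template-created directories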

Best Practices for Instance Management

Monitor your active instances regularly

Check your dashboard daily or weekly to:

  • Identify instances you forgot to stop or delete

  • Verify expected workloads are running

  • Catch unexpected cost increases early

  • Spot performance issues or errors

Use meaningful naming and tagging

While NeevCloud auto-generates instance names, you can add notes or tags:

  • Project name: "gpt-finetuning-prod"

  • Purpose: "training-run-experiment-5"

  • Team: "research-team-nlp"

This helps when managing multiple instances across different projects.

Implement shutdown automation

For training jobs with a defined end point, use a script that stops the instance automatically when the job finishes, as in the sketch below:
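
    # minimal sketch: run training, then halt the instance OS when it exits
    # (replace train.py with your own entry point; confirm with NeevCloud
    # whether a halted instance still accrues charges, since deleting the
    # instance is the sure way to stop billing)
    python train.py && sudo shutdown -h now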

This prevents you from forgetting to stop instances and incurring unnecessary costs.

Create snapshots before major changes

Before upgrading packages, modifying configurations, or running experimental code:

  1. Create a snapshot of your current instance

  2. Make your changes

  3. If something breaks, deploy a new instance from the snapshot

  4. Delete the broken instance

This workflow prevents data loss and minimizes downtime.

Use persistent storage for critical data

Don't rely solely on instance storage for important data:

  • Use external storage (S3, cloud storage) for datasets

  • Regularly download model checkpoints

  • Version control your code in Git

  • Export results and logs periodically

Instance storage is ephemeral—it disappears when you delete the instance.

Optimize instance selection over time

As your project evolves:

  1. Start with larger instances for initial experiments

  2. Profile your actual resource usage

  3. Downsize to more cost-effective instances for production

  4. Use smaller instances for inference than for training

Right-sizing saves significant costs without sacrificing performance.

Managing Multiple Instances

When running several instances simultaneously:

Organize by project or purpose:

  • Development instances: Smaller, on-demand for coding and testing

  • Training instances: Larger, reserved for long-running jobs

  • Inference instances: Optimized for serving, potentially reserved

Implement cost allocation:

  • Tag instances by project for accounting

  • Track costs per team or department

  • Set budget limits per project

Coordinate resource usage:

  • Schedule training jobs to avoid peak pricing periods

  • Share instances among team members when appropriate

  • Consolidate workloads on fewer instances when possible
