Deploying a GPU
TL;DR
Step 1: Start GPU deployment
Log in to NeevCloud.
Click Deploy your first GPU.
Step 2: Choose GPU configuration
Select a GPU suitable for running vLLM inference. Recommended options:
A10: For small models.
A100 40GB or 80GB: For large models.
H100: For maximum throughput.
Step 3: Search and select the template
Click the Explore Templates button.
Use the search bar and type `vLLM`.
Select the vLLM Inference AI Template.
Step 4: Deploy the instance
Make sure SSH access is enabled.
Click Deploy.
The instance will start with vLLM already available.
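Once it's running, you can confirm the server is reachable by querying vLLM's OpenAI-compatible API (a quick sketch; it assumes the template exposes vLLM's default port 8000, and `<instance-ip>` stands in for your instance's public IP):

```bash
# List the models the vLLM server is currently serving
curl http://<instance-ip>:8000/v1/models
```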
Configuration Steps
Step 1: Pod Name
Your instance receives an auto-generated name to ensure uniqueness across the platform. This name helps you identify your instance in the dashboard, especially when you're managing multiple deployments.
Pod naming format: Typically follows a pattern like `gpu-pytorch-a8f9g` or similar unique identifiers.
Best practices:
Keep the auto-generated name for simplicity
Use descriptive tags or labels if managing multiple projects
Document your instance names in your project notes for reference
Step 2: GPU Template Selection
This is the most critical configuration step. Choose from over 20 pre-configured templates designed for AI/ML workloads.
Template selection process:
Click the Change Template button in the deployment configuration section
Browse templates by category (Framework, Type, Tool)
Click on a template to view its detailed specifications
Review included packages and versions
Select the template that matches your requirements
Template details display: When you hover over or select a template, you'll see:
Operating system version
CUDA toolkit version
Framework version (PyTorch, TensorFlow, etc.)
Pre-installed libraries
Development environment (Jupyter, VS Code, etc.)
Why template choice matters: The template determines your entire software stack. Choosing correctly means:
Your code runs without dependency errors
GPU drivers match your framework requirements
Development tools are immediately available
No time wasted on environment setup
Common template scenarios:
Scenario 1: Fine-tuning a Hugging Face model
Choose: "PyTorch 2.1 + Transformers + PEFT"
Why: Includes PyTorch, Transformers, parameter-efficient fine-tuning libraries
Scenario 2: Training a computer vision model
Choose: "TensorFlow 2.14 + OpenCV"
Why: Includes TensorFlow, image processing libraries, visualization tools
Scenario 3: Generating images with Stable Diffusion
Choose: "Stable Diffusion + ComfyUI"
Why: Includes diffusion models, ComfyUI interface, necessary dependencies
Scenario 4: LLM inference serving
Choose: "vLLM + FastAPI"
Why: Optimized for inference, includes serving framework, model quantization
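For Scenario 4, launching the server typically looks like this (a sketch; the model name is illustrative and the entrypoint and flags vary across vLLM versions):

```bash
# Start vLLM's OpenAI-compatible API server (newer releases also offer `vllm serve`)
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-hf --port 8000
```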
Step 3: SSH Key Configuration
Add your public SSH key to enable secure remote access to your instance. This key-based authentication is more secure than password authentication and is the standard method for accessing cloud GPU instances.
What is SSH key authentication? SSH keys work in pairs:
Private key: Stays on your local machine (never share this)
Public key: Uploaded to NeevCloud (safe to share)
When you connect, NeevCloud verifies your private key matches the public key on file.
How to find your SSH public key:
On Linux/Mac:
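A typical command, assuming the default key location:

```bash
# Print your public key so you can copy it
cat ~/.ssh/id_rsa.pub
```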
On Windows (PowerShell):
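The equivalent check in PowerShell, again assuming the default key location:

```powershell
# Print your public key so you can copy it
Get-Content $env:USERPROFILE\.ssh\id_rsa.pub
```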
If you don't have an SSH key yet: Generate one with this command:
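```bash
# Generate a 4096-bit RSA key pair (the email is just a comment identifying the key)
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
```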
Follow the prompts (you can press Enter to accept defaults). This creates:
Private key: `~/.ssh/id_rsa`
Public key: `~/.ssh/id_rsa.pub`
Adding your SSH key to NeevCloud:
Copy your entire public key (starts with `ssh-rsa` and ends with your email)
Paste it into the "SSH Key" field in the configuration panel
Verify the key is complete (missing characters will prevent access)
Multiple SSH keys: You can add multiple public keys if:
You access from different machines
Multiple team members need access
You have separate keys for different security contexts
Step 4: Review and Deploy
Before deploying, review your complete configuration:
Instance specifications:
GPU type and VRAM
Region and availability zone
Pricing model (on-demand or reserved)
Template selection:
Framework and version
CUDA compatibility
Included tools and libraries
Access configuration:
SSH key properly added
Pod name noted for reference
Estimated costs:
Hourly rate displayed
Projected monthly cost (if running 24/7)
Click "Deploy GPU" when you're satisfied with your configuration.
Deployment Process
After clicking "Deploy GPU", NeevCloud:
Provisions hardware: Allocates physical GPU resources
Loads template: Deploys your selected software environment
Configures networking: Sets up SSH access and ports
Starts services: Launches Jupyter/ComfyUI if included in template
Reports ready: Updates dashboard with access information
Total deployment time: 10-20 seconds on average.
Accessing Your Deployed Instance
Once deployment completes, navigate to Dashboard → Active Instances to access your new GPU instance.
Information displayed for each instance:
Instance name and ID
GPU type and template used
Running time and current costs
Public IP address and SSH port
Access buttons (SSH, Jupyter, ComfyUI)
Management options (Stop, Delete, Snapshot)
Immediate next steps:
Test connectivity: Click "SSH" to verify you can connect
Access Jupyter: If included in template, click "Jupyter" to open notebook
Verify GPU: Run `nvidia-smi` to confirm the GPU is available (see the quick check after this list)
Start working: Upload data, clone repositories, begin training
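For example, from an SSH session (the second line assumes a PyTorch-based template):

```bash
# Confirm the driver sees the GPU
nvidia-smi
# Optionally confirm your framework sees it too
python -c "import torch; print(torch.cuda.is_available())"
```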
Pro Tips for Deployment Configuration
Tip 1: Match templates to your workflow exactly. Don't pick a generic template if a specialized one exists. The "PyTorch + Transformers + PEFT" template saves hours compared to installing these libraries manually on a base PyTorch image.
Tip 2: Verify CUDA compatibility before deploying. Check your existing code's CUDA requirements. If your code was developed with CUDA 11.8, choose a template with a matching CUDA version to avoid compatibility issues.
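Two quick checks (standard commands; the second assumes PyTorch is installed):

```bash
# CUDA toolkit version available on the instance
nvcc --version
# CUDA version your PyTorch build was compiled against
python -c "import torch; print(torch.version.cuda)"
```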
Tip 3: Use reserved instances for development environments. If you're working on a project for several weeks, reserve a smaller instance as your development environment, then use on-demand for larger training runs.
Tip 4: Keep SSH keys organized. Name your SSH keys descriptively:
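For instance (the filenames are illustrative, not a NeevCloud convention):

```bash
# Hypothetical descriptive key names, one per project or context
ssh-keygen -t ed25519 -f ~/.ssh/neevcloud_training_prod -C "training instances"
ssh-keygen -t ed25519 -f ~/.ssh/neevcloud_dev -C "personal dev instances"
```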
This helps when managing multiple instances across projects.
Tip 5: Deploy test instances first. For critical workloads, deploy a small test instance with your chosen template first. Verify your code runs correctly before launching expensive, long-running training jobs.
Managing Your GPU Instances
Accessing Your Instance Management Dashboard
Once you've deployed your GPU instances, you need a centralized location to manage them. Navigate to Dashboard → GPU Instances to view and control all your active instances.
Dashboard overview: Your dashboard displays:
Active instances: Currently running GPU instances
Instance details: Specs, runtime, costs, access information
Quick actions: Delete, SSH, and more
Available Management Actions
From your dashboard, you can perform several key operations:
1. Delete Instances
Remove instances you no longer need. This action:
Stops billing for that instance immediately
Releases the GPU resources back to the pool
Removes all data stored on the instance (unless you've created snapshots)
Frees up your account quota for new instances
When to delete:
Completed training jobs with results already downloaded
Temporary instances used for one-time tasks
Replaced instances after upgrading configuration
Test instances no longer needed
Important warning: Always ensure you've saved important data before deleting. Data on the instance is NOT recoverable after deletion unless you've:
Created a snapshot of the instance
Copied data to external storage (S3, your local machine)
Used persistent volumes
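For example, you can pull results down over SSH before deleting (a sketch; substitute your instance's IP, SSH port, username, and paths):

```bash
# Copy a results directory from the instance to your local machine
scp -P <ssh-port> -r <user>@<instance-ip>:/workspace/results ./results
```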
2. SSH into Existing Instances
Connect to your running instances for command-line access. This action:
Opens a secure shell connection to your GPU instance
Gives you full terminal access
Allows you to run commands, manage files, and monitor processes
Enables direct interaction with training jobs
We'll cover SSH in detail in the next section.
3. Access Web Interfaces
Many templates include web-based interfaces that you can access directly from your dashboard:
Jupyter Notebook/JupyterLab:
Click the "Jupyter" button on your instance
Opens in a new browser tab
Provides interactive development environment
Access files, notebooks, and terminal within the interface
ComfyUI (for Stable Diffusion templates):
Click the "ComfyUI" button on your instance
Opens the visual workflow interface
Build and run diffusion pipelines graphically
Generate images without writing code
VS Code Server (if included in template):
Full VS Code IDE in your browser
Edit code with familiar interface
Integrated terminal for running commands
Extensions and settings sync
4. Monitor Resource Usage
Track your instance's performance and utilization:
View GPU utilization percentage
Monitor VRAM usage
Check CPU and system memory
See network I/O and storage usage
This helps you:
Verify your code is using the GPU efficiently
Identify bottlenecks in your pipeline
Optimize resource allocation
Ensure you're not paying for underutilized resources
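From a terminal on the instance, a simple way to watch GPU utilization live is with standard tooling:

```bash
# Refresh the GPU utilization report every second
watch -n 1 nvidia-smi
```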
Automatic Environment Setup
When your GPU instance boots, NeevCloud automatically launches services based on your selected template. This automatic setup provides immediate benefits:
Jupyter Notebook auto-launch
For templates including Jupyter, the notebook server starts automatically on boot. This means:
No manual configuration needed
Immediate access via the dashboard "Jupyter" button
Pre-configured to work with the GPU
Common libraries already imported in the environment
You can start working immediately:
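For example, a first notebook cell might verify GPU access (a sketch, assuming a PyTorch-based template):

```python
# Check that the notebook kernel can see the GPU
import torch

print(torch.cuda.is_available())  # should print True on a GPU instance
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. the GPU model name
```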
ComfyUI auto-launch
For generative AI templates, ComfyUI starts automatically, providing:
Visual workflow editor for diffusion models
Pre-loaded model nodes and components
Immediate image generation capability
No command-line setup required
Development environment ready
All templates include:
GPU drivers loaded and initialized
CUDA environment variables set correctly
Framework-specific optimizations enabled
Common directories created (`/workspace`, `/data`, `/models`)
Best Practices for Instance Management
Monitor your active instances regularly
Check your dashboard daily or weekly to:
Identify instances you forgot to stop or delete
Verify expected workloads are running
Catch unexpected cost increases early
Spot performance issues or errors
Use meaningful naming and tagging
While NeevCloud auto-generates instance names, you can add notes or tags:
Project name: "gpt-finetuning-prod"
Purpose: "training-run-experiment-5"
Team: "research-team-nlp"
This helps when managing multiple instances across different projects.
Implement shutdown automation
For training jobs with a defined completion point, use scripts to stop instances automatically, for example:
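A minimal sketch, assuming your training script exits when the job finishes and that powering off the instance stops billing under your plan (verify that for your account, as some platforms keep billing stopped instances):

```bash
# Run the training job, then power the instance off once it exits
python train.py
sudo shutdown -h now
```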
This prevents you from forgetting to stop instances and incurring unnecessary costs.
Create snapshots before major changes
Before upgrading packages, modifying configurations, or running experimental code:
Create a snapshot of your current instance
Make your changes
If something breaks, deploy a new instance from the snapshot
Delete the broken instance
This workflow prevents data loss and minimizes downtime.
Use persistent storage for critical data
Don't rely solely on instance storage for important data:
Use external storage (S3, cloud storage) for datasets
Regularly download model checkpoints
Version control your code in Git
Export results and logs periodically
Instance storage is ephemeral—it disappears when you delete the instance.
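One common pattern is syncing checkpoints to object storage (a sketch; it assumes the AWS CLI is installed and configured, and the bucket name is hypothetical):

```bash
# Mirror local checkpoints to S3; only changed files are uploaded
aws s3 sync /workspace/checkpoints s3://my-project-checkpoints/run-01/
```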
Optimize instance selection over time
As your project evolves:
Start with larger instances for initial experiments
Profile your actual resource usage
Downsize to more cost-effective instances for production
Use smaller instances for inference vs. training
Right-sizing saves significant costs without sacrificing performance.
Managing Multiple Instances
When running several instances simultaneously:
Organize by project or purpose:
Development instances: Smaller, on-demand for coding and testing
Training instances: Larger, reserved for long-running jobs
Inference instances: Optimized for serving, potentially reserved
Implement cost allocation:
Tag instances by project for accounting
Track costs per team or department
Set budget limits per project
Coordinate resource usage:
Schedule training jobs to avoid peak pricing periods
Share instances among team members when appropriate
Consolidate workloads on fewer instances when possible