Lamini On-Demand
Pay-as-you-go GPU cloud for LLM tuning and inference
About
Lamini On-Demand is a self-service, pay-as-you-go platform for running LLM tuning and inference jobs on a high-performance GPU cluster. New and existing users receive $300 in free credit to get started. Inference is billed at $0.50 per million tokens, covering input, output, and JSON structured responses. Tuning costs $1 per step per GPU, so the cost of a step scales linearly with the number of GPUs used to speed up the job. Features include Memory Tuning for mixture-of-experts models, guaranteed JSON output, and the ability to burst tuning jobs across multiple GPUs. Additional credits can be purchased in $100 increments from the account dashboard. For enterprise-scale needs, Lamini also offers Reserved GPU clusters and Self-Managed licenses for on-premises or air-gapped deployments.
Tiers
On-Demand Pay-as-you-go
Up to $300
Access GPU tuning and inference with $300 free credit and simple usage-based pricing.
- $300 free credit upon signup
- $0.50 per million inference tokens
- $1 per tuning step per GPU (linear scaling)
- Memory Tuning for mixture-of-experts models
- Burst tuning across multiple GPUs
- New and existing Lamini users
- No long-term commitments required
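The listed rates can be turned into a rough budget estimate. The following is a minimal sketch, assuming only the prices quoted above; the function names and the example workload are hypothetical illustrations, not part of any Lamini API, and actual billing may round or meter usage differently.

```python
from decimal import Decimal

# Rates as quoted on this page (assumed to apply exactly as stated).
INFERENCE_USD_PER_MILLION_TOKENS = Decimal("0.50")
TUNING_USD_PER_STEP_PER_GPU = Decimal("1")
FREE_CREDIT_USD = Decimal("300")


def inference_cost(tokens: int) -> Decimal:
    """Cost of processing `tokens` tokens (input plus output)."""
    return INFERENCE_USD_PER_MILLION_TOKENS * tokens / Decimal(1_000_000)


def tuning_cost(steps: int, gpus: int = 1) -> Decimal:
    """Cost of `steps` tuning steps, each run on `gpus` GPUs."""
    return TUNING_USD_PER_STEP_PER_GPU * steps * gpus


def remaining_credit(tokens: int, steps: int, gpus: int = 1,
                     credit: Decimal = FREE_CREDIT_USD) -> Decimal:
    """Credit left after the given workload; negative means a shortfall."""
    return credit - inference_cost(tokens) - tuning_cost(steps, gpus)


# Hypothetical workload: 20M inference tokens plus 50 tuning steps on 4 GPUs.
print(remaining_credit(tokens=20_000_000, steps=50, gpus=4))
```

Under these assumptions, the example workload uses $10 of inference and $200 of tuning, leaving $90 of the free credit.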
FAQ
How do I redeem the $300 free credit?
The $300 free credit is automatically applied to your Lamini account after signing up or logging in to the On-Demand platform.
Does the free credit expire?
There is no specified expiration date; credits remain valid until they are fully used.
What are the rates after the free credit?
Inference costs $0.50 per million tokens. Tuning costs $1 per step per GPU, so a step run on four GPUs costs $4.
Can I purchase more credit?
Yes, you can buy additional credits in $100 increments from your account dashboard.
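Because credits are sold in $100 increments, a projected shortfall has to be rounded up to the next multiple of $100. A small sketch of that arithmetic follows; the function name is hypothetical and the code only illustrates the rounding, not any purchase API.

```python
import math

CREDIT_INCREMENT_USD = 100  # increment size stated on this page


def credits_to_purchase(shortfall_usd: float) -> int:
    """Smallest purchase, in whole $100 increments, covering the shortfall."""
    if shortfall_usd <= 0:
        return 0  # nothing to buy if the balance already covers the workload
    return math.ceil(shortfall_usd / CREDIT_INCREMENT_USD) * CREDIT_INCREMENT_USD


# A projected shortfall of $250 requires three $100 increments.
print(credits_to_purchase(250))
```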
Are there other deployment options?
For larger or dedicated workloads, Lamini offers Reserved GPU clusters and Self-Managed licenses for on-premises or air-gapped environments.