About

Lamini On-Demand is a self-service, pay-as-you-go platform for running LLM tuning and inference jobs on a high-performance GPU cluster. New and existing users receive $300 in free credit to get started. Inference is billed at $0.50 per million tokens, covering input, output, and JSON structured responses. Tuning costs $1 per step on one GPU, and the price scales linearly with the number of GPUs when a job is spread across more of them to finish faster. The platform also offers memory tuning for mixture-of-experts models, guaranteed JSON output, and flexible bursting across GPUs. Additional credit can be purchased in $100 increments via the account dashboard. For enterprise-scale needs, Lamini also offers Reserved GPU clusters and Self-Managed licenses for on-premise or air-gapped deployments.

Tiers

On-Demand Pay-as-you-go

Up to $300

Access GPU tuning and inference with $300 free credit and simple usage-based pricing.

$300 free credit upon signup
$0.50 per million inference tokens
$1 per tuning step per GPU (linear scaling)
Memory Tuning for mixture-of-experts models
Burst tuning across multiple GPUs

Eligibility

New and existing Lamini users
No long-term commitments required

Effort: Very low (2 steps to apply)

Create a Lamini account. Sign up on the Lamini platform to automatically receive $300 in free credit.
Run your first job. Start a tuning or inference job in the On-Demand dashboard to use your free credit.

FAQ

How do I redeem the $300 free credit?

The $300 free credit is automatically applied to your Lamini account after signing up or logging in to the On-Demand platform.

Does the free credit expire?

There is no specified expiration date; credits remain valid until they are fully used.

What are the rates after the free credit?

Inference costs $0.50 per million tokens, and tuning costs $1 per step on one GPU, scaling linearly with additional GPUs.
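
The rates above can be turned into a rough budget estimate. The following is a minimal Python sketch, not an official Lamini tool: the function names are hypothetical, and it assumes "scaling linearly" means the per-step price is multiplied by the number of GPUs, as the "$1 per tuning step per GPU" line in the tier description suggests.

```python
from decimal import Decimal

# Rates quoted on this page.
INFERENCE_USD_PER_MILLION_TOKENS = Decimal("0.50")
TUNING_USD_PER_STEP_PER_GPU = Decimal("1")
FREE_CREDIT_USD = Decimal("300")


def inference_cost(tokens: int) -> Decimal:
    """Cost of processing `tokens` tokens (input and output combined)."""
    return INFERENCE_USD_PER_MILLION_TOKENS * tokens / Decimal(1_000_000)


def tuning_cost(steps: int, gpus: int = 1) -> Decimal:
    """Cost of `steps` tuning steps on `gpus` GPUs.

    Assumption: the per-step price is multiplied by the GPU count.
    """
    return TUNING_USD_PER_STEP_PER_GPU * steps * gpus


def amount_owed(total_spend: Decimal) -> Decimal:
    """Spend not covered by the $300 free credit (never negative)."""
    return max(Decimal("0"), total_spend - FREE_CREDIT_USD)
```

Under these assumptions, 20 million inference tokens come to $10 and a 100-step tuning run on 4 GPUs comes to $400, so that combination would exceed the free credit by $110.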

Can I purchase more credit?

Yes, you can buy additional credits in $100 increments from your account dashboard.
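
Because credit is sold in $100 increments, a shortfall generally has to be rounded up to the next multiple of $100. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and it assumes purchases are only possible in whole multiples of $100.

```python
import math

CREDIT_INCREMENT_USD = 100  # purchase increment quoted on this page


def credit_to_purchase(shortfall_usd: float) -> int:
    """Smallest multiple of $100 that covers `shortfall_usd`.

    Assumption: credit can only be bought in whole $100 increments.
    """
    if shortfall_usd <= 0:
        return 0  # nothing to buy while the balance covers the job
    return math.ceil(shortfall_usd / CREDIT_INCREMENT_USD) * CREDIT_INCREMENT_USD
```

For example, a job that needs $110 more than the current balance would call for a $200 purchase, leaving $90 of credit for later use.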

Are there other deployment options?

For larger or dedicated workloads, Lamini offers Reserved GPU clusters and Self-Managed licenses for on-premise or air-gapped environments.