Thinnest AI × Modal: Awarded Modal for Startups Credits for GPU Compute

Serverless GPUs for AI Innovation
Today, we're excited to share that Thinnest AI has received Modal for Startups credits — giving us access to serverless GPU compute for our most demanding AI workloads.
Modal has quietly become the go-to platform for AI teams that need GPU compute without the headache of managing Kubernetes clusters, Docker images, or cloud GPU quotas. With Modal, we can run a fine-tuning job on 4x A100 GPUs with a single Python decorator — and pay only for the seconds we actually use.
Why Modal?
GPU compute is the lifeblood of AI development, but traditional cloud GPUs come with painful trade-offs: long provisioning times, complex orchestration, and minimum commitments that don't make sense for a startup. Modal eliminates all of that.
1. Zero Infrastructure Management
With Modal, there are no Dockerfiles to maintain, no Kubernetes configs to debug, no GPU drivers to install. You write a Python function, attach an @app.function(gpu="A100") decorator to a modal.App, and it runs on cloud GPUs almost instantly. From code to GPU in seconds, not hours.
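A minimal sketch of what this looks like, assuming Modal's current App-based API; the app name and function body here are placeholders, not our actual fine-tuning code:

```python
import modal

app = modal.App("example-gpu-job")  # hypothetical app name

# Requesting gpu="A100" tells Modal to run this function on an A100
# in its cloud; no Dockerfile or cluster config is needed.
@app.function(gpu="A100")
def train_step(batch: list[str]) -> int:
    # Placeholder for real training logic.
    return len(batch)

@app.local_entrypoint()
def main():
    # .remote() executes the function on a cloud GPU and returns the result.
    print(train_step.remote(["sample text"]))
```

Running `modal run` on this file provisions the GPU, executes the function, and tears everything down when it returns.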
2. Pay-Per-Second Pricing
Traditional cloud GPUs charge by the hour — even if your job finishes in 12 minutes. Modal charges by the second with no minimum commitment. For a startup running intermittent fine-tuning jobs, this can reduce GPU costs by 70-80% compared to reserved instances.
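The arithmetic behind that claim, using a hypothetical round-number rate of $4.00 per A100-hour (not a quoted price from either provider):

```python
# Compare hourly billing vs. per-second billing for a short GPU job.
HOURLY_RATE = 4.00            # dollars per GPU-hour (illustrative)
job_seconds = 12 * 60         # a 12-minute fine-tuning job

hourly_cost = HOURLY_RATE * 1                   # billed for a full hour
per_second_cost = HOURLY_RATE / 3600 * job_seconds

savings = 1 - per_second_cost / hourly_cost
print(f"${per_second_cost:.2f} vs ${hourly_cost:.2f} ({savings:.0%} saved)")
# → $0.80 vs $4.00 (80% saved)
```

The shorter and more intermittent the jobs, the larger the gap between the two billing models.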
3. Instant Scaling
Need to process 10,000 knowledge base documents for vector embeddings? Modal can spin up 100 containers in parallel, process everything in minutes, and scale back to zero when done. Batch jobs that used to take hours now finish in minutes.
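The fan-out pattern can be sketched like this, again assuming Modal's App API; the embed function is a stand-in, since a real one would load an embedding model:

```python
import modal

app = modal.App("example-embeddings")  # hypothetical app name

@app.function(gpu="T4")
def embed(doc: str) -> list[float]:
    # Placeholder: a real implementation would run an embedding model here.
    return [float(len(doc))]

@app.local_entrypoint()
def main():
    docs = [f"document {i}" for i in range(10_000)]
    # .map() fans the calls out across many containers in parallel;
    # Modal autoscales up for the batch, then back to zero when it finishes.
    vectors = list(embed.map(docs))
    print(len(vectors))
```

Because scaling to zero is automatic, the batch costs only the GPU-seconds it actually consumes.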
What We're Running on Modal
Modal powers several critical AI workloads in our pipeline:
- TTS Model Fine-Tuning: Fine-tuning text-to-speech models for Hindi and Indian languages using LoRA adapters on high-memory GPUs
- Batch Embedding Generation: When customers upload large knowledge bases (PDFs, documents, websites), we generate vector embeddings in parallel on Modal for fast indexing
- Model Evaluation: Running evaluation benchmarks across multiple model variants to find the best configuration for each language and use case
- Experimental Inference: Testing new model architectures and configurations before deploying to production
What This Means for Our Users
Modal credits translate directly into better AI agents for our users:
- Better voice quality: More fine-tuning iterations mean more natural-sounding Hindi and Indian language TTS
- Faster knowledge indexing: Large document uploads are processed in minutes instead of hours
- More model options: We can test and validate new models faster, bringing the best options to our agent builder sooner
- Lower costs: Efficient GPU usage means we can keep our prices low while improving quality
Try It Yourself
The AI agents powered by our Modal-accelerated models are live. Sign up and experience the difference:
25 free voice minutes • 200 chat messages • No credit card required
Thank You, Modal
We're grateful to the Modal team for supporting our mission. Serverless GPUs are a game-changer for AI startups, and we're excited to push the boundaries of what's possible with their platform.
— The Thinnest AI Team