Model Fine-tuning

Frustrated with RAG, prompt engineering, or hallucinations? Fine-tuning lets you maximize the potential of a pretrained large language model (LLM) by adjusting its weights to better suit a specific task.

With SimpliML you can fine-tune LLMs with all the recommended, state-of-the-art optimizations for fast training:

  • Parameter-Efficient Fine-Tuning (PEFT) through LoRA adapters for faster convergence.
  • 4-bit quantized fine-tuning with QLoRA, giving you greater flexibility, faster training, and lower memory requirements.
  • Flash Attention for fast, memory-efficient attention during training (note: requires supported hardware, such as A100 GPUs).
  • Gradient checkpointing to reduce VRAM footprint, accommodate larger batches, and achieve higher training throughput.
  • Distributed training via DeepSpeed, ensuring optimal scaling with multiple GPUs.
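To give an intuition for the LoRA technique listed above: instead of updating a full weight matrix W, training updates two small matrices A and B so the effective weight becomes W + (alpha / r) * B @ A. The sketch below is purely illustrative (plain Python, hypothetical shapes and values, not SimpliML's API):

```python
# Minimal LoRA sketch: merge a rank-r update (alpha / r) * B @ A into a
# frozen base weight W, using plain Python lists for clarity.

def matmul(X, Y):
    # Naive matrix multiply over nested lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(X, s):
    return [[s * a for a in row] for row in X]

def lora_weight(W, A, B, alpha, r):
    """Effective weight with the low-rank update merged in."""
    return add(W, scale(matmul(B, A), alpha / r))

# Frozen 2x2 base weight and a rank-1 adapter (r = 1).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]            # r x d_in; random init in practice
B = [[0.0], [0.0]]          # d_out x r; zero init, so training starts at W

# With B at zero, the merged weight equals W, so fine-tuning starts
# from the pretrained model's behavior exactly.
assert lora_weight(W, A, B, alpha=2.0, r=1) == W
```

Because only A and B (a few percent of the parameters) receive gradients, optimizer state and gradient memory shrink dramatically, which is what makes the 4-bit QLoRA variant practical on a single GPU.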

Moreover, leveraging SimpliML for training eliminates infrastructure concerns like building images, provisioning GPUs, and managing cloud storage. If a training pipeline runs on SimpliML, it's not only repeatable but also scalable enough to be seamlessly transitioned to production.


SimpliML supports popular dataset formats such as Alpaca, ShareGPT, and OpenAI. You can also bring your own data in JSONL or CSV format, giving you the flexibility to work with whatever dataset layout suits your needs.
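As an example of one of these layouts, the Alpaca format stores one JSON object per line with `instruction`, `input`, and `output` keys. The snippet below builds and validates a tiny JSONL file in that shape (the record contents are made up for illustration; this is not SimpliML-specific code):

```python
import json

# Hypothetical training records in the Alpaca instruction format.
records = [
    {
        "instruction": "Summarize the text.",
        "input": "LoRA trains small adapter matrices on top of a frozen model.",
        "output": "LoRA fine-tunes a model via low-rank adapters.",
    },
    {
        "instruction": "Translate to French.",
        "input": "Hello",
        "output": "Bonjour",
    },
]

# JSONL: one serialized JSON object per line.
jsonl = "\n".join(json.dumps(r) for r in records)

# Validate that every line parses and carries the three Alpaca keys.
for line in jsonl.splitlines():
    rec = json.loads(line)
    assert set(rec) == {"instruction", "input", "output"}
```

The same pattern applies to CSV uploads: one example per row, with columns standing in for the JSON keys.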

Models supported for fine-tuning

  • LLaMA (7B/13B/70B)
  • LLaMA 2 (7B/13B/70B)
  • Mistral (7B)
  • Mixtral (8x7B)
  • Falcon (7B/40B)
  • Phi-1.5/2 (1.3B/2.7B)
  • Yi (6B/34B)