
What Is LoRA (Low-Rank Adaptation)?

LoRA (Low-Rank Adaptation) is a technique for fine-tuning large language models efficiently by training only a tiny fraction of the model’s parameters.

The Problem LoRA Solves

Full fine-tuning of a model like Llama 3 70B requires updating billions of parameters — demanding hundreds of GBs of GPU memory and significant compute time. Most teams can’t afford this.

How LoRA Works

Instead of updating all model weights, LoRA:

  1. Freezes the original model parameters
  2. Adds small trainable “adapter” matrices to specific layers
  3. Trains only these adapters (typically <1% of total parameters)
  4. Merges the adapters back into the original weights at inference (or keeps them separate and swappable)

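The four steps above can be sketched in plain NumPy. This is an illustrative toy (a single square layer with random data, no actual gradient updates); the hidden size `d` and rank `r` are assumed values chosen so that `r` is much smaller than `d`.

```python
import numpy as np

d, r = 1024, 8                            # hidden size, LoRA rank (r << d)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))           # step 1: frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01    # step 2: small trainable adapter
B = np.zeros((d, r))                      # B starts at zero, so the delta starts at zero

x = rng.standard_normal(d)

# Step 3: during training, only A and B would receive gradients.
y_train = W @ x + B @ (A @ x)

# Step 4: at inference, the low-rank delta can be merged into W,
# so the adapted model runs with no extra latency.
W_merged = W + B @ A
y_infer = W_merged @ x

assert np.allclose(y_train, y_infer)

# The adapters are a tiny fraction of the layer's parameters: 2*d*r vs d*d.
trainable_fraction = (A.size + B.size) / W.size
```

Merging works because the adapter contributes an additive update `B @ A` to the weight matrix, so applying it during the forward pass and folding it into `W` are mathematically identical.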
The result: quality comparable to full fine-tuning at a fraction of the cost.

LoRA vs. Full Fine-Tuning

| Aspect | Full Fine-Tuning | LoRA |
| --- | --- | --- |
| Parameters trained | All (~70B) | ~0.1% (~70M) |
| GPU memory | 100+ GB | 16-24 GB |
| Training time | Days | Hours |
| Quality | Baseline | Comparable |
| Storage per adapter | Full model copy | ~100 MB |
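The "~0.1%" row can be sanity-checked with rough arithmetic. The shapes below (`d_model`, `n_layers`, `rank`, and applying LoRA to four attention projections per layer) are assumptions for a Llama-3-70B-like architecture, not exact figures:

```python
d_model, n_layers, rank = 8192, 80, 16   # assumed model shape and LoRA rank
total_params = 70e9

# Each adapted projection adds two matrices: (d_model x rank) and (rank x d_model).
lora_params = n_layers * 4 * 2 * rank * d_model

print(f"{lora_params/1e6:.0f}M adapter params "
      f"({100 * lora_params / total_params:.3f}% of 70B)")
# prints roughly: 84M adapter params (0.120% of 70B)
```

At 16-bit precision those ~84M adapter parameters take under 200 MB on disk, which is why a saved adapter is megabytes rather than a full model copy.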

QLoRA

QLoRA combines LoRA with quantization: the base model is loaded in 4-bit precision while the LoRA adapters are trained in higher precision (typically 16-bit). This makes fine-tuning a 70B model possible on a single 48 GB GPU.
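The memory savings follow directly from bytes per parameter. A back-of-the-envelope calculation for a 70B-parameter model (weights only, ignoring activations, gradients, and optimizer state):

```python
params = 70e9
bytes_per = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

for fmt, b in bytes_per.items():
    print(f"{fmt}: {params * b / 1e9:.0f} GB")
# prints: fp16: 140 GB, int8: 70 GB, int4: 35 GB
```

At 4 bits the frozen base weights drop from ~140 GB to ~35 GB, which is what brings a 70B model within reach of a single large GPU once only the small adapters need gradients.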

LoRA in Practice

Many community models on Hugging Face are distributed as LoRA adapters applied on top of a shared base model. You can also stack multiple LoRA adapters to combine different capabilities, for example one for coding and another for creative writing.
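Stacking works because each adapter is an additive low-rank delta on the same frozen base weights. A minimal NumPy sketch, where the adapter names and shapes are hypothetical:

```python
import numpy as np

d, r = 512, 4
rng = np.random.default_rng(1)
W = rng.standard_normal((d, d))           # shared frozen base weight

# Two independently trained adapters (hypothetical names)
adapters = {
    name: (rng.standard_normal((d, r)) * 0.01,   # B
           rng.standard_normal((r, d)) * 0.01)   # A
    for name in ("coding", "writing")
}

def forward(x, active):
    """Base output plus the low-rank delta of every active adapter."""
    y = W @ x
    for name in active:
        B, A = adapters[name]
        y += B @ (A @ x)
    return y

x = rng.standard_normal(d)
y_base = forward(x, [])                   # base model only
y_both = forward(x, ["coding", "writing"])  # both adapters applied
```

In practice whether stacked adapters actually compose well depends on how they were trained; the math merely makes combining them cheap.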

Elvean brings all these concepts together in one native Mac app — local models, cloud APIs, agentic tools, and more.

Learn more about Elvean