What I think is true of these four somewhat related things.
Model compression
- Make models smaller or faster
- Why do we care about smaller models?
- Less space
- Less computation for inference
- Can deploy models on edge devices
- Lower latency
- Maybe we don't need all the juice the larger model has
Model quantisation
- A method of model compression
- Use fewer bits for the model parameters
- e.g. 32-bit floats -> 16-bit floats, or even 8-bit integers (see the sketch below)
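A minimal NumPy sketch of post-training affine quantization to int8, just to make the "fewer bits" idea concrete. The function names and the toy 256x256 weight matrix are illustrative; real frameworks (e.g. PyTorch's quantization tooling) do this per-channel with calibration data and quantized kernels, but the core scale/zero-point mapping is the same.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Affine-quantize float32 weights to int8, returning the quantized
    values plus the scale/zero-point needed to dequantize them."""
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / 255.0                 # map the float range onto 256 int8 buckets
    zero_point = np.round(-w_min / scale) - 128     # so that w_min lands on -128
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

# Toy example: a layer's weights drop from 4 bytes/param to 1 byte/param.
w = np.random.randn(256, 256).astype(np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
print("max abs error:", np.abs(w - w_hat).max())
print("size: %d bytes -> %d bytes" % (w.nbytes, q.nbytes))
```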
Model distillation
- A method of model compression
- Transfer knowledge from a larger "teacher" model to a smaller "student" one (see the sketch below)
- Distillation can also be used beyond compression. For example, to transfer capabilities between model architectures.
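A rough PyTorch sketch of the classic soft-target distillation loss: the student is trained against a blend of the hard labels and the teacher's temperature-softened output distribution. The tiny teacher/student networks, the random batch, and the `T`/`alpha` values are placeholders for illustration, not a real training setup.

```python
import torch
import torch.nn.functional as F

# Hypothetical teacher/student pair just to show the shapes;
# in practice the teacher is a large pretrained model, kept frozen.
teacher = torch.nn.Sequential(torch.nn.Linear(784, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 10))
student = torch.nn.Sequential(torch.nn.Linear(784, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend the usual hard-label loss with a soft-label loss that pushes the
    student's temperature-softened distribution toward the teacher's."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so the soft-loss gradients keep a comparable magnitude
    return alpha * hard + (1 - alpha) * soft

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))  # stand-in batch
with torch.no_grad():
    t_logits = teacher(x)          # teacher provides soft targets only
s_logits = student(x)
opt.zero_grad()
loss = distillation_loss(s_logits, t_logits, y)
loss.backward()
opt.step()
```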
Model fine-tuning
- Adapt a pre-trained model to do even better on your specific task (see the sketch below)
- Can sometimes avoid fine-tuning with prompt engineering alone
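A minimal fine-tuning sketch in PyTorch, assuming torchvision >= 0.13 is available: freeze a pretrained backbone and train only a new task-specific head. The 5-class task and the random batch are stand-ins for a real labelled dataset.

```python
import torch
from torchvision import models

# Take a pretrained image classifier, freeze its backbone,
# and retrain only a new head for a hypothetical 5-class task.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for p in model.parameters():
    p.requires_grad = False                           # freeze the pretrained backbone

model.fc = torch.nn.Linear(model.fc.in_features, 5)   # new task-specific head

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

# Stand-in batch; in practice this comes from your labelled dataset.
x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 5, (8,))
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
```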