Fine-tuning Foundation Models

Warning

Fine-tuning is still experimental and under active development. The API and methods are subject to change.

Fine-tuning is the process of refining a pre-trained model on a new dataset. It is useful when you need better quantitative performance on a specific task than the available pre-trained models provide. Fine-tuning usually leads to significant improvements in performance compared to training a model from scratch. We provide three fine-tuning protocols:

The naive fine-tuning protocol, where the model is trained on the new dataset simply by starting from the foundation model weights.

The multihead replay fine-tuning protocol, where the model is trained on the new dataset while replaying a part of the data originally used to train the foundation model.

The LoRA (Low-Rank Adaptation) fine-tuning protocol, where only small low-rank adapters are trained while the base model weights are frozen. See the LoRA fine-tuning guide.

Multihead replay fine-tuning prevents the catastrophic forgetting that sometimes occurs during naive fine-tuning, and it usually leads to a more robust and stable model. It is the recommended protocol for fine-tuning any of the Materials Project foundation models.

To fine-tune one of the mace-mp-0 foundation models, use the mace_run_train script with the extra argument --foundation_model=model_type.

Naive Fine-tuning

The naive fine-tuning protocol is the simplest way to fine-tune a model. For example, to fine-tune the small model on a new dataset, you can use:

mace_run_train \
    --name="MACE" \
    --foundation_model="small" \
    --multiheads_finetuning=False \
    --train_file="train.xyz" \
    --valid_fraction=0.05 \
    --test_file="test.xyz" \
    --energy_weight=1.0 \
    --forces_weight=1.0 \
    --E0s="average" \
    --lr=0.01 \
    --scaling="rms_forces_scaling" \
    --batch_size=2 \
    --max_num_epochs=6 \
    --ema \
    --ema_decay=0.99 \
    --amsgrad \
    --default_dtype="float64" \
    --device=cuda \
    --seed=3

Besides “small”, --foundation_model also accepts “medium” and “large”, or the path to a foundation model. To fine-tune any other model, pass its path with --foundation_model=$path_model and the model will be loaded from that path. The hyperparameters are extracted from the model automatically.
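When fine-tuning runs are launched from a script, it can help to assemble the command programmatically. The sketch below is a hypothetical helper (build_naive_finetune_args is not part of MACE) that reuses only the flags from the naive fine-tuning example above and treats any value other than “small”, “medium” or “large” as a path that must exist. It only builds and prints the argument list; it does not run training.

```python
import shlex
from pathlib import Path
from typing import List

# Named foundation models mentioned in this guide; any other value is treated as a path.
NAMED_FOUNDATION_MODELS = {"small", "medium", "large"}


def build_naive_finetune_args(foundation_model: str, train_file: str,
                              test_file: str, device: str = "cuda") -> List[str]:
    """Hypothetical helper: assemble a naive fine-tuning command for mace_run_train.

    Uses only the flags that appear in the naive fine-tuning example of this guide.
    """
    if foundation_model not in NAMED_FOUNDATION_MODELS and not Path(foundation_model).is_file():
        # Not one of the named models, so it must be a path to an existing model file.
        raise FileNotFoundError(f"foundation model not found: {foundation_model}")
    return [
        "mace_run_train",
        "--name=MACE",
        f"--foundation_model={foundation_model}",
        "--multiheads_finetuning=False",
        f"--train_file={train_file}",
        "--valid_fraction=0.05",
        f"--test_file={test_file}",
        "--energy_weight=1.0",
        "--forces_weight=1.0",
        "--E0s=average",
        "--lr=0.01",
        "--scaling=rms_forces_scaling",
        "--batch_size=2",
        "--max_num_epochs=6",
        "--ema",
        "--ema_decay=0.99",
        "--amsgrad",
        "--default_dtype=float64",
        f"--device={device}",
        "--seed=3",
    ]


if __name__ == "__main__":
    args = build_naive_finetune_args("small", "train.xyz", "test.xyz")
    # Print a shell-ready command; passing `args` to subprocess.run(args, check=True)
    # would launch the actual training run.
    print(shlex.join(args))
```

Passing a list rather than a single string to subprocess.run avoids shell quoting issues; the quotes in the shell example (e.g. --E0s="average") are removed by the shell anyway, so --E0s=average is equivalent.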

Multihead Replay Fine-tuning

The multihead replay fine-tuning protocol prevents the catastrophic forgetting that sometimes occurs during naive fine-tuning. It usually leads to a more robust and stable model. It is the recommended way to fine-tune any of the Materials Project foundation models.
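The core idea of replay can be illustrated with a small data-mixing sketch. This is a conceptual illustration, not MACE's implementation: Config is a stand-in for a training structure, and the head labels "new_task" and "replay" are hypothetical. It assumes, as the protocol's name suggests, that the new data and the replayed foundation data are fitted by separate output heads, and it only shows how a random part of the original data is mixed back into training.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Config:
    """Stand-in for one training structure (a real one holds positions, energy, forces)."""
    name: str
    head: str = ""


def build_replay_dataset(new_data: List[Config], pretrain_data: List[Config],
                         num_replay: int, seed: int = 0) -> List[Config]:
    """Mix the new data with a random replay subset of the pre-training data.

    Every configuration is tagged with the (hypothetical) head it trains, so the new
    task and the replayed foundation data can be fitted by separate output heads.
    """
    rng = random.Random(seed)
    # Replay only a part of the original data, never more than is available.
    replay = rng.sample(pretrain_data, k=min(num_replay, len(pretrain_data)))
    mixed = [Config(c.name, head="new_task") for c in new_data]
    mixed += [Config(c.name, head="replay") for c in replay]
    rng.shuffle(mixed)  # interleave both sources within each epoch
    return mixed
```

Because the replayed configurations keep contributing to the loss, the shared part of the model is discouraged from drifting away from what it learned during pre-training, which is what counters catastrophic forgetting.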

For more information on the multihead replay fine-tuning protocol, please refer to the multihead fine-tuning guide.

LoRA Fine-tuning

LoRA (Low-Rank Adaptation) fine-tuning freezes all base model weights and only trains small low-rank adapter matrices injected into each layer. This is particularly useful when you have a small dataset and want to avoid overfitting, or when you want to reduce training memory and compute requirements.
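To make the savings concrete, here is a pure-Python sketch of a single LoRA-adapted linear layer in the standard LoRA formulation: a frozen weight W (d_out × d_in) plus a trainable update (alpha / r) · B · A, with A of shape r × d_in and B of shape d_out × r. This is a generic illustration, not MACE's implementation; the class name LoRALinear, the alpha scaling and the initialization scheme are assumptions. With B initialized to zero, the adapted layer initially reproduces the foundation model exactly.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


class LoRALinear:
    """Hypothetical LoRA layer: effective weight W + (alpha / rank) * B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        self.weight = weight  # frozen base weight, never updated during fine-tuning
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so B @ A = 0 at initialization.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self) -> Matrix:
        delta = matmul(self.B, self.A)  # low-rank update, shape d_out x d_in
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x: List[float]) -> List[float]:
        return [sum(wij * xj for wij, xj in zip(row, x)) for row in self.effective_weight()]

    def trainable_parameters(self) -> int:
        # Only A and B are trained: rank * d_in + d_out * rank values.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For a 128 × 128 layer with rank 4, full fine-tuning would update 128 · 128 = 16384 weights, whereas the adapters hold 4 · 128 + 128 · 4 = 1024 trainable values, which is where the reduction in memory, compute and overfitting risk comes from.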

LoRA can be combined with both the naive and multihead replay protocols.

For more information on LoRA fine-tuning, please refer to the LoRA fine-tuning guide.