Heterogeneous Data Training
MACE can load multiple files for training, validation, and testing, and the files used in a single training run may be in different formats. This is particularly useful when your data comes from multiple sources or is split across several files.
Supported File Formats
MACE can handle a mix of the following file formats:
ASE files: Standard ASE-readable atomic structure files
HDF5 files: Pre-processed data in .h5 or .hdf5 format
LMDB databases: Efficient storage and access for large datasets
Configuration
Basic Usage
You can provide multiple files as a list in your configuration:
heads:
  DFT:
    train_file: ["path/to/training1.xyz", "path/to/training2.xyz"]
    valid_file: ["path/to/validation1.xyz", "path/to/validation2/"]
    test_file: "path/to/test.xyz"
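The configuration above gives `train_file` as a list and `test_file` as a single string. If you write your own script that reads such a configuration, it can help to normalize both shapes to a list first. The following is a minimal sketch of that idea; `as_file_list` is a hypothetical helper written for illustration and is not part of MACE.

```python
from typing import List, Union


def as_file_list(value: Union[str, List[str], None]) -> List[str]:
    """Return a list of paths whether the config gave a string, a list, or nothing."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


# Mirrors part of the "DFT" head from the configuration above (valid_file omitted
# on purpose to show the missing-key case).
head = {
    "train_file": ["path/to/training1.xyz", "path/to/training2.xyz"],
    "test_file": "path/to/test.xyz",
}

train_files = as_file_list(head.get("train_file"))
test_files = as_file_list(head.get("test_file"))
valid_files = as_file_list(head.get("valid_file"))  # key absent, so an empty list
```

After normalization, every entry can be processed with the same loop regardless of how it was written in the configuration.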
Mixing Different File Formats
MACE automatically detects file types and uses the appropriate loader:
heads:
  QM7:
    train_file: ["data/qm7/train.xyz", "data/qm7/train.h5", "data/qm7/database/"]
    valid_file: ["data/qm7/valid.xyz", "data/qm7_h5/valid/"]
    test_file: ["data/qm7/test.xyz", "data/qm7/test.h5"]
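To check which formats a mixed list contains before starting a run, you can group the paths yourself. The sketch below uses a simple path-based rule that is an assumption made for illustration; it is not MACE's detection logic, and `guess_format` and `group_by_format` are hypothetical names. A path ending in `/` is only labeled `directory`, because the path alone does not say whether the folder holds an LMDB database or HDF5 files.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def guess_format(path: str) -> str:
    """Guess a storage format from the path alone (illustrative rule, not MACE's)."""
    if path.endswith("/"):
        # Could be an LMDB database or a folder of HDF5 files; not decidable here.
        return "directory"
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in {".h5", ".hdf5"}:
        return "hdf5"
    # Assumption: anything else is treated as an ASE-readable structure file.
    return "ase"


def group_by_format(paths: List[str]) -> Dict[str, List[str]]:
    """Group paths under the format label returned by guess_format."""
    groups: Dict[str, List[str]] = {}
    for p in paths:
        groups.setdefault(guess_format(p), []).append(p)
    return groups


# The train_file list from the QM7 head above.
train_groups = group_by_format(
    ["data/qm7/train.xyz", "data/qm7/train.h5", "data/qm7/database/"]
)
```

A grouping like this is only a sanity check on the configuration; the actual loader selection is done by MACE.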