Replies: 2 comments
Hi @UnHwanLee. The first thing I'd suggest is to use the isolated-atom energies for your system, computed with the same reference method you used for your AIMD. For more details, see https://nequip.readthedocs.io/en/latest/guide/configuration/model.html#energy-shifts-scales. This suggestion is based on the consistent (though small) energy offset for the 3×3×3 cell and the large, yet consistent-looking, offset for the 4×4×4 cell. Let's see if this change works first. If issues persist, you might consider augmenting your dataset with 1×1×1, 2×2×2, etc. supercell training data, which would add variety and help avoid overfitting to a specific supercell size. NequIP is a message-passing GNN, and your config shows […]. It might also be worth trying […]. Side note: you can experiment with […].
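As an illustration (not from the original comment), here is a minimal sketch of obtaining an isolated-atom reference energy with ASE. The `LennardJones` calculator is only a runnable stand-in for whatever DFT code produced the AIMD data, and the box size is an arbitrary choice; the resulting number would then go into the energy-shift settings described in the docs linked above.

```python
# Sketch: compute the isolated Li atom energy with the same reference method
# used for the AIMD data, so it can be supplied as a per-type energy shift
# in the NequIP config (see the docs link above for the exact setting).
from ase import Atoms
from ase.calculators.lj import LennardJones  # stand-in; replace with your DFT calculator

atom = Atoms("Li", positions=[[0.0, 0.0, 0.0]])
atom.center(vacuum=10.0)          # isolated atom in a large box
atom.calc = LennardJones()        # placeholder -- use the calculator that generated your AIMD

e_isolated_li = atom.get_potential_energy()
print(f"Isolated Li atom energy: {e_isolated_li:.6f} eV")
# This value (per element) is what the energy-shift configuration expects.
```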
(I am a user, not a NequIP developer.) Even on the 3×3×3 data you trained on, your energy errors are dominated by an offset. That could be removed by setting […].
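As a quick check (again, not from the original comment), one can verify whether the error really is a constant per-atom offset before changing anything in the config. The numbers below are purely illustrative placeholders for model and reference energies.

```python
# Sketch: test whether the energy error is dominated by a constant per-atom
# offset. Replace the illustrative arrays with your model / DFT energies.
import numpy as np

n_atoms = np.array([54, 54, 128, 128])                  # e.g. 3x3x3 and 4x4x4 BCC Li cells
e_ref   = np.array([-102.3, -101.9, -242.6, -242.1])    # reference totals, eV (placeholders)
e_pred  = np.array([-101.8, -101.4, -240.9, -240.4])    # model predictions, eV (placeholders)

per_atom_err = (e_pred - e_ref) / n_atoms               # signed error per atom
offset = per_atom_err.mean()                            # constant per-atom offset
residual_rmse = np.sqrt(((per_atom_err - offset) ** 2).mean())

print(f"mean per-atom offset      : {offset * 1000:.2f} meV/atom")
print(f"RMSE after removing offset: {residual_rmse * 1000:.2f} meV/atom")
# If the residual RMSE is small, the error is mostly a constant shift,
# which better reference energy shifts in the config should fix.
```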
Dear NequIP Team,
I hope this message finds you well.
First of all, thank you for your outstanding work on NequIP and Allegro. I've been using your framework to study Li metal systems and battery electrolytes, and it has been incredibly useful.
I would like to ask for your opinion or suggestions regarding an issue I encountered with the scale-transferability of a trained NequIP model.
I generated 3×3×3 Li bulk structures using AIMD at 323K, 900K, and 1500K to ensure thermal diversity.
For computational efficiency and to reduce redundancy, I extracted 100 frames from each 10 ps AIMD trajectory (i.e., 100 × 3 = 300 frames total) and used them as my training dataset.
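For reference, a minimal sketch of how such even subsampling could be done with ASE; the file names and format are assumptions, and any ASE-readable trajectory format would work the same way.

```python
# Sketch: evenly subsample 100 frames from each AIMD trajectory and write a
# combined training set. File names are hypothetical placeholders.
import numpy as np
from ase.io import read, write

frames = []
for traj in ["aimd_323K.extxyz", "aimd_900K.extxyz", "aimd_1500K.extxyz"]:
    images = read(traj, index=":")                        # all frames
    pick = np.linspace(0, len(images) - 1, 100).astype(int)
    frames.extend(images[i] for i in pick)

write("li_train_300frames.extxyz", frames)                # 3 x 100 = 300 frames
```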
I trained a NequIP model on these 300 configurations. The model performs well when predicting energies and forces for 3×3×3 Li bulk structures, including unseen configurations.
However, when I apply this model to a 4×4×4 Li bulk structure (also generated via AIMD at 323K), the predicted energy/force accuracy significantly degrades.
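For context, this kind of larger-cell evaluation can be sketched as follows, assuming a model deployed with nequip-deploy and the ASE calculator interface of older NequIP releases (the entry point may differ in newer versions, so check the docs for your version); the file name and lattice constant are placeholders.

```python
# Sketch: evaluate the trained potential on a larger cell than it was trained on.
from ase.build import bulk
from nequip.ase import NequIPCalculator  # ASE interface; API may vary by NequIP version

calc = NequIPCalculator.from_deployed_model("li_deployed.pth", device="cpu")  # placeholder path

atoms = bulk("Li", "bcc", a=3.49, cubic=True) * (4, 4, 4)   # 128-atom 4x4x4 cell
atoms.calc = calc
print("E/atom :", atoms.get_potential_energy() / len(atoms), "eV")
print("max |F|:", abs(atoms.get_forces()).max(), "eV/A")
```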
I observed the same issue not only in Li bulk systems but also when attempting to generalize to larger slabs and electrolyte systems.
This is counterintuitive to me: since NequIP is based on local environments, I expected the model to generalize, at least to some extent, to larger systems composed of similar local atomic motifs.
Moreover, a key reason for using MLIPs in general is to enable scaling up MD simulations beyond DFT-accessible system sizes, which is why I expected small-scale training to work for larger-scale predictions.
Do you have any insights or recommendations for this type of scale-transferability problem?
Is this a known limitation or failure mode in practice?
Would you suggest any specific training strategies, model architecture changes, or dataset augmentations to improve transferability across system sizes?
Could this be related to r_cut, neighbor statistics, or other NequIP-specific configurations (e.g., min_atoms, num_neighbors_statistics.yaml, etc.)?
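As one way to probe the neighbor-statistics question, here is a sketch comparing per-atom neighbor counts within the cutoff for ideal 3×3×3 and 4×4×4 BCC Li cells; the cutoff and lattice constant are placeholders for the values actually used in the config.

```python
# Sketch: compare neighbor statistics within the model cutoff for two supercell sizes.
import numpy as np
from ase.build import bulk
from ase.neighborlist import neighbor_list

r_max = 4.0                                    # placeholder cutoff (Angstrom); use your r_max
for reps in [(3, 3, 3), (4, 4, 4)]:
    atoms = bulk("Li", "bcc", a=3.49, cubic=True) * reps
    i, j = neighbor_list("ij", atoms, r_max)
    counts = np.bincount(i, minlength=len(atoms))
    print(f"{reps}: {len(atoms)} atoms, "
          f"avg neighbors = {counts.mean():.2f}, min = {counts.min()}")
# For ideal bulk crystals these numbers should match across supercell sizes;
# large differences would point to a structural or PBC issue rather than the model.
```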
I’ve already tried several variations in training hyperparameters and data preprocessing but haven’t yet resolved this issue. Any guidance would be greatly appreciated.
Thank you very much in advance for your time and help.
This is the config.yaml that I used to train the model.
Below are the energy and force error plots for the 3×3×3 test data (this cell size was included in training).
Below are the energy and force error plots for the 4×4×4 test data (this cell size was not included in training).