GNN Loss NaN after first training example?

Check Inputs for NaNs or Infs

 

Before training, log your input features:

print(tf.math.reduce_any(tf.math.is_nan(node_features)))  # any NaNs?
print(tf.math.reduce_any(tf.math.is_inf(node_features)))  # any Infs? (or the PyTorch equivalents)
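
For a slightly broader check (a minimal sketch; node_features and edge_features are assumed names for your own input tensors), you can fail fast on any non-finite value before the first step:

import tensorflow as tf

# Stop immediately if any floating-point input contains NaN or Inf
for name, tensor in [("node_features", node_features), ("edge_features", edge_features)]:
    if tensor.dtype.is_floating:
        tf.debugging.assert_all_finite(tensor, f"{name} contains NaN or Inf")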

 

Clamp or Normalize Input Features

 

If inputs contain very large values, clamp them to a bounded range (or standardize them, as in the sketch after this snippet):

x = tf.clip_by_value(x, -10.0, 10.0)
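
If clamping feels too blunt, per-feature standardization is a common alternative (a sketch; assumes x is a float tensor of shape [num_nodes, num_features] and that the statistics come from the training data):

# Zero mean, unit variance per feature; the epsilon guards against
# division by zero for constant features
mean = tf.reduce_mean(x, axis=0, keepdims=True)
std = tf.math.reduce_std(x, axis=0, keepdims=True)
x = (x - mean) / (std + 1e-6)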

 

Lower the Learning Rate

 

A learning rate that is too high makes training unstable and can push the loss to NaN.

  • Try reducing it by 10x:

 

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

 

Gradient Clipping

 

Prevent exploding gradients:

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)
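
If you train with a custom loop instead of model.fit, the same idea can be applied manually with tf.clip_by_global_norm (a sketch; model, loss_fn, optimizer, inputs, and targets are assumed to exist in your training step):

with tf.GradientTape() as tape:
    predictions = model(inputs, training=True)
    loss = loss_fn(targets, predictions)

grads = tape.gradient(loss, model.trainable_variables)
# Rescale all gradients so their combined norm is at most 1.0
grads, _ = tf.clip_by_global_norm(grads, 1.0)
optimizer.apply_gradients(zip(grads, model.trainable_variables))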

 

Check Your Loss Function

 

  • Are you pairing log_softmax + nll_loss (or softmax + cross-entropy) correctly, without applying softmax twice? A loss that takes raw logits is safer (see the sketch below).

  • Avoid taking log(0): add a small epsilon if needed:

eps = 1e-7

loss = -tf.math.log(tf.clip_by_value(probabilities, eps, 1.0))
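
The most robust option is usually to skip the manual softmax/log entirely and give the loss raw logits, letting it apply a numerically stable log-softmax internally (a sketch; logits and labels are assumed names):

# No explicit softmax or log: from_logits=True handles both stably
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss = loss_fn(labels, logits)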

 

Check BatchNorm / LayerNorm (if used)

 

  • BatchNorm statistics become noisy with small batch sizes, which can destabilize training.

  • Try removing it, or replace it with a batch-size-independent alternative such as LayerNorm (see the sketch below).
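
A rough sketch of the swap (assuming a Keras model where h holds the node embeddings after a message-passing layer):

# BatchNormalization relies on batch statistics; LayerNormalization
# normalizes each node's feature vector on its own, so batch size
# does not matter
# norm = tf.keras.layers.BatchNormalization()
norm = tf.keras.layers.LayerNormalization()
h = norm(h)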

Weight Initialization

 

  • For custom GNN layers, use GlorotUniform (Xavier) or another stable initializer; see the sketch after this list.

  • Avoid initializing with very large standard deviations.
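
A minimal sketch of a custom layer that creates its weight with GlorotUniform (the layer itself is a toy example; only the initializer usage is the point):

import tensorflow as tf

class GraphLinear(tf.keras.layers.Layer):
    """Toy projection used inside a GNN block."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Glorot/Xavier uniform keeps activation variance roughly constant,
        # which helps avoid the huge early activations that end in NaN
        self.kernel = self.add_weight(
            name="kernel",
            shape=(int(input_shape[-1]), self.units),
            initializer=tf.keras.initializers.GlorotUniform(),
        )

    def call(self, node_features):
        return node_features @ self.kernel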
