GNN Loss NaN after first training example?

Check Inputs for NaNs or Infs

 

Before training, log your input features:

print(tf.math.reduce_any(tf.math.is_nan(node_features)))  # any NaNs?
print(tf.math.reduce_any(tf.math.is_inf(node_features)))  # any Infs? (or the PyTorch equivalents)
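
For a slightly broader check (a minimal sketch; node_features and edge_features are assumed names for your own input tensors), you can fail fast on any non-finite value before the first step:

import tensorflow as tf

# Stop immediately if any floating-point input contains NaN or Inf
for name, tensor in [("node_features", node_features), ("edge_features", edge_features)]:
    if tensor.dtype.is_floating:
        tf.debugging.assert_all_finite(tensor, f"{name} contains NaN or Inf")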

 

Clamp or Normalize Input Features

 

If inputs contain very large values, clamp them to a bounded range (or standardize them, as in the sketch after this snippet):

x = tf.clip_by_value(x, -10.0, 10.0)
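
If clamping feels too blunt, per-feature standardization is a common alternative (a sketch; assumes x is a float tensor of shape [num_nodes, num_features] and that the statistics come from the training data):

# Zero mean, unit variance per feature; the epsilon guards against
# division by zero for constant features
mean = tf.reduce_mean(x, axis=0, keepdims=True)
std = tf.math.reduce_std(x, axis=0, keepdims=True)
x = (x - mean) / (std + 1e-6)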

 

Lower the Learning Rate

 

A learning rate that is too high makes training unstable and can push the loss to NaN.

  • Try reducing it by 10x:

 

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

 

Gradient Clipping

 

Prevent exploding gradients:

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)
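
If you train with a custom loop instead of model.fit, the same idea can be applied manually with tf.clip_by_global_norm (a sketch; model, loss_fn, optimizer, inputs, and targets are assumed to exist in your training step):

with tf.GradientTape() as tape:
    predictions = model(inputs, training=True)
    loss = loss_fn(targets, predictions)

grads = tape.gradient(loss, model.trainable_variables)
# Rescale all gradients so their combined norm is at most 1.0
grads, _ = tf.clip_by_global_norm(grads, 1.0)
optimizer.apply_gradients(zip(grads, model.trainable_variables))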

 

Check Your Loss Function

 

  • Are you pairing log_softmax + nll_loss (or softmax + cross-entropy) correctly, without applying softmax twice? A loss that takes raw logits is safer (see the sketch below).

  • Avoid taking log(0): add a small epsilon if needed:

eps = 1e-7

loss = -tf.math.log(tf.clip_by_value(probabilities, eps, 1.0))
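
The most robust option is usually to skip the manual softmax/log entirely and give the loss raw logits, letting it apply a numerically stable log-softmax internally (a sketch; logits and labels are assumed names):

# No explicit softmax or log: from_logits=True handles both stably
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss = loss_fn(labels, logits)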

 

Check BatchNorm / LayerNorm (if used)

 

  • BatchNorm statistics become noisy with small batch sizes, which can destabilize training.

  • Try removing it, or replace it with a batch-size-independent alternative such as LayerNorm (see the sketch below).
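
A rough sketch of the swap (assuming a Keras model where h holds the node embeddings after a message-passing layer):

# BatchNormalization relies on batch statistics; LayerNormalization
# normalizes each node's feature vector on its own, so batch size
# does not matter
# norm = tf.keras.layers.BatchNormalization()
norm = tf.keras.layers.LayerNormalization()
h = norm(h)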

Weight Initialization

 

  • For custom GNN layers, use GlorotUniform (Xavier) or another stable initializer; see the sketch after this list.

  • Avoid initializing with very large standard deviations.
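
A minimal sketch of a custom layer that creates its weight with GlorotUniform (the layer itself is a toy example; only the initializer usage is the point):

import tensorflow as tf

class GraphLinear(tf.keras.layers.Layer):
    """Toy projection used inside a GNN block."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Glorot/Xavier uniform keeps activation variance roughly constant,
        # which helps avoid the huge early activations that end in NaN
        self.kernel = self.add_weight(
            name="kernel",
            shape=(int(input_shape[-1]), self.units),
            initializer=tf.keras.initializers.GlorotUniform(),
        )

    def call(self, node_features):
        return node_features @ self.kernel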
