How is the backpropagation for Separable Convolution 2D in tensorflow/keras implemented exactly?

How Backpropagation Works in SeparableConv2D

Forward Pass

Let:

  • Input shape = [B, H, W, C_in]
  • Depthwise kernel = [K, K, C_in, 1] (i.e. depth_multiplier = 1)
  • Pointwise kernel = [1, 1, C_in, C_out]

The output is computed as:

  • First: apply the depthwise conv → [B, H, W, C_in] (spatial size preserved, assuming stride 1 and 'same' padding)
  • Then: apply the pointwise (1×1) conv → [B, H, W, C_out] (see the sketch below)
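For concreteness, here is a minimal forward-pass sketch, assuming stride 1, 'same' padding, no bias, and depth_multiplier = 1 (all shape values are hypothetical). Under those settings this is the same computation tf.keras.layers.SeparableConv2D performs:

```python
import tensorflow as tf

# Illustrative shapes (hypothetical values)
B, H, W, C_in, C_out, K = 2, 8, 8, 3, 16, 3

x = tf.random.normal([B, H, W, C_in])
depthwise_kernel = tf.Variable(tf.random.normal([K, K, C_in, 1]))    # depth_multiplier = 1
pointwise_kernel = tf.Variable(tf.random.normal([1, 1, C_in, C_out]))

# Stage 1: each of the C_in depthwise filters convolves only its own channel
depthwise_out = tf.nn.depthwise_conv2d(x, depthwise_kernel,
                                       strides=[1, 1, 1, 1], padding="SAME")
# Stage 2: a standard 1x1 convolution mixes the channels
y = tf.nn.conv2d(depthwise_out, pointwise_kernel,
                 strides=[1, 1, 1, 1], padding="SAME")  # [B, H, W, C_out]
```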

Backward Pass

Backpropagation happens in two parts, mirroring the forward pass:

A. Gradient w.r.t. the Pointwise Convolution

  • Acts like a standard 1×1 convolution.
  • The gradient w.r.t. the pointwise kernel is computed using standard conv backprop (see the sketch below).
  • The gradient flows from the loss → output → pointwise conv → depthwise output.
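To see this concretely, here is a sketch continuing from the forward-pass snippet above (reusing x, depthwise_kernel, pointwise_kernel). The loss = sum(y) is chosen only so the upstream gradient dL/dy is all ones, which makes the 1×1-conv backprop easy to verify by hand:

```python
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)  # x is a plain tensor; watch it so we can ask for dL/dx later
    depthwise_out = tf.nn.depthwise_conv2d(x, depthwise_kernel,
                                           strides=[1, 1, 1, 1], padding="SAME")
    y = tf.nn.conv2d(depthwise_out, pointwise_kernel,
                     strides=[1, 1, 1, 1], padding="SAME")
    loss = tf.reduce_sum(y)  # upstream gradient dL/dy is all ones

grad_pw = tape.gradient(loss, pointwise_kernel)   # [1, 1, C_in, C_out]
grad_dw_out = tape.gradient(loss, depthwise_out)  # [B, H, W, C_in]

# For a 1x1 conv, dL/d(pointwise_kernel) is a per-pixel outer product of the
# conv input and dL/dy, summed over batch and space:
grad_pw_manual = tf.einsum('bhwi,bhwo->io',
                           depthwise_out, tf.ones_like(y))[None, None]
print(tf.reduce_max(tf.abs(grad_pw - grad_pw_manual)).numpy())  # ~0.0
```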

B. Gradient w.r.t. the Depthwise Convolution

  • Since each depthwise filter is applied to exactly one input channel, its gradient is computed per channel.
  • The chain rule applies: from the pointwise gradients → through the depthwise output → back to each channel's depthwise filter (see the sketch below).
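A sketch of this step, continuing from the persistent tape above. It reproduces the depthwise-kernel gradient with the raw backprop op TensorFlow registers for depthwise convolutions, which operates channel by channel:

```python
grad_dw = tape.gradient(loss, depthwise_kernel)   # [K, K, C_in, 1]

# Each slice grad_dw[:, :, c, 0] is built only from input channel c and
# channel c of the incoming gradient grad_dw_out.
grad_dw_manual = tf.raw_ops.DepthwiseConv2dNativeBackpropFilter(
    input=x,
    filter_sizes=tf.shape(depthwise_kernel),
    out_backprop=grad_dw_out,
    strides=[1, 1, 1, 1],
    padding="SAME")
print(tf.reduce_max(tf.abs(grad_dw - grad_dw_manual)).numpy())  # ~0.0
```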

TensorFlow handles both parts using the standard registered gradient ops under the hood: Conv2DBackpropFilter / Conv2DBackpropInput for the pointwise stage, and DepthwiseConv2dNativeBackpropFilter / DepthwiseConv2dNativeBackpropInput for the depthwise stage. The separable conv is just a composition of the two ops, so autodiff differentiates through each in turn.
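To make that concrete, here is a sketch (same hypothetical setup and persistent tape as above) reproducing the input-side gradients of both stages with those raw ops:

```python
# dL/d(depthwise_out): backprop through the pointwise (1x1) conv
grad_dw_out_manual = tf.raw_ops.Conv2DBackpropInput(
    input_sizes=tf.shape(depthwise_out),
    filter=pointwise_kernel,
    out_backprop=tf.ones_like(y),      # dL/dy for loss = sum(y)
    strides=[1, 1, 1, 1],
    padding="SAME")

# dL/dx: backprop through the depthwise conv, channel by channel
grad_x_manual = tf.raw_ops.DepthwiseConv2dNativeBackpropInput(
    input_sizes=tf.shape(x),
    filter=depthwise_kernel,
    out_backprop=grad_dw_out_manual,
    strides=[1, 1, 1, 1],
    padding="SAME")

print(tf.reduce_max(tf.abs(grad_x_manual - tape.gradient(loss, x))).numpy())  # ~0.0
```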