How is the backpropagation for Separable Convolution 2D in tensorflow/keras implemented exactly?

How Backpropagation Works in SeparableConv2D

Forward Pass

Let:

  • Input shape = [B, H, W, C_in]
  • Depthwise kernel = [K, K, C_in, 1] (i.e. depth_multiplier = 1)
  • Pointwise kernel = [1, 1, C_in, C_out]

The output is computed as:

  • First: apply the depthwise conv → [B, H, W, C_in] (spatial size preserved, assuming stride 1 and 'same' padding)
  • Then: apply the pointwise (1×1) conv → [B, H, W, C_out] (see the sketch below)
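For concreteness, here is a minimal forward-pass sketch, assuming stride 1, 'same' padding, no bias, and depth_multiplier = 1 (all shape values are hypothetical). Under those settings this is the same computation tf.keras.layers.SeparableConv2D performs:

```python
import tensorflow as tf

# Illustrative shapes (hypothetical values)
B, H, W, C_in, C_out, K = 2, 8, 8, 3, 16, 3

x = tf.random.normal([B, H, W, C_in])
depthwise_kernel = tf.Variable(tf.random.normal([K, K, C_in, 1]))    # depth_multiplier = 1
pointwise_kernel = tf.Variable(tf.random.normal([1, 1, C_in, C_out]))

# Stage 1: each of the C_in depthwise filters convolves only its own channel
depthwise_out = tf.nn.depthwise_conv2d(x, depthwise_kernel,
                                       strides=[1, 1, 1, 1], padding="SAME")
# Stage 2: a standard 1x1 convolution mixes the channels
y = tf.nn.conv2d(depthwise_out, pointwise_kernel,
                 strides=[1, 1, 1, 1], padding="SAME")  # [B, H, W, C_out]
```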

Backward Pass

Backpropagation happens in two parts, mirroring the forward pass:

A. Gradient w.r.t. the Pointwise Convolution

  • Acts like a standard 1×1 convolution.
  • The gradient w.r.t. the pointwise kernel is computed using standard conv backprop (see the sketch below).
  • The gradient flows from the loss → output → pointwise conv → depthwise output.
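To see this concretely, here is a sketch continuing from the forward-pass snippet above (reusing x, depthwise_kernel, pointwise_kernel). The loss = sum(y) is chosen only so the upstream gradient dL/dy is all ones, which makes the 1×1-conv backprop easy to verify by hand:

```python
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)  # x is a plain tensor; watch it so we can ask for dL/dx later
    depthwise_out = tf.nn.depthwise_conv2d(x, depthwise_kernel,
                                           strides=[1, 1, 1, 1], padding="SAME")
    y = tf.nn.conv2d(depthwise_out, pointwise_kernel,
                     strides=[1, 1, 1, 1], padding="SAME")
    loss = tf.reduce_sum(y)  # upstream gradient dL/dy is all ones

grad_pw = tape.gradient(loss, pointwise_kernel)   # [1, 1, C_in, C_out]
grad_dw_out = tape.gradient(loss, depthwise_out)  # [B, H, W, C_in]

# For a 1x1 conv, dL/d(pointwise_kernel) is a per-pixel outer product of the
# conv input and dL/dy, summed over batch and space:
grad_pw_manual = tf.einsum('bhwi,bhwo->io',
                           depthwise_out, tf.ones_like(y))[None, None]
print(tf.reduce_max(tf.abs(grad_pw - grad_pw_manual)).numpy())  # ~0.0
```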

B. Gradient w.r.t. the Depthwise Convolution

  • Since each depthwise filter is applied to exactly one input channel, its gradient is computed per channel.
  • The chain rule applies: from the pointwise gradients → through the depthwise output → back to each channel's depthwise filter (see the sketch below).
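A sketch of this step, continuing from the persistent tape above. It reproduces the depthwise-kernel gradient with the raw backprop op TensorFlow registers for depthwise convolutions, which operates channel by channel:

```python
grad_dw = tape.gradient(loss, depthwise_kernel)   # [K, K, C_in, 1]

# Each slice grad_dw[:, :, c, 0] is built only from input channel c and
# channel c of the incoming gradient grad_dw_out.
grad_dw_manual = tf.raw_ops.DepthwiseConv2dNativeBackpropFilter(
    input=x,
    filter_sizes=tf.shape(depthwise_kernel),
    out_backprop=grad_dw_out,
    strides=[1, 1, 1, 1],
    padding="SAME")
print(tf.reduce_max(tf.abs(grad_dw - grad_dw_manual)).numpy())  # ~0.0
```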

TensorFlow handles both parts using the standard registered gradient ops under the hood: Conv2DBackpropFilter / Conv2DBackpropInput for the pointwise stage, and DepthwiseConv2dNativeBackpropFilter / DepthwiseConv2dNativeBackpropInput for the depthwise stage. The separable conv is just a composition of the two ops, so autodiff differentiates through each in turn.
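To make that concrete, here is a sketch (same hypothetical setup and persistent tape as above) reproducing the input-side gradients of both stages with those raw ops:

```python
# dL/d(depthwise_out): backprop through the pointwise (1x1) conv
grad_dw_out_manual = tf.raw_ops.Conv2DBackpropInput(
    input_sizes=tf.shape(depthwise_out),
    filter=pointwise_kernel,
    out_backprop=tf.ones_like(y),      # dL/dy for loss = sum(y)
    strides=[1, 1, 1, 1],
    padding="SAME")

# dL/dx: backprop through the depthwise conv, channel by channel
grad_x_manual = tf.raw_ops.DepthwiseConv2dNativeBackpropInput(
    input_sizes=tf.shape(x),
    filter=depthwise_kernel,
    out_backprop=grad_dw_out_manual,
    strides=[1, 1, 1, 1],
    padding="SAME")

print(tf.reduce_max(tf.abs(grad_x_manual - tape.gradient(loss, x))).numpy())  # ~0.0
```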