
What Is The Effect Of tf.nn.conv2d() On An Input Tensor Shape?

I am studying the TensorBoard code from Dandelion Mane, specifically: https://github.com/dandelionmane/tf-dev-summit-tensorboard-tutorial/blob/master/mnist.py His convolution layer is

Solution 1:

This works in your case because of the stride used and the padding applied. The output width and height will not always be the same as the input.

Check out this excellent discussion of the topic. The basic takeaway (taken almost verbatim from that link) is that a convolution layer:

  • Accepts an input volume of size W1 x H1 x D1
  • Requires four hyperparameters:
    • Number of filters K
    • Spatial extent of filters F
    • The stride with which the filter moves S
    • The amount of zero padding P
  • Produces a volume of size W2 x H2 x D2 where:
    • W2 = (W1 - F + 2*P)/S + 1
    • H2 = (H1 - F + 2*P)/S + 1
    • D2 = K
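The formulas above can be sketched as a small helper. This is only an illustration: the function name, the 28 x 28 input and the 32 filters are assumptions chosen for the example, and integer division is used so that a stride that does not divide evenly rounds down.

```python
def conv_output_shape(w1, h1, num_filters, filter_size, stride, pad):
    """Return (W2, H2, D2) for a convolution layer.

    Hypothetical helper that applies W2 = (W1 - F + 2*P)/S + 1 (and the
    same for H2), with D2 = K. Floor division handles uneven strides.
    """
    w2 = (w1 - filter_size + 2 * pad) // stride + 1
    h2 = (h1 - filter_size + 2 * pad) // stride + 1
    return w2, h2, num_filters


# 5x5 filters, stride 1, padding 2: width and height are preserved.
print(conv_output_shape(28, 28, num_filters=32, filter_size=5, stride=1, pad=2))
# -> (28, 28, 32)

# Same filters with no padding: each spatial dimension shrinks by F - 1 = 4.
print(conv_output_shape(28, 28, num_filters=32, filter_size=5, stride=1, pad=0))
# -> (24, 24, 32)
```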

And when you are processing batches of data in TensorFlow, they typically have shape [batch_size, height, width, depth] (the default NHWC layout), so the first dimension, which is just the number of samples in your batch, should not change.

Note that the amount of padding P in the above is a little tricky with TF. When you give the padding='SAME' argument to tf.nn.conv2d, TensorFlow applies just enough zero padding that the output width and height equal the input width and height divided by the stride, rounded up. It may not add the same amount of padding to both sides: when the total padding needed is odd, the extra zero goes on the bottom and right, so the two sides differ by at most one. This SO thread has some good discussion on the topic.
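As a rough sketch of how that padding works out along one dimension, the following follows the rule described in the TensorFlow documentation for 'SAME' padding as I understand it. The function name and return layout are my own, not part of TensorFlow.

```python
import math


def same_padding(in_size, filter_size, stride):
    """Return (out_size, pad_before, pad_after) along one spatial dimension.

    Hypothetical helper mirroring the documented 'SAME' rule: the output
    size is ceil(in_size / stride), and any odd leftover zero goes after
    the data (bottom or right).
    """
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + filter_size - in_size, 0)
    pad_before = pad_total // 2
    pad_after = pad_total - pad_before
    return out_size, pad_before, pad_after


# Stride 1, 5-wide filter: two zeros on each side, size preserved.
print(same_padding(28, 5, 1))  # -> (28, 2, 2)

# Stride 2: total padding is 3, so the two sides differ by one.
print(same_padding(28, 5, 2))  # -> (14, 1, 2)
```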

In general, with a stride S of 1 (which your network has) and an odd filter size F, zero padding of P = (F - 1) / 2 will ensure that the output width/height equals the input, i.e. W2 = W1 and H2 = H1. In your case, F is 5, so tf.nn.conv2d must be adding two zeros to each side of the image for a P of 2, and your output width according to the above equation is W2 = (W1 - 5 + 2*2)/1 + 1 = W1 - 1 + 1 = W1.
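One way to convince yourself of this without TensorFlow is to slide a filter across a zero-padded row and count how many positions it can occupy. The sketch below does exactly that; the helper name and the width of 28 are assumptions for illustration only.

```python
def count_filter_positions(width, filter_size, stride, pad):
    """Count the positions a filter can take along one padded row.

    Hypothetical helper: the count is the output width produced by the
    convolution along that dimension.
    """
    padded_width = width + 2 * pad
    positions = 0
    start = 0
    # The filter must fit entirely inside the padded row.
    while start + filter_size <= padded_width:
        positions += 1
        start += stride
    return positions


# With stride 1 and P = (F - 1) / 2, every odd F keeps the width at 28.
for f in (1, 3, 5, 7):
    p = (f - 1) // 2
    print(f, p, count_filter_positions(28, f, stride=1, pad=p))
```

Each printed line ends in 28, matching W2 = W1 for the 5 x 5 case worked through above.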
