Understanding the Dense Layer in Keras

This notebook describes the dense (fully connected) layer using TensorFlow.

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
def reset_seed(seed=313):
    tf.keras.backend.clear_session()
    tf.random.set_seed(seed)
    np.random.seed(seed)

np.set_printoptions(linewidth=100, suppress=True)

print(tf.__version__)
2.7.0
print(np.__version__)
1.21.6

Set some global parameters.

input_features = 2
batch_size = 10
dense_units = 5

Define the input to the model.

in_np = np.random.randint(0, 100, size=(batch_size,input_features))
print(in_np)
[[73 84]
 [34 76]
 [83 33]
 [95  2]
 [15 20]
 [ 6 87]
 [25 10]
 [54 65]
 [ 3 26]
 [27 38]]

Build a model consisting of a single dense layer.

reset_seed()


ins = Input(input_features, name='my_input')
out = Dense(dense_units, use_bias=False, name='my_output')(ins)
model = Model(inputs=ins, outputs=out)
out_np = model.predict(in_np)

print(out_np)
[[-24.545158    60.223736   -22.586353    10.208422     5.8342476 ]
 [-23.865839    29.79804    -19.818281    31.742483    -6.619419  ]
 [ -6.831863    65.5096      -9.919023   -34.138214    22.459442  ]
 [  4.241456    73.28466     -2.3332765  -65.252594    34.763397  ]
 [ -5.9672885   12.504654    -5.3318644    4.102663     0.50515246]
 [-29.023615     8.747902   -22.052912    59.456474   -19.799788  ]
 [ -2.0781016   19.734663    -3.0028474  -10.238509     6.7496395 ]
 [-19.122025    44.684822   -17.429634     9.646704     3.5908635 ]
 [ -8.611273     3.544132    -6.613761    16.921026    -5.469105  ]
 [-11.415466    22.603212   -10.101664     8.848475     0.40289795]]
print(out_np.shape)
(10, 5)

We can get all layers of the model as a list

print(model.layers)
[<keras.engine.input_layer.InputLayer object at 0x70633c3aa0a0>, <keras.layers.core.dense.Dense object at 0x7062ea5161f0>]

or a specific layer by its name

dense_layer = model.get_layer('my_output')

The input to the dense layer must have the shape

print(dense_layer.input_shape)
(None, 2)

The output from the dense layer will have the shape

print(dense_layer.output_shape)
(None, 5)

A dense layer usually has two variables: the weight matrix (kernel) and the bias. Since we set use_bias=False, only the kernel is shown.

print(dense_layer.weights)
[<tf.Variable 'my_output/kernel:0' shape=(2, 5) dtype=float32, numpy=
array([[ 0.0517453 ,  0.77041924, -0.0192523 , -0.7022766 ,  0.37126076],
       [-0.3371734 ,  0.04741824, -0.252154  ,  0.7318406 , -0.25318795]], dtype=float32)>]

The shape of the dense weights is of the form (input_size, units). dense_layer.weights returns a list whose first element is the kernel (weight matrix). We can convert it to a NumPy array.

dense_w = dense_layer.weights[0].numpy()
print(dense_w.shape)
(2, 5)
print(dense_w)
[[ 0.0517453   0.77041924 -0.0192523  -0.7022766   0.37126076]
 [-0.3371734   0.04741824 -0.252154    0.7318406  -0.25318795]]
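The kernel shape also determines how many trainable values the layer holds: input_size * units for the kernel, plus units more if a bias is used. A minimal sketch of that arithmetic follows; dense_param_count is a hypothetical helper written for this illustration and is not part of Keras.

```python
def dense_param_count(input_features, units, use_bias=True):
    """Number of trainable scalars in a dense layer (hypothetical helper)."""
    kernel_params = input_features * units   # kernel has shape (input_features, units)
    bias_params = units if use_bias else 0   # bias has shape (units,)
    return kernel_params + bias_params


# Same sizes as in this notebook: 2 input features, 5 units.
no_bias_count = dense_param_count(2, 5, use_bias=False)
with_bias_count = dense_param_count(2, 5, use_bias=True)
print(no_bias_count, with_bias_count)
```

For a layer that has already been built, dense_layer.count_params() should report the same total.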

The output from our model consisting of a single dense layer is simply the matrix multiplication of the input and the weight matrix, as can be verified below.

np.matmul(in_np, dense_w)
array([[-24.54515922,  60.22373641, -22.5863533 ,  10.2084204 ,   5.83424747],
       [-23.86583853,  29.79804015, -19.81828165,  31.74248242,  -6.61941862],
       [ -6.83186275,  65.50959873,  -9.91902268, -34.13821661,  22.45944077],
       [  4.24145627,  73.28466427,  -2.33327651, -65.25259459,  34.7633965 ],
       [ -5.96728861,  12.50465333,  -5.33186436,   4.1026634 ,   0.50515234],
       [-29.02361423,   8.74790204, -22.05291116,  59.45647359, -19.79978746],
       [ -2.07810163,  19.73466337,  -3.00284743, -10.23850858,   6.74963951],
       [-19.12202519,  44.68482435, -17.42963374,   9.64670396,   3.59086412],
       [ -8.61127257,   3.54413188,  -6.61376071,  16.92102611,  -5.46910453],
       [-11.41546631,  22.60321248, -10.10166383,   8.84847534,   0.40289831]])

Compare the output above with the model's output obtained earlier: the two agree up to float32 rounding.
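Instead of comparing the two arrays by eye, the check can be done numerically. The sketch below is NumPy-only: the input rows, kernel, and model outputs are copied from the values printed earlier (first three rows only), and the tolerance of 1e-4 is an assumption chosen to absorb float32 rounding.

```python
import numpy as np

# First three input rows and the kernel, copied from the printed values above.
x = np.array([[73, 84], [34, 76], [83, 33]], dtype=np.float64)
kernel = np.array([[0.0517453, 0.77041924, -0.0192523, -0.7022766, 0.37126076],
                   [-0.3371734, 0.04741824, -0.252154, 0.7318406, -0.25318795]])

# Matching rows of the model.predict output, copied from above.
keras_out = np.array([[-24.545158, 60.223736, -22.586353, 10.208422, 5.8342476],
                      [-23.865839, 29.79804, -19.818281, 31.742483, -6.619419],
                      [-6.831863, 65.5096, -9.919023, -34.138214, 22.459442]])

manual_out = np.matmul(x, kernel)
# atol is an assumed tolerance for float32 rounding in the Keras output.
matches = np.allclose(manual_out, keras_out, atol=1e-4)
print(matches)
```

In the notebook itself, the same comparison could be written as np.allclose(np.matmul(in_np, dense_w), out_np, atol=1e-4).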

Using Bias

By default the Dense layer in tensorflow uses bias as well.

reset_seed()

ins = Input(input_features, name='my_input')
out = Dense(dense_units, use_bias=True, name='my_output')(ins)
model = Model(inputs=ins, outputs=out)
out_np = model.predict(in_np)
print(out_np.shape)
print(out_np)
(10, 5)
[[-24.545158    60.223736   -22.586353    10.208422     5.8342476 ]
 [-23.865839    29.79804    -19.818281    31.742483    -6.619419  ]
 [ -6.831863    65.5096      -9.919023   -34.138214    22.459442  ]
 [  4.241456    73.28466     -2.3332765  -65.252594    34.763397  ]
 [ -5.9672885   12.504654    -5.3318644    4.102663     0.50515246]
 [-29.023615     8.747902   -22.052912    59.456474   -19.799788  ]
 [ -2.0781016   19.734663    -3.0028474  -10.238509     6.7496395 ]
 [-19.122025    44.684822   -17.429634     9.646704     3.5908635 ]
 [ -8.611273     3.544132    -6.613761    16.921026    -5.469105  ]
 [-11.415466    22.603212   -10.101664     8.848475     0.40289795]]
dense_layer = model.get_layer('my_output')
print(dense_layer.weights)
[<tf.Variable 'my_output/kernel:0' shape=(2, 5) dtype=float32, numpy=
array([[ 0.0517453 ,  0.77041924, -0.0192523 , -0.7022766 ,  0.37126076],
       [-0.3371734 ,  0.04741824, -0.252154  ,  0.7318406 , -0.25318795]], dtype=float32)>, <tf.Variable 'my_output/bias:0' shape=(5,) dtype=float32, numpy=array([0., 0., 0., 0., 0.], dtype=float32)>]

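With a bias, the equation for the dense layer becomes $$ y = Ax + b $$ where $x$ is the input, $A$ is the kernel and $b$ is the bias (in the NumPy checks, with one sample per row, the product is computed as np.matmul(in_np, dense_w)). The bias vector above is all zeros, which is the default bias initializer, so it had no effect on the model output. We can initialize the bias vector with ones and see how the output changes.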

reset_seed()

ins = Input(input_features, name='my_input')
out = Dense(dense_units, use_bias=True, bias_initializer='ones', name='my_output')(ins)
model = Model(inputs=ins, outputs=out)
out_np = model.predict(in_np)
print(out_np.shape)
print(out_np)
(10, 5)
[[-23.545158   61.223736  -21.586353   11.208422    6.8342476]
 [-22.865839   30.79804   -18.818281   32.742485   -5.619419 ]
 [ -5.831863   66.5096     -8.919023  -33.138214   23.459442 ]
 [  5.241456   74.28466    -1.3332765 -64.252594   35.763397 ]
 [ -4.9672885  13.504654   -4.3318644   5.102663    1.5051525]
 [-28.023615    9.747902  -21.052912   60.456474  -18.799788 ]
 [ -1.0781016  20.734663   -2.0028474  -9.238509    7.7496395]
 [-18.122025   45.684822  -16.429634   10.646704    4.590863 ]
 [ -7.611273    4.544132   -5.613761   17.921026   -4.469105 ]
 [-10.415466   23.603212   -9.101664    9.848475    1.402898 ]]
dense_layer = model.get_layer('my_output')
print(dense_layer.weights)
[<tf.Variable 'my_output/kernel:0' shape=(2, 5) dtype=float32, numpy=
array([[ 0.0517453 ,  0.77041924, -0.0192523 , -0.7022766 ,  0.37126076],
       [-0.3371734 ,  0.04741824, -0.252154  ,  0.7318406 , -0.25318795]], dtype=float32)>, <tf.Variable 'my_output/bias:0' shape=(5,) dtype=float32, numpy=array([1., 1., 1., 1., 1.], dtype=float32)>]

We can verify that the model’s output is obtained following the equation we wrote above.

dense_layer = model.get_layer('my_output')
dense_w = dense_layer.weights[0].numpy()
np.matmul(in_np, dense_w) + np.ones(dense_units)
array([[-23.54515922,  61.22373641, -21.5863533 ,  11.2084204 ,   6.83424747],
       [-22.86583853,  30.79804015, -18.81828165,  32.74248242,  -5.61941862],
       [ -5.83186275,  66.50959873,  -8.91902268, -33.13821661,  23.45944077],
       [  5.24145627,  74.28466427,  -1.33327651, -64.25259459,  35.7633965 ],
       [ -4.96728861,  13.50465333,  -4.33186436,   5.1026634 ,   1.50515234],
       [-28.02361423,   9.74790204, -21.05291116,  60.45647359, -18.79978746],
       [ -1.07810163,  20.73466337,  -2.00284743,  -9.23850858,   7.74963951],
       [-18.12202519,  45.68482435, -16.42963374,  10.64670396,   4.59086412],
       [ -7.61127257,   4.54413188,  -5.61376071,  17.92102611,  -4.46910453],
       [-10.41546631,  23.60321248,  -9.10166383,   9.84847534,   1.40289831]])
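The addition above relies on NumPy broadcasting: a bias of shape (units,) is added to every row of the (batch, units) product, so each unit gets its own offset. The sketch below makes this visible with a bias whose entries differ per unit; the bias values 1..5 are hypothetical and chosen only for illustration, while the kernel and input rows are copied from the printed values above.

```python
import numpy as np

# Kernel and first two input rows, copied from the printed values above.
kernel = np.array([[0.0517453, 0.77041924, -0.0192523, -0.7022766, 0.37126076],
                   [-0.3371734, 0.04741824, -0.252154, 0.7318406, -0.25318795]])
x = np.array([[73, 84], [34, 76]], dtype=np.float64)

# Hypothetical per-unit bias (not the all-ones bias used in the notebook).
per_unit_bias = np.arange(1, 6, dtype=np.float64)   # shape (5,)

linear = np.matmul(x, kernel)       # shape (2, 5)
shifted = linear + per_unit_bias    # bias is broadcast across the batch axis

# Column j of every row is shifted by per_unit_bias[j].
offsets = shifted - linear
print(shifted.shape)
print(offsets)
```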

Using an Activation Function

We can add non-linearity to the output of the dense layer with the activation keyword argument. A common activation function is relu, which sets all values below 0 to zero. In this case the equation of the dense layer becomes $$ y = \alpha (Ax + b) $$ where $\alpha$ is the non-linearity applied.

reset_seed()

ins = Input(input_features, name='my_input')
out = Dense(dense_units, use_bias=True, bias_initializer='ones',
            activation='relu', name='my_output')(ins)
model = Model(inputs=ins, outputs=out)

out_np = model.predict(in_np)
print(out_np.shape)
print(out_np)
(10, 5)
[[ 0.        61.223736   0.        11.208422   6.8342476]
 [ 0.        30.79804    0.        32.742485   0.       ]
 [ 0.        66.5096     0.         0.        23.459442 ]
 [ 5.241456  74.28466    0.         0.        35.763397 ]
 [ 0.        13.504654   0.         5.102663   1.5051525]
 [ 0.         9.747902   0.        60.456474   0.       ]
 [ 0.        20.734663   0.         0.         7.7496395]
 [ 0.        45.684822   0.        10.646704   4.590863 ]
 [ 0.         4.544132   0.        17.921026   0.       ]
 [ 0.        23.603212   0.         9.848475   1.402898 ]]

We can again verify that the output of the dense layer follows the equation we wrote above.

def relu(X):
    return np.maximum(0, X)


dense_layer = model.get_layer('my_output')
dense_w = dense_layer.weights[0].numpy()
relu(np.matmul(in_np, dense_w) + np.ones(dense_units))
array([[ 0.        , 61.22373641,  0.        , 11.2084204 ,  6.83424747],
       [ 0.        , 30.79804015,  0.        , 32.74248242,  0.        ],
       [ 0.        , 66.50959873,  0.        ,  0.        , 23.45944077],
       [ 5.24145627, 74.28466427,  0.        ,  0.        , 35.7633965 ],
       [ 0.        , 13.50465333,  0.        ,  5.1026634 ,  1.50515234],
       [ 0.        ,  9.74790204,  0.        , 60.45647359,  0.        ],
       [ 0.        , 20.73466337,  0.        ,  0.        ,  7.74963951],
       [ 0.        , 45.68482435,  0.        , 10.64670396,  4.59086412],
       [ 0.        ,  4.54413188,  0.        , 17.92102611,  0.        ],
       [ 0.        , 23.60321248,  0.        ,  9.84847534,  1.40289831]])
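The three cases covered so far (kernel only, kernel plus bias, and kernel plus bias plus activation) can be gathered into one function. This is a NumPy sketch of the computation verified above, not Keras's actual implementation; dense_forward is a hypothetical name, and the kernel and input row are copied from the printed values.

```python
import numpy as np


def relu(z):
    """Set every negative entry to zero."""
    return np.maximum(0, z)


def dense_forward(x, kernel, bias=None, activation=None):
    """NumPy sketch of a dense layer: activation(x @ kernel + bias)."""
    y = np.matmul(x, kernel)
    if bias is not None:
        y = y + bias          # broadcast over the batch axis
    if activation is not None:
        y = activation(y)
    return y


# Kernel and first input row, copied from the printed values above.
kernel = np.array([[0.0517453, 0.77041924, -0.0192523, -0.7022766, 0.37126076],
                   [-0.3371734, 0.04741824, -0.252154, 0.7318406, -0.25318795]])
x = np.array([[73, 84]], dtype=np.float64)

out = dense_forward(x, kernel, bias=np.ones(5), activation=relu)
print(out)
```

The result can be compared with the first row of the relu model output printed above.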

Customizing Weights

We can set the weights and bias of a dense layer to values of our choice. This is useful, for example, when we want to initialize the layer with values that we already have.

custom_dense_weights = np.array([[1, 2, 3, 4, 5],
                                 [6, 7, 8, 9, 10]], dtype=np.float32)
custom_bias = np.array([0., 0., 0., 0., 0.])

reset_seed()

ins = Input(input_features, name='my_input')

dense_lyr = Dense(dense_units, use_bias=True, bias_initializer='ones', name='my_output')
out = dense_lyr(ins)

model = Model(inputs=ins, outputs=out)

dense_lyr.set_weights([custom_dense_weights, custom_bias])

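The method set_weights can only be called once the layer's weights exist, i.e. after the layer has been called on an input (here dense_lyr(ins)); in this example we call it after creating the Model. Its argument is a list containing the weight matrix and the bias vector, in that order, with shapes matching the layer's existing weights.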

out_np = model.predict(in_np)
print(out_np.shape)
print(out_np)
(10, 5)
[[ 577.  734.  891. 1048. 1205.]
 [ 490.  600.  710.  820.  930.]
 [ 281.  397.  513.  629.  745.]
 [ 107.  204.  301.  398.  495.]
 [ 135.  170.  205.  240.  275.]
 [ 528.  621.  714.  807.  900.]
 [  85.  120.  155.  190.  225.]
 [ 444.  563.  682.  801.  920.]
 [ 159.  188.  217.  246.  275.]
 [ 255.  320.  385.  450.  515.]]
dense_layer = model.get_layer('my_output')
dense_w = dense_layer.weights[0].numpy()
print(dense_w)
[[ 1.  2.  3.  4.  5.]
 [ 6.  7.  8.  9. 10.]]

Verify that the output of the dense layer is just the matrix multiplication of the input and the custom weights, plus the (zero) bias.

np.matmul(in_np, custom_dense_weights) + np.zeros(dense_units)
array([[ 577.,  734.,  891., 1048., 1205.],
       [ 490.,  600.,  710.,  820.,  930.],
       [ 281.,  397.,  513.,  629.,  745.],
       [ 107.,  204.,  301.,  398.,  495.],
       [ 135.,  170.,  205.,  240.,  275.],
       [ 528.,  621.,  714.,  807.,  900.],
       [  85.,  120.,  155.,  190.,  225.],
       [ 444.,  563.,  682.,  801.,  920.],
       [ 159.,  188.,  217.,  246.,  275.],
       [ 255.,  320.,  385.,  450.,  515.]])
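With integer weights the result is easy to check by hand. For the first input row (73, 84), the first and last entries of the first output row are:

$$ y_{1,1} = 73 \cdot 1 + 84 \cdot 6 + 0 = 577, \qquad y_{1,5} = 73 \cdot 5 + 84 \cdot 10 + 0 = 1205 $$

Both agree with the first row printed above.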

Reducing Dimensions

A dense layer acts on the last dimension of its input, so it can be used to reduce that dimension. In the following example the shape is reduced from (10, 20, 30) to (10, 20, 1).

input_shape = 20, 30
in_np = np.random.randint(0, 100, size=(batch_size,*input_shape))

reset_seed()


ins = Input(input_shape, name='my_input')
out = Dense(1, use_bias=False, name='my_output')(ins)
model = Model(inputs=ins, outputs=out)
out_np = model.predict(in_np)
print('input shape: {}\n output shape: {}'.format(in_np.shape, out_np.shape))
input shape: (10, 20, 30)
 output shape: (10, 20, 1)
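The shape change follows from the same matrix multiplication, applied along the last axis only. The NumPy sketch below uses a randomly generated kernel of shape (30, 1) as a stand-in for the layer's kernel (an assumption; the actual Keras weights are not printed here) and shows that every (sample, position) pair is transformed independently with the same kernel.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input with the same shape as in the example above.
x = rng.integers(0, 100, size=(10, 20, 30)).astype(np.float64)
# Stand-in kernel for Dense(1, use_bias=False): shape (last_input_dim, units).
kernel = rng.normal(size=(30, 1))

# matmul contracts the last axis of x with the first axis of kernel.
out = np.matmul(x, kernel)
print(x.shape, out.shape)

# One (sample, position) vector pushed through the same kernel on its own.
single = np.matmul(x[3, 7], kernel)
same_result = np.allclose(out[3, 7], single)
print(same_result)
```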
