Understanding the Dense Layer in Keras
This notebook describes the dense (fully connected) layer using TensorFlow.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
def reset_seed(seed=313):
    tf.keras.backend.clear_session()
    tf.random.set_seed(seed)
    np.random.seed(seed)
np.set_printoptions(linewidth=100, suppress=True)
print(tf.__version__)
2.7.0
print(np.__version__)
1.21.6
Set some global parameters.
input_features = 2
batch_size = 10
dense_units = 5
Define the input to the model.
in_np = np.random.randint(0, 100, size=(batch_size,input_features))
print(in_np)
[[73 84]
[34 76]
[83 33]
[95 2]
[15 20]
[ 6 87]
[25 10]
[54 65]
[ 3 26]
[27 38]]
Build a model consisting of a single dense layer.
reset_seed()
ins = Input(input_features, name='my_input')
out = Dense(dense_units, use_bias=False, name='my_output')(ins)
model = Model(inputs=ins, outputs=out)
out_np = model.predict(in_np)
print(out_np)
[[-24.545158 60.223736 -22.586353 10.208422 5.8342476 ]
[-23.865839 29.79804 -19.818281 31.742483 -6.619419 ]
[ -6.831863 65.5096 -9.919023 -34.138214 22.459442 ]
[ 4.241456 73.28466 -2.3332765 -65.252594 34.763397 ]
[ -5.9672885 12.504654 -5.3318644 4.102663 0.50515246]
[-29.023615 8.747902 -22.052912 59.456474 -19.799788 ]
[ -2.0781016 19.734663 -3.0028474 -10.238509 6.7496395 ]
[-19.122025 44.684822 -17.429634 9.646704 3.5908635 ]
[ -8.611273 3.544132 -6.613761 16.921026 -5.469105 ]
[-11.415466 22.603212 -10.101664 8.848475 0.40289795]]
print(out_np.shape)
(10, 5)
We can get all layers of the model as a list
print(model.layers)
[<keras.engine.input_layer.InputLayer object at 0x70633c3aa0a0>, <keras.layers.core.dense.Dense object at 0x7062ea5161f0>]
or a specific layer by its name
dense_layer = model.get_layer('my_output')
The input to the dense layer must have the shape
print(dense_layer.input_shape)
(None, 2)
The output from the dense layer will have the shape
print(dense_layer.output_shape)
(None, 5)
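The `None` in both shapes stands for the batch dimension, which is left unspecified so that the layer accepts any number of samples. The following is a minimal NumPy sketch of that idea, not Keras code: `kernel` holds arbitrary values and `dense_forward` is an illustrative helper name.

```python
import numpy as np

input_features, dense_units = 2, 5

# Hypothetical kernel of shape (input_features, dense_units); the values are arbitrary.
kernel = np.arange(input_features * dense_units, dtype=np.float32).reshape(
    input_features, dense_units)

def dense_forward(x, kernel):
    """Bias-free dense layer: a single matrix product over the last axis."""
    return np.matmul(x, kernel)

# The same kernel handles any batch size, which is what `None` expresses.
for batch in (1, 10, 64):
    x = np.ones((batch, input_features), dtype=np.float32)
    print(x.shape, '->', dense_forward(x, kernel).shape)
```

Only the last axis of the input has to match the first axis of the kernel; the leading axis is free.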
A dense layer usually has two variables, i.e. weight/kernel and bias. As we did not use a bias here, only the kernel is shown.
print(dense_layer.weights)
[<tf.Variable 'my_output/kernel:0' shape=(2, 5) dtype=float32, numpy=
array([[ 0.0517453 , 0.77041924, -0.0192523 , -0.7022766 , 0.37126076],
[-0.3371734 , 0.04741824, -0.252154 , 0.7318406 , -0.25318795]], dtype=float32)>]
The dense weights have the shape (input_size, units). dense_layer.weights returns a list whose first element is the kernel (weight matrix). We can convert it to a NumPy array.
dense_w = dense_layer.weights[0].numpy()
print(dense_w.shape)
(2, 5)
print(dense_w)
[[ 0.0517453 0.77041924 -0.0192523 -0.7022766 0.37126076]
[-0.3371734 0.04741824 -0.252154 0.7318406 -0.25318795]]
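Because the kernel has shape (input_size, units), the number of values stored in the layer follows directly from those two numbers. The sketch below uses a hypothetical helper, `dense_param_count`, that is not part of Keras; its result should agree with what `model.count_params()` reports for a single dense layer.

```python
def dense_param_count(input_size, units, use_bias=True):
    """Number of trainable values in a dense layer (illustrative helper)."""
    kernel_params = input_size * units       # kernel has shape (input_size, units)
    bias_params = units if use_bias else 0   # bias, when present, has shape (units,)
    return kernel_params + bias_params

print(dense_param_count(2, 5, use_bias=False))  # layer used above: kernel only
print(dense_param_count(2, 5, use_bias=True))   # same layer with a bias vector
```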
The output from our model consisting of a single dense layer is simply the matrix multiplication between the input and the weight matrix, as can be verified below.
np.matmul(in_np, dense_w)
array([[-24.54515922, 60.22373641, -22.5863533 , 10.2084204 , 5.83424747],
[-23.86583853, 29.79804015, -19.81828165, 31.74248242, -6.61941862],
[ -6.83186275, 65.50959873, -9.91902268, -34.13821661, 22.45944077],
[ 4.24145627, 73.28466427, -2.33327651, -65.25259459, 34.7633965 ],
[ -5.96728861, 12.50465333, -5.33186436, 4.1026634 , 0.50515234],
[-29.02361423, 8.74790204, -22.05291116, 59.45647359, -19.79978746],
[ -2.07810163, 19.73466337, -3.00284743, -10.23850858, 6.74963951],
[-19.12202519, 44.68482435, -17.42963374, 9.64670396, 3.59086412],
[ -8.61127257, 3.54413188, -6.61376071, 16.92102611, -5.46910453],
[-11.41546631, 22.60321248, -10.10166383, 8.84847534, 0.40289831]])
Compare the above output with the model’s output obtained earlier.
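Rather than comparing the two printouts by eye, the check can be done numerically. The sketch below hard-codes the first two input rows, the kernel, and the corresponding model outputs exactly as printed above. A tolerance is needed because Keras computes in float32 while the NumPy product is float64, and because the printed values are rounded.

```python
import numpy as np

# First two rows of in_np, as printed earlier.
x = np.array([[73, 84],
              [34, 76]])

# Kernel of 'my_output', as printed earlier.
w = np.array([[ 0.0517453, 0.77041924, -0.0192523, -0.7022766,  0.37126076],
              [-0.3371734, 0.04741824, -0.252154,   0.7318406, -0.25318795]])

# First two rows of model.predict(in_np), as printed earlier.
keras_out = np.array([[-24.545158, 60.223736, -22.586353, 10.208422,  5.8342476],
                      [-23.865839, 29.79804,  -19.818281, 31.742483, -6.619419]])

manual_out = np.matmul(x, w)

# Loose absolute tolerance: the inputs here are rounded printouts, not exact values.
matches = np.allclose(manual_out, keras_out, atol=1e-3)
print(matches)
```

Inside the notebook itself the same check would be `np.allclose(np.matmul(in_np, dense_w), out_np, atol=1e-3)`.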
Using Bias
By default, the Dense layer in TensorFlow also uses a bias.
reset_seed()
tf.keras.backend.clear_session()
ins = Input(input_features, name='my_input')
out = Dense(5, use_bias=True, name='my_output')(ins)
model = Model(inputs=ins, outputs=out)
out_np = model.predict(in_np)
print(out_np.shape)
print(out_np)
(10, 5)
[[-24.545158 60.223736 -22.586353 10.208422 5.8342476 ]
[-23.865839 29.79804 -19.818281 31.742483 -6.619419 ]
[ -6.831863 65.5096 -9.919023 -34.138214 22.459442 ]
[ 4.241456 73.28466 -2.3332765 -65.252594 34.763397 ]
[ -5.9672885 12.504654 -5.3318644 4.102663 0.50515246]
[-29.023615 8.747902 -22.052912 59.456474 -19.799788 ]
[ -2.0781016 19.734663 -3.0028474 -10.238509 6.7496395 ]
[-19.122025 44.684822 -17.429634 9.646704 3.5908635 ]
[ -8.611273 3.544132 -6.613761 16.921026 -5.469105 ]
[-11.415466 22.603212 -10.101664 8.848475 0.40289795]]
dense_layer = model.get_layer('my_output')
print(dense_layer.weights)
[<tf.Variable 'my_output/kernel:0' shape=(2, 5) dtype=float32, numpy=
array([[ 0.0517453 , 0.77041924, -0.0192523 , -0.7022766 , 0.37126076],
[-0.3371734 , 0.04741824, -0.252154 , 0.7318406 , -0.25318795]], dtype=float32)>, <tf.Variable 'my_output/bias:0' shape=(5,) dtype=float32, numpy=array([0., 0., 0., 0., 0.], dtype=float32)>]
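The bias vector above is all zeros (the default initializer), so it has no effect on the model output. With a bias, the equation for the dense layer becomes $$ y = Ax + b $$ where $A$ is the kernel, $x$ the input and $b$ the bias. We can initialize the bias vector with ones and see the effect on the output.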
reset_seed()
ins = Input(input_features, name='my_input')
out = Dense(dense_units, use_bias=True, bias_initializer='ones', name='my_output')(ins)
model = Model(inputs=ins, outputs=out)
out_np = model.predict(in_np)
print(out_np.shape)
print(out_np)
(10, 5)
[[-23.545158 61.223736 -21.586353 11.208422 6.8342476]
[-22.865839 30.79804 -18.818281 32.742485 -5.619419 ]
[ -5.831863 66.5096 -8.919023 -33.138214 23.459442 ]
[ 5.241456 74.28466 -1.3332765 -64.252594 35.763397 ]
[ -4.9672885 13.504654 -4.3318644 5.102663 1.5051525]
[-28.023615 9.747902 -21.052912 60.456474 -18.799788 ]
[ -1.0781016 20.734663 -2.0028474 -9.238509 7.7496395]
[-18.122025 45.684822 -16.429634 10.646704 4.590863 ]
[ -7.611273 4.544132 -5.613761 17.921026 -4.469105 ]
[-10.415466 23.603212 -9.101664 9.848475 1.402898 ]]
dense_layer = model.get_layer('my_output')
print(dense_layer.weights)
[<tf.Variable 'my_output/kernel:0' shape=(2, 5) dtype=float32, numpy=
array([[ 0.0517453 , 0.77041924, -0.0192523 , -0.7022766 , 0.37126076],
[-0.3371734 , 0.04741824, -0.252154 , 0.7318406 , -0.25318795]], dtype=float32)>, <tf.Variable 'my_output/bias:0' shape=(5,) dtype=float32, numpy=array([1., 1., 1., 1., 1.], dtype=float32)>]
We can verify that the model’s output is obtained following the equation we wrote above.
dense_layer = model.get_layer('my_output')
dense_w = dense_layer.weights[0].numpy()
np.matmul(in_np, dense_w) + np.ones(dense_units)
array([[-23.54515922, 61.22373641, -21.5863533 , 11.2084204 , 6.83424747],
[-22.86583853, 30.79804015, -18.81828165, 32.74248242, -5.61941862],
[ -5.83186275, 66.50959873, -8.91902268, -33.13821661, 23.45944077],
[ 5.24145627, 74.28466427, -1.33327651, -64.25259459, 35.7633965 ],
[ -4.96728861, 13.50465333, -4.33186436, 5.1026634 , 1.50515234],
[-28.02361423, 9.74790204, -21.05291116, 60.45647359, -18.79978746],
[ -1.07810163, 20.73466337, -2.00284743, -9.23850858, 7.74963951],
[-18.12202519, 45.68482435, -16.42963374, 10.64670396, 4.59086412],
[ -7.61127257, 4.54413188, -5.61376071, 17.92102611, -4.46910453],
[-10.41546631, 23.60321248, -9.10166383, 9.84847534, 1.40289831]])
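The addition in the verification above relies on NumPy broadcasting: the matrix product has shape (batch_size, units), the bias has shape (units,), and the bias is added to every row. A small sketch with made-up numbers (not the values from this notebook):

```python
import numpy as np

xw = np.array([[1., 2., 3.],
               [4., 5., 6.]])       # stands in for np.matmul(x, kernel), shape (2, 3)
bias = np.array([10., 20., 30.])    # one bias value per unit, shape (3,)

y = xw + bias                       # bias is broadcast across the batch axis

# Equivalent explicit form: repeat the bias once per row, then add.
y_explicit = xw + np.tile(bias, (xw.shape[0], 1))

print(y)
```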
Using Activation Function
We can add a non-linearity to the output of the dense layer through the activation keyword argument. A common activation function is relu, which sets all values below 0 to zero. In this case the equation of the dense layer becomes $$ y = \alpha (Ax + b) $$ where $\alpha$ is the non-linearity applied.
reset_seed()
ins = Input(input_features, name='my_input')
out = Dense(dense_units, use_bias=True, bias_initializer='ones',
activation='relu', name='my_output')(ins)
model = Model(inputs=ins, outputs=out)
out_np = model.predict(in_np)
print(out_np.shape)
print(out_np)
(10, 5)
[[ 0. 61.223736 0. 11.208422 6.8342476]
[ 0. 30.79804 0. 32.742485 0. ]
[ 0. 66.5096 0. 0. 23.459442 ]
[ 5.241456 74.28466 0. 0. 35.763397 ]
[ 0. 13.504654 0. 5.102663 1.5051525]
[ 0. 9.747902 0. 60.456474 0. ]
[ 0. 20.734663 0. 0. 7.7496395]
[ 0. 45.684822 0. 10.646704 4.590863 ]
[ 0. 4.544132 0. 17.921026 0. ]
[ 0. 23.603212 0. 9.848475 1.402898 ]]
We can again verify that the above output from the dense layer follows the equation that we wrote above.
def relu(X):
    return np.maximum(0,X)
dense_layer = model.get_layer('my_output')
dense_w = dense_layer.weights[0].numpy()
relu(np.matmul(in_np, dense_w) + np.ones(dense_units))
array([[ 0. , 61.22373641, 0. , 11.2084204 , 6.83424747],
[ 0. , 30.79804015, 0. , 32.74248242, 0. ],
[ 0. , 66.50959873, 0. , 0. , 23.45944077],
[ 5.24145627, 74.28466427, 0. , 0. , 35.7633965 ],
[ 0. , 13.50465333, 0. , 5.1026634 , 1.50515234],
[ 0. , 9.74790204, 0. , 60.45647359, 0. ],
[ 0. , 20.73466337, 0. , 0. , 7.74963951],
[ 0. , 45.68482435, 0. , 10.64670396, 4.59086412],
[ 0. , 4.54413188, 0. , 17.92102611, 0. ],
[ 0. , 23.60321248, 0. , 9.84847534, 1.40289831]])
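The same pattern holds for any element-wise activation: compute the affine part first, then apply the activation to every entry. The sketch below illustrates this in NumPy with relu, sigmoid and tanh; `dense_forward` is an illustrative helper and the weights are arbitrary, not taken from the model above.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_forward(x, kernel, bias, activation=None):
    """activation(x @ kernel + bias); identity when activation is None."""
    z = np.matmul(x, kernel) + bias
    return z if activation is None else activation(z)

x = np.array([[1.0, -2.0]])
kernel = np.array([[ 0.5, -1.0],
                   [-0.25, 0.5]])
bias = np.zeros(2)

print(dense_forward(x, kernel, bias))            # pre-activation values
print(dense_forward(x, kernel, bias, relu))      # negatives clipped to zero
print(dense_forward(x, kernel, bias, sigmoid))   # squashed into (0, 1)
print(dense_forward(x, kernel, bias, np.tanh))   # squashed into (-1, 1)
```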
Customizing Weights
We can set the weights and bias of a dense layer to values of our choice. This is useful, for example, when we want to initialize the weights/bias with values that we already have.
custom_dense_weights = np.array([[1, 2, 3 , 4, 5],
[6, 7, 8 , 9 , 10]], dtype=np.float32)
custom_bias = np.array([0., 0., 0., 0., 0.])
reset_seed()
ins = Input(input_features, name='my_input')
dense_lyr = Dense(dense_units, use_bias=True, bias_initializer='ones', name='my_output')
out = dense_lyr(ins)
model = Model(inputs=ins, outputs=out)
dense_lyr.set_weights([custom_dense_weights, custom_bias])
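The method set_weights must be called after the layer has been built, i.e. after it has been called on an input (here dense_lyr(ins)), because the weight variables do not exist before that. The input to set_weights is a list containing the weight matrix followed by the bias vector.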
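Since the order and shapes of the arrays matter, it can help to check them before passing them to the layer. The sketch below uses a hypothetical helper, `check_dense_weights`, which is not part of Keras; it only encodes the (input_size, units) and (units,) shapes described earlier.

```python
import numpy as np

def check_dense_weights(weights, input_size, units, use_bias=True):
    """Validate a [kernel, bias] list against the shapes a dense layer expects.

    Illustrative helper only; not a Keras function.
    """
    expected = [(input_size, units)] + ([(units,)] if use_bias else [])
    if len(weights) != len(expected):
        raise ValueError(f"expected {len(expected)} arrays, got {len(weights)}")
    for arr, shape in zip(weights, expected):
        if np.shape(arr) != shape:
            raise ValueError(f"expected shape {shape}, got {np.shape(arr)}")
    return True

custom_dense_weights = np.array([[1, 2, 3, 4, 5],
                                 [6, 7, 8, 9, 10]], dtype=np.float32)
custom_bias = np.zeros(5, dtype=np.float32)

# Kernel first, bias second: the same order used in the set_weights call above.
print(check_dense_weights([custom_dense_weights, custom_bias], input_size=2, units=5))
```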
out_np = model.predict(in_np)
print(out_np.shape)
print(out_np)
(10, 5)
[[ 577. 734. 891. 1048. 1205.]
[ 490. 600. 710. 820. 930.]
[ 281. 397. 513. 629. 745.]
[ 107. 204. 301. 398. 495.]
[ 135. 170. 205. 240. 275.]
[ 528. 621. 714. 807. 900.]
[ 85. 120. 155. 190. 225.]
[ 444. 563. 682. 801. 920.]
[ 159. 188. 217. 246. 275.]
[ 255. 320. 385. 450. 515.]]
dense_layer = model.get_layer('my_output')
dense_w = dense_layer.weights[0].numpy()
print(dense_w)
[[ 1. 2. 3. 4. 5.]
[ 6. 7. 8. 9. 10.]]
Verify that the output from the dense layer is just the matrix multiplication plus the (zero) bias.
np.matmul(in_np, custom_dense_weights) + np.zeros(dense_units)
array([[ 577., 734., 891., 1048., 1205.],
[ 490., 600., 710., 820., 930.],
[ 281., 397., 513., 629., 745.],
[ 107., 204., 301., 398., 495.],
[ 135., 170., 205., 240., 275.],
[ 528., 621., 714., 807., 900.],
[ 85., 120., 155., 190., 225.],
[ 444., 563., 682., 801., 920.],
[ 159., 188., 217., 246., 275.],
[ 255., 320., 385., 450., 515.]])
Reducing Dimensions
A dense layer can be used to reduce the last dimension of its input. In the following, the shape is reduced from (10, 20, 30) to (10, 20, 1).
input_shape = 20, 30
in_np = np.random.randint(0, 100, size=(batch_size,*input_shape))
reset_seed()
ins = Input(input_shape, name='my_input')
out = Dense(1, use_bias=False, name='my_output')(ins)
model = Model(inputs=ins, outputs=out)
out_np = model.predict(in_np)
print('input shape: {}\n output shape: {}'.format(in_np.shape, out_np.shape))
input shape: (10, 20, 30)
output shape: (10, 20, 1)
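For inputs with more than two dimensions, the dense layer contracts only the last axis of the input with the first axis of the kernel, and all leading axes are carried through unchanged. The following NumPy sketch reproduces the shape change with arbitrary random values; the variable names are illustrative and the numbers are not the ones used by the model above.

```python
import numpy as np

rng = np.random.default_rng(313)
x = rng.integers(0, 100, size=(10, 20, 30)).astype(np.float64)
kernel = rng.normal(size=(30, 1))    # arbitrary values, shape (last_dim, units)

# np.matmul broadcasts over the leading axes, so only the last axis is contracted.
y = np.matmul(x, kernel)

# Same result by flattening the leading axes into one batch axis and restoring them.
y_flat = np.matmul(x.reshape(-1, 30), kernel).reshape(10, 20, 1)

print(x.shape, '->', y.shape)
```

Setting the number of units to a value other than 1 changes only the size of the last axis of the output.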