.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/input_output_lstm.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_input_output_lstm.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_input_output_lstm.py:


==================================
Understanding Input/Output of LSTM
==================================

The purpose of this notebook is to determine the input and output shapes of the
LSTM layer in keras/tensorflow. It also shows how the output changes when we use
the ``return_sequences`` and ``return_state`` arguments of the LSTM/RNN layers.

.. GENERATED FROM PYTHON SOURCE LINES 11-29

.. code-block:: Python


    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import MaxPooling1D, Flatten, Conv1D
    from tensorflow.keras.layers import Input, LSTM, Reshape, TimeDistributed

    # to suppress scientific notation while printing arrays
    np.set_printoptions(suppress=True)

    def reset_graph(seed=313):
        tf.compat.v1.reset_default_graph()
        tf.compat.v1.set_random_seed(seed)
        np.random.seed(seed)

    tf.__version__


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    '2.7.0'


.. GENERATED FROM PYTHON SOURCE LINES 30-41

.. code-block:: Python


    seq_len = 9
    in_features = 3
    batch_size = 2
    units = 5

    # define input data
    data = np.random.normal(0,1, size=(batch_size, seq_len, in_features))
    print('input shape is', data.shape)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    input shape is (2, 9, 3)


.. GENERATED FROM PYTHON SOURCE LINES 42-45

.. code-block:: Python


    reset_graph()


.. GENERATED FROM PYTHON SOURCE LINES 46-48

Input to LSTM
------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 48-60

.. code-block:: Python


    # The input to LSTM is 3D, where the dimensions have the following meaning:
    # (batch_size, sequence_length, num_inputs)
    # batch_size is the number of samples, sequence_length is the length of the
    # historical/temporal data seen by the LSTM and num_inputs is the number of input features

    # define model
    inputs1 = Input(shape=(seq_len, in_features))
    lstm1 = LSTM(units)(inputs1)
    model = Model(inputs=inputs1, outputs=lstm1)
    model.inputs


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    [<KerasTensor: shape=(None, 9, 3) dtype=float32 (created by layer 'input_2')>]
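
A common way to obtain this 3D layout from a 2D time series of shape
(time_steps, num_inputs) is to cut the series into overlapping windows. The
sketch below is only an illustration with NumPy; the series values and the
helper name ``make_windows`` are hypothetical and not part of the original example.

.. code-block:: Python

    import numpy as np

    def make_windows(series, seq_len):
        """Stack overlapping windows of length seq_len along a new batch axis."""
        n_windows = series.shape[0] - seq_len + 1
        return np.stack([series[i:i + seq_len] for i in range(n_windows)])

    # hypothetical series: 20 time steps, 3 input features
    series = np.arange(60, dtype=float).reshape(20, 3)
    windows = make_windows(series, seq_len=9)
    print(windows.shape)  # (12, 9, 3) -> (batch_size, sequence_length, num_inputs)
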


.. GENERATED FROM PYTHON SOURCE LINES 61-63

Output from LSTM
------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 63-73

.. code-block:: Python


    # In Keras, the output from LSTM is 2D by default, with the following meaning:
    # (batch_size, units)
    # where units is the number of units/neurons of the LSTM layer.

    # check output
    output = model.predict(data)
    print('output shape is ', output.shape)
    print(output)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    output shape is  (2, 5)
    [[-0.04311746 -0.04708175  0.11244525  0.09445497  0.08160033]
     [ 0.22174549  0.23136306 -0.1471001   0.04506844 -0.0963508 ]]


.. GENERATED FROM PYTHON SOURCE LINES 74-78

Return Sequence
------------------------------

If we use ``return_sequences=True``, we get the hidden state, which is also the output,
at every time step instead of only the final one.

.. GENERATED FROM PYTHON SOURCE LINES 80-95

.. code-block:: Python


    reset_graph()

    print('input shape is', data.shape)

    # define model
    inputs1 = Input(shape=(seq_len, in_features))
    lstm1 = LSTM(units, return_sequences=True)(inputs1)
    model = Model(inputs=inputs1, outputs=lstm1)

    # check output
    output = model.predict(data)
    print('output shape is ', output.shape)
    print(output)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    input shape is (2, 9, 3)
    output shape is  (2, 9, 5)
    [[[ 0.23949696  0.23758332  0.0201166  -0.07562752  0.14458913]
      [ 0.20123877  0.19533847  0.04180209 -0.12905313  0.20505369]
      [ 0.06623977  0.09107485  0.02961113 -0.06149743  0.07921001]
      [ 0.103291    0.14202026 -0.10353918 -0.13593747 -0.01541394]
      [ 0.11871371  0.11363701  0.01490535 -0.01338429  0.09110813]
      [ 0.18314067  0.17522626  0.04663869 -0.05388878  0.18176244]
      [ 0.31485227  0.24940978  0.0693886  -0.03106552  0.25046384]
      [ 0.17771643  0.09009738  0.16493434  0.06166327  0.21880664]
      [-0.04311746 -0.04708175  0.11244525  0.09445497  0.08160033]]

     [[ 0.0236822   0.057854    0.05342087 -0.10365748  0.14504817]
      [-0.03983979  0.04184275  0.13498983  0.14183497  0.11871135]
      [-0.08096419  0.02722256  0.16430669  0.19353093  0.18122804]
      [-0.10457274 -0.09090691  0.05876469  0.26642254 -0.02051181]
      [ 0.07231079  0.07811436  0.06489968  0.07280337  0.08751098]
      [-0.02732764  0.00174761  0.04222624 -0.02587408  0.02410888]
      [ 0.02454332  0.01909897 -0.09221498 -0.07524213 -0.09897806]
      [ 0.22740148  0.31498346 -0.19642149 -0.16686526 -0.2563934 ]
      [ 0.22174549  0.23136306 -0.1471001   0.04506844 -0.0963508 ]]]


.. GENERATED FROM PYTHON SOURCE LINES 96-99

Return States
--------------------

If we use ``return_state=True``, the layer returns three arrays: the output, the final
hidden state (identical to the output here) and the final cell state.

.. GENERATED FROM PYTHON SOURCE LINES 101-115

.. code-block:: Python


    reset_graph()

    # define model
    inputs1 = Input(shape=(seq_len, in_features))
    lstm1, state_h, state_c = LSTM(units, return_state=True)(inputs1)
    model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])

    # check output
    _h, h, c = model.predict(data)
    print('_h: shape {} values \n {}\n'.format(_h.shape, _h))
    print('h: shape {} values \n {}\n'.format(h.shape, h))
    print('c: shape {} values \n {}'.format(c.shape, c))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    _h: shape (2, 5) values 
     [[-0.04311746 -0.04708175  0.11244525  0.09445497  0.08160033]
     [ 0.22174549  0.23136306 -0.1471001   0.04506844 -0.0963508 ]]

    h: shape (2, 5) values 
     [[-0.04311746 -0.04708175  0.11244525  0.09445497  0.08160033]
     [ 0.22174549  0.23136306 -0.1471001   0.04506844 -0.0963508 ]]

    c: shape (2, 5) values 
     [[-0.0884207  -0.10446949  0.1710459   0.17895043  0.24443825]
     [ 0.3913621   0.40256596 -0.38461903  0.08493438 -0.22778362]]


.. GENERATED FROM PYTHON SOURCE LINES 116-118

Using both at the same time
------------------------------

We can use both ``return_sequences`` and ``return_state`` at the same time as well.

.. GENERATED FROM PYTHON SOURCE LINES 120-134

.. code-block:: Python


    reset_graph()

    # define model
    inputs1 = Input(shape=(seq_len, in_features))
    lstm1, state_h, state_c = LSTM(units, return_state=True, return_sequences=True)(inputs1)
    model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])

    # check output
    _h, h, c = model.predict(data)
    print('_h: shape {} values \n {}\n'.format(_h.shape, _h))
    print('h: shape {} values \n {}\n'.format(h.shape, h))
    print('c: shape {} values \n {}'.format(c.shape, c))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    _h: shape (2, 9, 5) values 
     [[[ 0.23949696  0.23758332  0.0201166  -0.07562752  0.14458913]
      [ 0.20123877  0.19533847  0.04180209 -0.12905313  0.20505369]
      [ 0.06623977  0.09107485  0.02961113 -0.06149743  0.07921001]
      [ 0.103291    0.14202026 -0.10353918 -0.13593747 -0.01541394]
      [ 0.11871371  0.11363701  0.01490535 -0.01338429  0.09110813]
      [ 0.18314067  0.17522626  0.04663869 -0.05388878  0.18176244]
      [ 0.31485227  0.24940978  0.0693886  -0.03106552  0.25046384]
      [ 0.17771643  0.09009738  0.16493434  0.06166327  0.21880664]
      [-0.04311746 -0.04708175  0.11244525  0.09445497  0.08160033]]

     [[ 0.0236822   0.057854    0.05342087 -0.10365748  0.14504817]
      [-0.03983979  0.04184275  0.13498983  0.14183497  0.11871135]
      [-0.08096419  0.02722256  0.16430669  0.19353093  0.18122804]
      [-0.10457274 -0.09090691  0.05876469  0.26642254 -0.02051181]
      [ 0.07231079  0.07811436  0.06489968  0.07280337  0.08751098]
      [-0.02732764  0.00174761  0.04222624 -0.02587408  0.02410888]
      [ 0.02454332  0.01909897 -0.09221498 -0.07524213 -0.09897806]
      [ 0.22740148  0.31498346 -0.19642149 -0.16686526 -0.2563934 ]
      [ 0.22174549  0.23136306 -0.1471001   0.04506844 -0.0963508 ]]]

    h: shape (2, 5) values 
     [[-0.04311746 -0.04708175  0.11244525  0.09445497  0.08160033]
     [ 0.22174549  0.23136306 -0.1471001   0.04506844 -0.0963508 ]]

    c: shape (2, 5) values 
     [[-0.0884207  -0.10446949  0.1710459   0.17895043  0.24443825]
     [ 0.3913621   0.40256596 -0.38461903  0.08493438 -0.22778362]]


.. GENERATED FROM PYTHON SOURCE LINES 135-140

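Time major
--------------------

With ``time_major=True`` the LSTM expects the time dimension first, so the 3D input
is laid out as (sequence_length, batch_size, num_inputs) instead of
(batch_size, sequence_length, num_inputs). Note that the array fed to the model below
has shape (2, 3, 9), which the layer therefore reads as 2 time steps, a batch of 3 and
9 input features; this is why the output has shape (3, 5).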

.. GENERATED FROM PYTHON SOURCE LINES 140-149

.. code-block:: Python


    reset_graph()

    # define model
    inputs1 = Input(shape=(in_features, seq_len))
    lstm1 = LSTM(units, time_major=True)(inputs1)
    model = Model(inputs=inputs1, outputs=[lstm1])
    model.inputs


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    [<KerasTensor: shape=(None, 3, 9) dtype=float32 (created by layer 'input_6')>]


.. GENERATED FROM PYTHON SOURCE LINES 150-156

.. code-block:: Python


    # swap the last two axes so that the array matches the (in_features, seq_len)
    # shape declared in the model's Input
    time_major_data = np.moveaxis(data, [1,2], [2,1])
    time_major_data.shape


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    (2, 3, 9)


.. GENERATED FROM PYTHON SOURCE LINES 157-161

.. code-block:: Python


    h = model.predict(time_major_data)
    print('h: shape {} values \n {}\n'.format(h.shape, h))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    h: shape (3, 5) values 
     [[ 0.0856159   0.06631077 -0.43855685  0.1004677  -0.40924817]
     [ 0.02948599  0.02146549  0.01565967 -0.10389965  0.27761555]
     [ 0.09459803  0.14054263  0.1562092  -0.11277693 -0.12558709]]
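
In Keras, ``time_major=True`` corresponds to the layout
(sequence_length, batch_size, num_inputs). A batch-major array can be brought
into that layout by swapping its first two axes. The NumPy sketch below only
illustrates this transposition on a stand-in array with the same shape as
``data``; it does not rerun the model.

.. code-block:: Python

    import numpy as np

    batch_size, seq_len, in_features = 2, 9, 3
    # stand-in for the batch-major ``data`` array used in this example
    batch_major = np.random.normal(0, 1, size=(batch_size, seq_len, in_features))

    # swap the batch and time axes: (batch, time, features) -> (time, batch, features)
    time_first = np.transpose(batch_major, (1, 0, 2))
    print(time_first.shape)  # (9, 2, 3)

    # time step 4 of sample 1 is the same feature vector in both layouts
    assert np.array_equal(time_first[4, 1], batch_major[1, 4])
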


.. GENERATED FROM PYTHON SOURCE LINES 162-167

CNN -> LSTM
------------------------

We can place an LSTM after any other layer. The only requirement is that the output
of that layer matches the input requirement of LSTM, i.e. it should be 3D with shape
(batch_size, sequence_length, num_inputs).

.. GENERATED FROM PYTHON SOURCE LINES 167-176

.. code-block:: Python


    reset_graph()

    # define model
    inputs = Input(shape=(seq_len, in_features))
    cnn = Conv1D(filters=2, kernel_size=2, padding="same")(inputs)
    max_pool = MaxPooling1D(padding="same")(cnn)
    max_pool


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <KerasTensor: shape=(None, 5, 2) dtype=float32 (created by layer 'max_pooling1d')>


.. GENERATED FROM PYTHON SOURCE LINES 177-179

As the shape of the ``max_pool`` tensor, (None, 5, 2), matches the input requirement
of LSTM, we can combine it with LSTM.

.. GENERATED FROM PYTHON SOURCE LINES 179-184

.. code-block:: Python


    h = LSTM(units)(max_pool)
    model = Model(inputs=inputs, outputs=h)
    model.summary()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Model: "model"
    _________________________________________________________________
     Layer (type)                Output Shape              Param #   
    =================================================================
     input_7 (InputLayer)        [(None, 9, 3)]            0         
                                                                 
     conv1d (Conv1D)             (None, 9, 2)              14        
                                                                 
     max_pooling1d (MaxPooling1D  (None, 5, 2)             0         
     )                                                               
                                                                 
     lstm (LSTM)                 (None, 5)                 160       
                                                                 
    =================================================================
    Total params: 174
    Trainable params: 174
    Non-trainable params: 0
    _________________________________________________________________


.. GENERATED FROM PYTHON SOURCE LINES 185-188

However, this is not how a CNN is usually combined with an LSTM that follows it. The
purpose is usually to break the sequence into small sub-sequences and then apply the
**same** CNN to each of those sub-sequences. We can achieve this as follows.

.. GENERATED FROM PYTHON SOURCE LINES 188-201

.. code-block:: Python


    sub_sequences = 3

    reset_graph()
    # define model
    inputs = Input(shape=(seq_len, in_features))
    time_steps = seq_len // sub_sequences
    reshape = Reshape(target_shape=(sub_sequences, time_steps, in_features))(inputs)
    cnn = TimeDistributed(Conv1D(filters=2, kernel_size=2, padding="same"))(reshape)
    max_pool = TimeDistributed(MaxPooling1D(padding="same"))(cnn)
    flatten = TimeDistributed(Flatten())(max_pool)
    flatten


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <KerasTensor: shape=(None, 3, 4) dtype=float32 (created by layer 'time_distributed_2')>
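
What the ``Reshape`` layer does to a batch can be reproduced with NumPy. The
sketch below assumes row-major reshaping, which is the default for
``np.reshape``, so each sub-sequence is a block of consecutive time steps. The
array values are made up for illustration.

.. code-block:: Python

    import numpy as np

    batch_size, seq_len, in_features = 2, 9, 3
    sub_sequences = 3
    time_steps = seq_len // sub_sequences

    # made-up data whose values encode their position, so chunks are easy to follow
    x = np.arange(batch_size * seq_len * in_features).reshape(batch_size, seq_len, in_features)

    # (batch, seq_len, features) -> (batch, sub_sequences, time_steps, features)
    chunks = x.reshape(batch_size, sub_sequences, time_steps, in_features)
    print(chunks.shape)  # (2, 3, 3, 3)

    # the second sub-sequence of the first sample holds time steps 3, 4 and 5
    assert np.array_equal(chunks[0, 1], x[0, 3:6])
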


.. GENERATED FROM PYTHON SOURCE LINES 202-204

The shape of the ``flatten`` tensor again matches the input requirements of LSTM, so
we can again attach an LSTM after it.

.. GENERATED FROM PYTHON SOURCE LINES 204-209

.. code-block:: Python


    h = LSTM(units)(flatten)
    model = Model(inputs=inputs, outputs=h)
    model.summary()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Model: "model"
    _________________________________________________________________
     Layer (type)                Output Shape              Param #   
    =================================================================
     input_8 (InputLayer)        [(None, 9, 3)]            0         
                                                                 
     reshape (Reshape)           (None, 3, 3, 3)           0         
                                                                 
     time_distributed (TimeDistr  (None, 3, 3, 2)          14        
     ibuted)                                                         
                                                                 
     time_distributed_1 (TimeDis  (None, 3, 2, 2)          0         
     tributed)                                                       
                                                                 
     time_distributed_2 (TimeDis  (None, 3, 4)             0         
     tributed)                                                       
                                                                 
     lstm (LSTM)                 (None, 5)                 200       
                                                                 
    =================================================================
    Total params: 214
    Trainable params: 214
    Non-trainable params: 0
    _________________________________________________________________
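
The parameter counts in the two summaries can be checked by hand. An LSTM layer
has four gates, each with an input kernel, a recurrent kernel and a bias, and a
``Conv1D`` layer has one kernel per filter plus a bias. The helper functions
below are hypothetical names written for this check, not Keras APIs.

.. code-block:: Python

    def lstm_params(in_features, units):
        """Four gates, each with input weights, recurrent weights and a bias."""
        return 4 * (in_features * units + units * units + units)

    def conv1d_params(in_channels, filters, kernel_size):
        """One kernel of shape (kernel_size, in_channels) per filter, plus a bias."""
        return kernel_size * in_channels * filters + filters

    print(conv1d_params(in_channels=3, filters=2, kernel_size=2))  # 14
    print(lstm_params(in_features=2, units=5))  # 160, LSTM after Conv1D + pooling
    print(lstm_params(in_features=4, units=5))  # 200, LSTM after TimeDistributed(Flatten())
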


.. GENERATED FROM PYTHON SOURCE LINES 210-213

LSTM -> 1D CNN
------------------------

We can put a 1D CNN after the LSTM to extract further features from the LSTM output.
For this the LSTM must use ``return_sequences=True`` so that its output is 3D.

.. GENERATED FROM PYTHON SOURCE LINES 215-241

.. code-block:: Python


    reset_graph()

    print('input shape is', data.shape)

    # define model
    inputs = Input(shape=(seq_len, in_features))
    lstm_layer = LSTM(units, return_sequences=True)
    lstm_outputs = lstm_layer(inputs)
    print('lstm output: ', lstm_outputs.shape)

    conv1 = Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(seq_len, units))(lstm_outputs)
    print('conv output: ', conv1.shape)

    max1d1 = MaxPooling1D(pool_size=2)(conv1)
    print('max pool output: ', max1d1.shape)

    flat1 = Flatten()(max1d1)
    print('flatten output: ', flat1.shape)

    model = Model(inputs=inputs, outputs=flat1)

    # check output
    output = model.predict(data)
    print('output shape: ', output.shape)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    input shape is (2, 9, 3)
    lstm output:  (None, 9, 5)
    conv output:  (None, 8, 64)
    max pool output:  (None, 4, 64)
    flatten output:  (None, 256)
    output shape:  (2, 256)
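
The sequence lengths printed above follow from the padding rules of the layers.
The sketch below uses the standard output-length formulas for ``"valid"`` and
``"same"`` padding; ``conv_out_len`` is a hypothetical helper, not a Keras function.

.. code-block:: Python

    import math

    def conv_out_len(length, window, stride=1, padding="valid"):
        """Output length of a 1D convolution or pooling window."""
        if padding == "same":
            return math.ceil(length / stride)
        return (length - window) // stride + 1

    conv_len = conv_out_len(9, window=2)                    # Conv1D, kernel_size=2
    pool_len = conv_out_len(conv_len, window=2, stride=2)   # MaxPooling1D, pool_size=2
    print(conv_len, pool_len, pool_len * 64)  # 8 4 256

    # with padding="same", as in the CNN -> LSTM section: 9 -> 9 -> 5
    same_conv = conv_out_len(9, window=2, padding="same")
    same_pool = conv_out_len(same_conv, window=2, stride=2, padding="same")
    print(same_conv, same_pool)  # 9 5
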


.. GENERATED FROM PYTHON SOURCE LINES 242-244

The output from LSTM/RNN looks roughly as below. This is the update of a simple RNN;
the LSTM adds gates on top of it.

.. math::

    h_t = \tanh(b + W h_{t-1} + U X_t)
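
The recurrence above can be sketched in NumPy. This is a plain tanh recurrence,
not the gated LSTM update, and the weights are random placeholders. It only
illustrates how collecting every ``h_t`` gives an array of shape
(batch_size, sequence_length, units) whose last step is the final output.

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(313)
    batch_size, seq_len, in_features, units = 2, 9, 3, 5

    X = rng.normal(size=(batch_size, seq_len, in_features))
    U = rng.normal(size=(in_features, units))  # input weights (placeholder values)
    W = rng.normal(size=(units, units))        # recurrent weights (placeholder values)
    b = np.zeros(units)

    h = np.zeros((batch_size, units))          # initial state h_0
    states = []
    for t in range(seq_len):
        # h_t = tanh(b + W h_{t-1} + U x_t), written here for row vectors
        h = np.tanh(b + h @ W + X[:, t, :] @ U)
        states.append(h)

    all_states = np.stack(states, axis=1)
    print(all_states.shape)  # (2, 9, 5)
    print(h.shape)           # (2, 5)
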

.. GENERATED FROM PYTHON SOURCE LINES 247-248

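Weights applied to the input features, for every unit in the LSTM. The second
dimension is ``4 * units = 20`` because the kernels of the four LSTM gates are
stored side by side.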

.. GENERATED FROM PYTHON SOURCE LINES 248-251

.. code-block:: Python


    print('kernel U: ', lstm_layer.get_weights()[0].shape)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    kernel U:  (3, 20)


.. GENERATED FROM PYTHON SOURCE LINES 252-254

Weights applied to the hidden state, i.e. the output of the LSTM at the
previous time step (t-1), for every unit in the LSTM.

.. GENERATED FROM PYTHON SOURCE LINES 254-257

.. code-block:: Python


    print('recurrent kernel, W: ', lstm_layer.get_weights()[1].shape)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    recurrent kernel, W:  (5, 20)


.. GENERATED FROM PYTHON SOURCE LINES 258-260

.. code-block:: Python

    print('bias: ', lstm_layer.get_weights()[2].shape)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    bias:  (20,)
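
To see how weights with these shapes fit together, the NumPy sketch below runs
a forward pass with the four gate blocks split out of each array. It assumes
the gate order input, forget, cell candidate, output and the default
``tanh``/``sigmoid`` activations; the weights are random placeholders rather
than those of ``lstm_layer``, and ``lstm_forward`` is a hypothetical helper.

.. code-block:: Python

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_forward(X, U, W, b):
        """Run an LSTM over X of shape (batch, time, features).

        U: (features, 4 * units), W: (units, 4 * units), b: (4 * units,)
        Returns all hidden states, the final hidden state and the final cell state.
        """
        units = W.shape[0]
        h = np.zeros((X.shape[0], units))
        c = np.zeros((X.shape[0], units))
        states = []
        for t in range(X.shape[1]):
            z = X[:, t, :] @ U + h @ W + b
            # assumed gate order: input, forget, cell candidate, output
            z_i, z_f, z_c, z_o = np.split(z, 4, axis=1)
            c = sigmoid(z_f) * c + sigmoid(z_i) * np.tanh(z_c)
            h = sigmoid(z_o) * np.tanh(c)
            states.append(h)
        return np.stack(states, axis=1), h, c

    rng = np.random.default_rng(313)
    X = rng.normal(size=(2, 9, 3))
    U = rng.normal(size=(3, 20))   # same shape as the kernel printed above
    W = rng.normal(size=(5, 20))   # same shape as the recurrent kernel
    b = np.zeros(20)               # same shape as the bias

    seq, h, c = lstm_forward(X, U, W, b)
    print(seq.shape, h.shape, c.shape)  # (2, 9, 5) (2, 5) (2, 5)

The last time step of ``seq`` equals ``h``, which mirrors the relation between the
``return_sequences=True`` output and ``state_h`` in the earlier sections.
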


.. GENERATED FROM PYTHON SOURCE LINES 261-261

This post is inspired by Jason Brownlee's `page <https://machinelearningmastery.com/return-sequences-and-return-states-for-lstms-in-keras/>`_.




.. _sphx_glr_download_auto_examples_input_output_lstm.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/sphinx-gallery/sphinx-gallery.github.io/master?urlpath=lab/tree/notebooks/auto_examples/input_output_lstm.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: input_output_lstm.ipynb <input_output_lstm.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: input_output_lstm.py <input_output_lstm.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: input_output_lstm.zip <input_output_lstm.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_