Data Preparation for Time Series Prediction

This example demonstrates how to prepare data for time series prediction, especially for deep learning models such as LSTMs and other RNNs.

import time
import numpy as np
import pandas as pd

import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

from aqua_fetch import RainfallRunoff

print("tf: ", tf.__version__)
print("np: ", np.__version__)
print('pd: ', pd.__version__)

from utils import prepare_data, prepare_data_sample
tf:  2.7.0
np:  1.21.6
pd:  2.0.3

First, we create a simple dataset with 2000 rows and 1 column, i.e., a univariate time series with no covariates.

rows = 2000
cols = 1
data = np.arange(int(rows*cols)).reshape(-1,rows).transpose()

Below we print the first 10 rows, the shape of the dataset, and the last 10 rows to give an overview of the data structure.

print(data[0:10])
print('\n {} \n'.format(data.shape))
print(data[-10:])
[[0]
 [1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]
 [9]]

 (2000, 1)

[[1990]
 [1991]
 [1992]
 [1993]
 [1994]
 [1995]
 [1996]
 [1997]
 [1998]
 [1999]]
x, _y, y = prepare_data(data, num_inputs=1, num_outputs=1, lookback=4)

print(x.shape, _y.shape, y.shape)
(1997, 4, 1) (1997, 3, 1) (1997, 1, 1)

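With lookback=4 there are 2000 - 4 + 1 = 1997 windows. For each window, x holds the 4 input steps, y holds the target at the last step, and _y holds the target values at the first lookback - 1 = 3 steps (the past values of the target). Checking the first sample/example/data point: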

x[0]
array([[0],
       [1],
       [2],
       [3]])
_y[0]
array([[0],
       [1],
       [2]])
y[0]
array([[3]])

Checking the second sample/example/data point

x[1]
array([[1],
       [2],
       [3],
       [4]])
_y[1]
array([[1],
       [2],
       [3]])
y[1]
array([[4]])
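The windowing above can be reproduced with plain NumPy. The sketch below is a hypothetical re-implementation of the default `prepare_data` behaviour for this univariate case (forecast_step=0, one-step target), written only to make the indexing explicit. `make_windows` is not part of `utils`, and the real function may differ in its details.

```python
import numpy as np

def make_windows(data, input_cols, target_cols, lookback):
    """Hypothetical helper (not from utils): sliding windows for nowcasting."""
    n = len(data) - lookback + 1  # number of complete windows
    # inputs: `lookback` consecutive rows starting at row i
    x = np.stack([data[i:i + lookback, input_cols] for i in range(n)])
    # past targets: the first lookback - 1 rows of each window
    prev_y = np.stack([data[i:i + lookback - 1, target_cols] for i in range(n)])
    # target: the last row of each window, shaped (num_targets, 1)
    y = np.stack([data[i + lookback - 1, target_cols].reshape(-1, 1) for i in range(n)])
    return x, prev_y, y

data = np.arange(2000).reshape(-1, 1)  # same values as the univariate data above
x, prev_y, y = make_windows(data, input_cols=[0], target_cols=[0], lookback=4)
print(x.shape, prev_y.shape, y.shape)  # (1997, 4, 1) (1997, 3, 1) (1997, 1, 1)
```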

Now we create another dataset with 2000 rows but 6 columns, i.e., a multivariate time series. Each column can represent a different feature or variable; the values are again sequential integers for demonstration purposes.

rows = 2000
cols = 6
data = np.arange(int(rows*cols)).reshape(-1,rows).transpose()
print(data[0:10])
print('\n {} \n'.format(data.shape))
print(data[-10:])
[[    0  2000  4000  6000  8000 10000]
 [    1  2001  4001  6001  8001 10001]
 [    2  2002  4002  6002  8002 10002]
 [    3  2003  4003  6003  8003 10003]
 [    4  2004  4004  6004  8004 10004]
 [    5  2005  4005  6005  8005 10005]
 [    6  2006  4006  6006  8006 10006]
 [    7  2007  4007  6007  8007 10007]
 [    8  2008  4008  6008  8008 10008]
 [    9  2009  4009  6009  8009 10009]]

 (2000, 6)

[[ 1990  3990  5990  7990  9990 11990]
 [ 1991  3991  5991  7991  9991 11991]
 [ 1992  3992  5992  7992  9992 11992]
 [ 1993  3993  5993  7993  9993 11993]
 [ 1994  3994  5994  7994  9994 11994]
 [ 1995  3995  5995  7995  9995 11995]
 [ 1996  3996  5996  7996  9996 11996]
 [ 1997  3997  5997  7997  9997 11997]
 [ 1998  3998  5998  7998  9998 11998]
 [ 1999  3999  5999  7999  9999 11999]]
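The expression `np.arange(rows*cols).reshape(-1, rows).transpose()` first lays the integers out as `cols` rows of length `rows` and then transposes them, so column j holds the consecutive values j*rows, ..., j*rows + rows - 1. A tiny version makes the pattern easy to check:

```python
import numpy as np

rows, cols = 5, 3
# (cols, rows) block of consecutive integers, transposed to (rows, cols)
data = np.arange(rows * cols).reshape(-1, rows).transpose()
print(data)
# [[ 0  5 10]
#  [ 1  6 11]
#  [ 2  7 12]
#  [ 3  8 13]
#  [ 4  9 14]]

# every entry equals column_index * rows + time_index
t, j = np.indices(data.shape)
print(np.array_equal(data, j * rows + t))  # True
```

This is why, in the printed output above, each value tells you both its column (thousands digit pair) and its timestep (remainder).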

If this were a multivariate time series with no covariates, we would use the same approach as before, i.e., set num_inputs equal to num_outputs. Every column is then both an input and a target, so each y has shape (num_outputs, 1).

x, _y, y = prepare_data(data, num_inputs=6, num_outputs=6, lookback=4)

print(x.shape, _y.shape, y.shape)
(1997, 4, 6) (1997, 3, 6) (1997, 6, 1)

Checking the first sample/example/data point

x[0]
array([[    0,  2000,  4000,  6000,  8000, 10000],
       [    1,  2001,  4001,  6001,  8001, 10001],
       [    2,  2002,  4002,  6002,  8002, 10002],
       [    3,  2003,  4003,  6003,  8003, 10003]])
_y[0]
array([[    0,  2000,  4000,  6000,  8000, 10000],
       [    1,  2001,  4001,  6001,  8001, 10001],
       [    2,  2002,  4002,  6002,  8002, 10002]])
y[0]
array([[    3],
       [ 2003],
       [ 4003],
       [ 6003],
       [ 8003],
       [10003]])

Checking the second sample/example/data point

x[1]
array([[    1,  2001,  4001,  6001,  8001, 10001],
       [    2,  2002,  4002,  6002,  8002, 10002],
       [    3,  2003,  4003,  6003,  8003, 10003],
       [    4,  2004,  4004,  6004,  8004, 10004]])
_y[1]
array([[    1,  2001,  4001,  6001,  8001, 10001],
       [    2,  2002,  4002,  6002,  8002, 10002],
       [    3,  2003,  4003,  6003,  8003, 10003]])
y[1]
array([[    4],
       [ 2004],
       [ 4004],
       [ 6004],
       [ 8004],
       [10004]])

However, if this were a multivariate time series with covariates, i.e., one column is the target variable and the others are input features, we would need to adjust the data preparation accordingly. Here we only set num_inputs=5: the first 5 columns become the inputs and the last column becomes the target.

x, _y, y = prepare_data(data, num_inputs=5, lookback=4)

print(x.shape, _y.shape, y.shape)
(1997, 4, 5) (1997, 3, 1) (1997, 1, 1)
x[0]
array([[   0, 2000, 4000, 6000, 8000],
       [   1, 2001, 4001, 6001, 8001],
       [   2, 2002, 4002, 6002, 8002],
       [   3, 2003, 4003, 6003, 8003]])
_y[0]
array([[10000],
       [10001],
       [10002]])
y[0]
array([[10003]])
x[1]
array([[   1, 2001, 4001, 6001, 8001],
       [   2, 2002, 4002, 6002, 8002],
       [   3, 2003, 4003, 6003, 8003],
       [   4, 2004, 4004, 6004, 8004]])
_y[1]
array([[10001],
       [10002],
       [10003]])
y[1]
array([[10004]])

Now consider the case with 4 input features and 2 output features (targets). Again only num_inputs is given, and the remaining 2 columns become the targets.

x, _y, y = prepare_data(data, num_inputs=4, lookback=4)

print(x.shape, _y.shape, y.shape)
(1997, 4, 4) (1997, 3, 2) (1997, 2, 1)
x[0]
array([[   0, 2000, 4000, 6000],
       [   1, 2001, 4001, 6001],
       [   2, 2002, 4002, 6002],
       [   3, 2003, 4003, 6003]])
_y[0]
array([[ 8000, 10000],
       [ 8001, 10001],
       [ 8002, 10002]])
y[0]
array([[ 8003],
       [10003]])
x[1]
array([[   1, 2001, 4001, 6001],
       [   2, 2002, 4002, 6002],
       [   3, 2003, 4003, 6003],
       [   4, 2004, 4004, 6004]])
_y[1]
array([[ 8001, 10001],
       [ 8002, 10002],
       [ 8003, 10003]])
y[1]
array([[ 8004],
       [10004]])
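Judging from the shapes above, when `num_outputs` is not given, the first `num_inputs` columns are used as inputs and all remaining columns as targets (1 target for `num_inputs=5`, 2 for `num_inputs=4`). Below is a vectorized sketch of that split using NumPy's `sliding_window_view` (available since NumPy 1.20). `split_windows` is a hypothetical name, not part of `utils`, and it only covers the default nowcasting setting.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def split_windows(data, num_inputs, lookback):
    """Hypothetical sketch: inputs = first num_inputs columns, targets = the rest."""
    inputs, targets = data[:, :num_inputs], data[:, num_inputs:]
    # sliding_window_view appends the window axis last: (n, features, lookback)
    x = sliding_window_view(inputs, lookback, axis=0).transpose(0, 2, 1)
    # past targets: the first lookback - 1 rows of each window
    prev_y = sliding_window_view(targets[:-1], lookback - 1, axis=0).transpose(0, 2, 1)
    # target at the last timestep of each window, shaped (num_targets, 1)
    y = targets[lookback - 1:, :, np.newaxis]
    return x, prev_y, y

data = np.arange(2000 * 6).reshape(-1, 2000).transpose()
for num_inputs in (5, 4):
    x, prev_y, y = split_windows(data, num_inputs, lookback=4)
    print(num_inputs, x.shape, prev_y.shape, y.shape)
# 5 (1997, 4, 5) (1997, 3, 1) (1997, 1, 1)
# 4 (1997, 4, 4) (1997, 3, 2) (1997, 2, 1)
```

`sliding_window_view` returns read-only views rather than copies, which keeps memory use low; call `.copy()` if you need writable arrays.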

nowcasting vs forecasting

If forecast_step > 0, we predict the future. With forecast_step=1, the input window ends at timestep t and the target is taken at timestep t+1. Compared with the previous examples, each target is shifted one row later, and there is one sample fewer (1996 instead of 1997).

x, _y, y = prepare_data(data, num_inputs=5, lookback=4, forecast_step=1)

print(x.shape, _y.shape, y.shape)
(1996, 4, 5) (1996, 3, 1) (1996, 1, 1)

First sample

x[0]
array([[   0, 2000, 4000, 6000, 8000],
       [   1, 2001, 4001, 6001, 8001],
       [   2, 2002, 4002, 6002, 8002],
       [   3, 2003, 4003, 6003, 8003]])
_y[0]
array([[10000],
       [10001],
       [10002]])
y[0]
array([[10004]])

Second sample

x[1]
array([[   1, 2001, 4001, 6001, 8001],
       [   2, 2002, 4002, 6002, 8002],
       [   3, 2003, 4003, 6003, 8003],
       [   4, 2004, 4004, 6004, 8004]])
_y[1]
array([[10001],
       [10002],
       [10003]])
y[1]
array([[10005]])

To forecast multiple future timesteps, set forecast_len. With forecast_step=1 and forecast_len=2, each y contains the target at t+1 and t+2:

x, _y, y = prepare_data(data, num_inputs=5, lookback=4, forecast_step=1, forecast_len=2)

print(x.shape, _y.shape, y.shape)
(1995, 4, 5) (1995, 3, 1) (1995, 1, 2)
x[0]
array([[   0, 2000, 4000, 6000, 8000],
       [   1, 2001, 4001, 6001, 8001],
       [   2, 2002, 4002, 6002, 8002],
       [   3, 2003, 4003, 6003, 8003]])
_y[0]
array([[10000],
       [10001],
       [10002]])
y[0]
array([[10004, 10005]])
x[1]
array([[   1, 2001, 4001, 6001, 8001],
       [   2, 2002, 4002, 6002, 8002],
       [   3, 2003, 4003, 6003, 8003],
       [   4, 2004, 4004, 6004, 8004]])
_y[1]
array([[10001],
       [10002],
       [10003]])
y[1]
array([[10005, 10006]])
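With `forecast_step` and `forecast_len`, the target for window i starts `forecast_step` rows after the window's last row and covers `forecast_len` consecutive rows. The sketch below (a hypothetical `forecast_targets`, with a single target column and `input_steps=output_steps=1`) builds x and y that way. Windows without a complete target are dropped, and for the settings used in this section the resulting sample counts match the shapes printed above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def forecast_targets(data, num_inputs, lookback, forecast_step=0, forecast_len=1):
    """Hypothetical sketch with a single target column (the last column)."""
    inputs, target = data[:, :num_inputs], data[:, -1]
    x = sliding_window_view(inputs, lookback, axis=0).transpose(0, 2, 1)
    # the first target row of window 0 is its last row (lookback - 1) + forecast_step
    start = lookback - 1 + forecast_step
    y = sliding_window_view(target[start:], forecast_len)  # (n, forecast_len)
    n = len(y)  # windows whose full forecast horizon lies inside the data
    return x[:n], y[:, np.newaxis, :]  # y shaped (n, 1, forecast_len)

data = np.arange(2000 * 6).reshape(-1, 2000).transpose()
x, y = forecast_targets(data, 5, lookback=4, forecast_step=1, forecast_len=2)
print(x.shape, y.shape)  # (1995, 4, 5) (1995, 1, 2)
print(y[0], y[1])        # [[10004 10005]] [[10005 10006]]
```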

If forecast_step is 0, we are nowcasting: the first target is taken at the last timestep of the input window, i.e., inputs up to timestep t are used to predict the target at the same timestep t (and, with forecast_len=2, also at t+1).

x, _y, y = prepare_data(data, num_inputs=5, lookback=4, forecast_step=0, forecast_len=2)

print(x.shape, _y.shape, y.shape)
(1996, 4, 5) (1996, 3, 1) (1996, 1, 2)
x[0]
array([[   0, 2000, 4000, 6000, 8000],
       [   1, 2001, 4001, 6001, 8001],
       [   2, 2002, 4002, 6002, 8002],
       [   3, 2003, 4003, 6003, 8003]])
_y[0]
array([[10000],
       [10001],
       [10002]])
y[0]
array([[10003, 10004]])
x[1]
array([[   1, 2001, 4001, 6001, 8001],
       [   2, 2002, 4002, 6002, 8002],
       [   3, 2003, 4003, 6003, 8003],
       [   4, 2004, 4004, 6004, 8004]])
_y[1]
array([[10001],
       [10002],
       [10003]])
y[1]
array([[10004, 10005]])
x, _y, y = prepare_data(data, num_inputs=5, lookback=1, forecast_step=0)

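changing input_steps

input_steps sets the spacing between consecutive rows inside the input window. With input_steps=2, every second timestep is used, so a window of lookback=4 steps spans 7 rows (rows 0, 2, 4 and 6 for the first sample), and the target is taken at the last of them.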

x, _y, y = prepare_data(data, num_inputs=5, lookback=4, input_steps=2)

print(x.shape, _y.shape, y.shape)
(1993, 4, 5) (1993, 3, 1) (1993, 1, 1)
x[0]
array([[   0, 2000, 4000, 6000, 8000],
       [   2, 2002, 4002, 6002, 8002],
       [   4, 2004, 4004, 6004, 8004],
       [   6, 2006, 4006, 6006, 8006]])
_y[0]
array([[10000],
       [10002],
       [10004]])
y[0]
array([[10006]])
x[1]
array([[   1, 2001, 4001, 6001, 8001],
       [   3, 2003, 4003, 6003, 8003],
       [   5, 2005, 4005, 6005, 8005],
       [   7, 2007, 4007, 6007, 8007]])
_y[1]
array([[10001],
       [10003],
       [10005]])
y[1]
array([[10007]])

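changing output_steps

output_steps sets the spacing between consecutive target timesteps. With forecast_len=1 there is only one target step, so the samples below are identical to the default ones; only the number of samples drops by one (1996 instead of 1997).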

x, _y, y = prepare_data(data, num_inputs=5, lookback=4, output_steps=2)

print(x.shape, _y.shape, y.shape)
(1996, 4, 5) (1996, 3, 1) (1996, 1, 1)
x[0]
array([[   0, 2000, 4000, 6000, 8000],
       [   1, 2001, 4001, 6001, 8001],
       [   2, 2002, 4002, 6002, 8002],
       [   3, 2003, 4003, 6003, 8003]])
_y[0]
array([[10000],
       [10001],
       [10002]])
y[0]
array([[10003]])
x[1]
array([[   1, 2001, 4001, 6001, 8001],
       [   2, 2002, 4002, 6002, 8002],
       [   3, 2003, 4003, 6003, 8003],
       [   4, 2004, 4004, 6004, 8004]])
_y[1]
array([[10001],
       [10002],
       [10003]])
y[1]
array([[10004]])
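Taken together, the examples so far are consistent with a simple rule for which rows of `data` end up in sample i. The helper below (a hypothetical `window_rows`, inferred from the printed samples rather than from the `utils` source, and ignoring `known_future_inputs`) computes those row indices. In the toy data the target value is 10000 plus the row index, so the results can be compared directly with the y values above.

```python
def window_rows(i, lookback, forecast_step=0, forecast_len=1,
                input_steps=1, output_steps=1):
    """Hypothetical helper: row indices used by sample i (no known future inputs)."""
    # input rows: `lookback` rows, every `input_steps`-th row, starting at row i
    input_rows = [i + k * input_steps for k in range(lookback)]
    last = input_rows[-1]
    # target rows: `forecast_len` rows, every `output_steps`-th row,
    # starting `forecast_step` rows after the last input row
    target_rows = [last + forecast_step + k * output_steps for k in range(forecast_len)]
    return input_rows, target_rows

print(window_rows(0, lookback=4, input_steps=2))                    # ([0, 2, 4, 6], [6])
print(window_rows(1, lookback=4, forecast_step=1, forecast_len=2))  # ([1, 2, 3, 4], [5, 6])
```

The rule only says which rows a sample uses. The sample counts printed above are slightly lower than this rule alone would allow (e.g. 1993 rather than 1994 for input_steps=2), so prepare_data apparently reserves some extra rows at the end of the series.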

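using known future inputs

If the values of the input features over the forecast horizon are known in advance, we can set known_future_inputs=True. x then covers both the lookback window and the forecast horizon (4 + 4 = 8 rows here), while y still holds only the target values over the horizon.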

x, _y, y = prepare_data(data,
                        num_inputs=5,
                        lookback=4,
                        forecast_step=1,
                        forecast_len=4,
                        known_future_inputs=True)

print(x.shape, _y.shape, y.shape)
(1989, 8, 5) (1989, 7, 1) (1989, 1, 4)
x[0]
array([[   0, 2000, 4000, 6000, 8000],
       [   1, 2001, 4001, 6001, 8001],
       [   2, 2002, 4002, 6002, 8002],
       [   3, 2003, 4003, 6003, 8003],
       [   4, 2004, 4004, 6004, 8004],
       [   5, 2005, 4005, 6005, 8005],
       [   6, 2006, 4006, 6006, 8006],
       [   7, 2007, 4007, 6007, 8007]])
y[0]
array([[10004, 10005, 10006, 10007]])
x[1]
array([[   1, 2001, 4001, 6001, 8001],
       [   2, 2002, 4002, 6002, 8002],
       [   3, 2003, 4003, 6003, 8003],
       [   4, 2004, 4004, 6004, 8004],
       [   5, 2005, 4005, 6005, 8005],
       [   6, 2006, 4006, 6006, 8006],
       [   7, 2007, 4007, 6007, 8007],
       [   8, 2008, 4008, 6008, 8008]])
y[1]
array([[10005, 10006, 10007, 10008]])
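For the simplest setting shown above (forecast_step=1, input_steps=output_steps=1), a single known-future sample can be sketched as follows. `known_future_sample` is hypothetical and reproduces only the x and y values printed above; it does not attempt to match prepare_data's sample count or its handling of the other parameters.

```python
import numpy as np

def known_future_sample(data, i, num_inputs, lookback, forecast_len):
    """Hypothetical sketch (forecast_step=1, input_steps=output_steps=1)."""
    end = i + lookback + forecast_len  # one past the last forecast row
    # x covers the lookback window *and* the forecast horizon, because the
    # input values over the horizon are assumed to be known in advance
    x = data[i:end, :num_inputs]
    # y: the target over the forecast horizon only, shaped (1, forecast_len)
    y = data[i + lookback:end, -1][np.newaxis, :]
    return x, y

data = np.arange(2000 * 6).reshape(-1, 2000).transpose()
x0, y0 = known_future_sample(data, 0, num_inputs=5, lookback=4, forecast_len=4)
print(x0.shape, y0)  # (8, 5) [[10004 10005 10006 10007]]
```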

using known future inputs with forecast_step=2, input_steps=2 and output_steps=2

x, _y, y = prepare_data(data,
                        num_inputs=5,
                        lookback=4,
                        forecast_len=4,
                        forecast_step=2,
                        input_steps=2,
                        output_steps=2,
                        known_future_inputs=True)

print(x.shape, _y.shape, y.shape)
(1976, 8, 5) (1976, 7, 1) (1976, 1, 4)
x[0]
array([[   0, 2000, 4000, 6000, 8000],
       [   2, 2002, 4002, 6002, 8002],
       [   4, 2004, 4004, 6004, 8004],
       [   6, 2006, 4006, 6006, 8006],
       [   8, 2008, 4008, 6008, 8008],
       [  10, 2010, 4010, 6010, 8010],
       [  12, 2012, 4012, 6012, 8012],
       [  14, 2014, 4014, 6014, 8014]])
y[0]
array([[10008, 10010, 10012, 10014]])
x[1]
array([[   1, 2001, 4001, 6001, 8001],
       [   3, 2003, 4003, 6003, 8003],
       [   5, 2005, 4005, 6005, 8005],
       [   7, 2007, 4007, 6007, 8007],
       [   9, 2009, 4009, 6009, 8009],
       [  11, 2011, 4011, 6011, 8011],
       [  13, 2013, 4013, 6013, 8013],
       [  15, 2015, 4015, 6015, 8015]])
y[1]
array([[10009, 10011, 10013, 10015]])

Handling missing values

Consider the case where the output/target variable contains missing values. Below, roughly half of the values in the last column are set to NaN.

data = np.arange(int(rows*cols)).reshape(-1,rows).transpose()
rng = np.random.default_rng(seed=313)  # for reproducibility
# create a random mask for the last column
mask = rng.integers(0, 2, size=data[:, -1].shape).astype(bool)
# introduce NaNs in the last column
data = data.astype(float)
data[mask, -1] = np.nan

print(data[0:10])
print('\n {} \n'.format(data.shape))
print(data[-10:])
[[    0.  2000.  4000.  6000.  8000. 10000.]
 [    1.  2001.  4001.  6001.  8001.    nan]
 [    2.  2002.  4002.  6002.  8002. 10002.]
 [    3.  2003.  4003.  6003.  8003. 10003.]
 [    4.  2004.  4004.  6004.  8004. 10004.]
 [    5.  2005.  4005.  6005.  8005.    nan]
 [    6.  2006.  4006.  6006.  8006.    nan]
 [    7.  2007.  4007.  6007.  8007. 10007.]
 [    8.  2008.  4008.  6008.  8008.    nan]
 [    9.  2009.  4009.  6009.  8009.    nan]]

 (2000, 6)

[[ 1990.  3990.  5990.  7990.  9990. 11990.]
 [ 1991.  3991.  5991.  7991.  9991. 11991.]
 [ 1992.  3992.  5992.  7992.  9992.    nan]
 [ 1993.  3993.  5993.  7993.  9993.    nan]
 [ 1994.  3994.  5994.  7994.  9994. 11994.]
 [ 1995.  3995.  5995.  7995.  9995. 11995.]
 [ 1996.  3996.  5996.  7996.  9996. 11996.]
 [ 1997.  3997.  5997.  7997.  9997. 11997.]
 [ 1998.  3998.  5998.  7998.  9998. 11998.]
 [ 1999.  3999.  5999.  7999.  9999. 11999.]]
x, _y, y = prepare_data(data, num_inputs=5, lookback=4)
print(x.shape, _y.shape, y.shape)
(1997, 4, 5) (1997, 3, 1) (1997, 1, 1)
y[0]
array([[10003.]])
y[1]
array([[10004.]])
y[2]
array([[nan]])
y[3]
array([[nan]])
y[4], y[5], y[6]
(array([[10007.]]), array([[nan]]), array([[nan]]))

Next, we remove all samples whose target contains NaN. Because roughly half of the target values were masked, this removes roughly half of the samples (from 1997 to 955).

nan_idx_y = np.isnan(y).any(axis=(1, 2))

non_nan_idx_y = np.invert(nan_idx_y)

x = x[non_nan_idx_y]
_y = _y[non_nan_idx_y]
y = y[non_nan_idx_y]

print(x.shape, _y.shape, y.shape)
(955, 4, 5) (955, 3, 1) (955, 1, 1)

Now consider the case where the input features, rather than the target, contain missing values.

data = np.arange(int(rows*cols)).reshape(-1,rows).transpose()
rng = np.random.default_rng(seed=313)  # for reproducibility
# mark ~2% of the input values (where the random integer is 0) as missing
mask = rng.integers(0, 50, size=data[:, :-1].shape).astype(bool)
data = data.astype(float)
data[:, :-1][~mask] = np.nan

print(data[0:10])
print('\n {} \n'.format(data.shape))
print(data[-10:])
[[    0.  2000.  4000.  6000.  8000. 10000.]
 [    1.  2001.  4001.  6001.  8001. 10001.]
 [    2.  2002.  4002.  6002.  8002. 10002.]
 [    3.  2003.    nan  6003.  8003. 10003.]
 [    4.  2004.  4004.  6004.  8004. 10004.]
 [    5.  2005.  4005.  6005.  8005. 10005.]
 [    6.  2006.  4006.  6006.  8006. 10006.]
 [    7.  2007.  4007.  6007.  8007. 10007.]
 [    8.  2008.  4008.  6008.    nan 10008.]
 [    9.  2009.  4009.  6009.  8009. 10009.]]

 (2000, 6)

[[ 1990.  3990.  5990.  7990.  9990. 11990.]
 [ 1991.  3991.  5991.  7991.  9991. 11991.]
 [ 1992.  3992.    nan  7992.  9992. 11992.]
 [ 1993.  3993.  5993.  7993.  9993. 11993.]
 [ 1994.  3994.  5994.  7994.  9994. 11994.]
 [ 1995.  3995.  5995.  7995.  9995. 11995.]
 [ 1996.  3996.  5996.  7996.  9996. 11996.]
 [ 1997.  3997.  5997.  7997.  9997. 11997.]
 [ 1998.  3998.  5998.  7998.  9998. 11998.]
 [ 1999.  3999.  5999.  7999.  9999. 11999.]]
x, _y, y = prepare_data(data, num_inputs=5, lookback=5)
print(x.shape, _y.shape, y.shape)
(1996, 5, 5) (1996, 4, 1) (1996, 1, 1)
x[-3]
array([[1993., 3993., 5993., 7993., 9993.],
       [1994., 3994., 5994., 7994., 9994.],
       [1995., 3995., 5995., 7995., 9995.],
       [1996., 3996., 5996., 7996., 9996.],
       [1997., 3997., 5997., 7997., 9997.]])
x[-4]
array([[1992., 3992.,   nan, 7992., 9992.],
       [1993., 3993., 5993., 7993., 9993.],
       [1994., 3994., 5994., 7994., 9994.],
       [1995., 3995., 5995., 7995., 9995.],
       [1996., 3996., 5996., 7996., 9996.]])
y[-4]
array([[11996.]])

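We should also remove all samples with NaN anywhere in the input window x. Since every window that overlaps a row with a missing input is dropped, even a small fraction of missing values (about 2% here) removes a much larger fraction of samples (from 1996 to 1188).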

nan_idx_x = np.isnan(x).any(axis=(1, 2))

non_nan_idx_x = np.invert(nan_idx_x)

x = x[non_nan_idx_x]
_y = _y[non_nan_idx_x]
y = y[non_nan_idx_x]

print(x.shape, _y.shape, y.shape)
(1188, 5, 5) (1188, 4, 1) (1188, 1, 1)
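The two filtering steps above can also be done in one pass by combining the masks with `|`, so that a sample is kept only if neither x nor y contains a NaN. A small self-contained sketch on toy arrays (the shapes mimic x and y above; the NaN positions are made up for illustration):

```python
import numpy as np

x = np.zeros((6, 4, 5))
y = np.zeros((6, 1, 1))
x[1, 2, 3] = np.nan  # sample 1 has a missing input
y[4, 0, 0] = np.nan  # sample 4 has a missing target

# True for samples that contain at least one NaN in x or in y
has_nan = np.isnan(x).any(axis=(1, 2)) | np.isnan(y).any(axis=(1, 2))
keep = ~has_nan
x_clean, y_clean = x[keep], y[keep]
print(np.flatnonzero(keep), x_clean.shape, y_clean.shape)  # [0 2 3 5] (4, 4, 5) (4, 1, 1)
```

The same `keep` mask should be applied to `_y` as well, so that all three arrays stay aligned.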

making batches

A batch is a group of samples, i.e., (x, y) pairs. Batches are important in deep learning because neural networks are usually not trained on all the data at once. Instead, the data is divided into batches: we feed a single batch to the network, update its weights, and then feed the next batch.

lookback = 4
num_inputs = 5
data = np.arange(int(rows*cols)).reshape(-1,rows).transpose()
x, _y, y = prepare_data(data, num_inputs=num_inputs, lookback=lookback)
print(x.shape, _y.shape, y.shape)
(1997, 4, 5) (1997, 3, 1) (1997, 1, 1)

Consider the following example of training an LSTM on these 1997 samples.

inputs = Input(shape=(lookback, num_inputs))
lstm = LSTM(32)(inputs)
output = Dense(1)(lstm)
model = Model(inputs=inputs, outputs=output)

model.compile(optimizer='adam', loss='mse')
model.fit(x, y, epochs=2, batch_size=128)
Epoch 1/2

 1/16 [>.............................] - ETA: 12s - loss: 121331936.0000
16/16 [==============================] - 1s 1ms/step - loss: 121338976.0000
Epoch 2/2

 1/16 [>.............................] - ETA: 0s - loss: 121454472.0000
16/16 [==============================] - 0s 1ms/step - loss: 121330960.0000

<keras.callbacks.History object at 0x7062be3657c0>

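Training on all 1997 samples with batch_size=128 gives 16 batches per epoch: 15 full batches of 128 samples and a final batch of 1997 - 15 * 128 = 77 samples. (The loss is very large because the inputs and targets are unscaled values in the thousands; in practice they would be normalized first.)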
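The batch count follows from ceiling division. The sketch below splits a placeholder array with the same shape as x into consecutive slices of `batch_size`. This mirrors how the batch sizes work out in `model.fit` with NumPy inputs, although Keras also shuffles the sample order each epoch by default (`shuffle=True`).

```python
import math
import numpy as np

num_samples, batch_size = 1997, 128
x = np.zeros((num_samples, 4, 5))  # placeholder with the same shape as x above

num_batches = math.ceil(num_samples / batch_size)
batches = [x[start:start + batch_size] for start in range(0, num_samples, batch_size)]

print(num_batches, len(batches))            # 16 16
print(batches[0].shape, batches[-1].shape)  # (128, 4, 5) (77, 4, 5)
```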

pred = model.predict(x)

using a generator

In the previous example we had 1997 samples, each of shape (4, 5), and x held all of them at once. That works because the dataset is small enough to fit in memory. Real-world datasets, however, may contain millions of samples, and an x holding all of them (especially when each sample is itself large) may not fit in memory. In such cases we can use a generator that prepares only the samples needed at the moment, e.g., for the current batch, instead of building all samples up front.
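The idea of preparing only what the current batch needs can be sketched in plain Python: a generator produces one sample at a time, and `itertools.islice` pulls `batch_size` of them into a batch, so at most one batch of samples is materialized at once. `make_sample` and `batched` are hypothetical stand-ins; in this notebook the per-sample work is done by `prepare_data_sample`.

```python
import itertools
import numpy as np

def make_sample(data, index, lookback, num_inputs):
    """Hypothetical single-sample builder (inputs = first num_inputs columns)."""
    x = data[index:index + lookback, :num_inputs]
    y = data[index + lookback - 1, num_inputs:].reshape(-1, 1)
    return x, y

def batched(samples, batch_size):
    """Group an iterator of (x, y) pairs into stacked batches, lazily."""
    samples = iter(samples)
    while True:
        chunk = list(itertools.islice(samples, batch_size))
        if not chunk:
            return
        xs, ys = zip(*chunk)
        yield np.stack(xs), np.stack(ys)

data = np.arange(2000 * 6).reshape(-1, 2000).transpose()
lookback, num_inputs = 4, 5
samples = (make_sample(data, i, lookback, num_inputs)
           for i in range(len(data) - lookback + 1))
sizes = [xb.shape[0] for xb, yb in batched(samples, batch_size=128)]
print(len(sizes), sizes[-1])  # 16 77
```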

cols = 6
rows = 200
lookback = 4
num_inputs = 5
data = np.arange(int(rows*cols)).reshape(-1,rows).transpose()

x0, _, y0 = prepare_data_sample(data, index=0, lookback=lookback, num_inputs=num_inputs)

x0
array([[  0, 200, 400, 600, 800],
       [  1, 201, 401, 601, 801],
       [  2, 202, 402, 602, 802],
       [  3, 203, 403, 603, 803]])

The function prepare_data_sample returns a single sample/example/data point at a time using the index parameter to specify which sample to return.

y0
array([[1003]])

So if we want to get the second sample/example/data point, we can call the function with index=1

x1, _, y1 = prepare_data_sample(data, index=1, lookback=lookback, num_inputs=num_inputs)

x1
array([[  1, 201, 401, 601, 801],
       [  2, 202, 402, 602, 802],
       [  3, 203, 403, 603, 803],
       [  4, 204, 404, 604, 804]])
y1
array([[1004]])

Similarly, if we want to get the fifth sample/example/data point, we can call the function with index=4

x4, _, y4 = prepare_data_sample(data, index=4, lookback=lookback, num_inputs=num_inputs)

x4
array([[  4, 204, 404, 604, 804],
       [  5, 205, 405, 605, 805],
       [  6, 206, 406, 606, 806],
       [  7, 207, 407, 607, 807]])
y4
array([[1007]])

Now we can create a generator function that yields samples from the dataset.

def sample_generator(data: np.ndarray,
                     lookback,
                     num_inputs,
                     num_outputs=None,
                     input_steps=1,
                     forecast_step=0,
                     forecast_len=1,
                     known_future_inputs=False,
                     output_steps=1):

    # Number of window start positions. This bound is conservative: with the
    # default settings it gives one sample fewer than prepare_data returns for
    # the same data (196 instead of 197 for 200 rows), and it does not adjust
    # for known_future_inputs=True.
    num_samples = len(data) - lookback * input_steps + 1 - forecast_step - forecast_len * output_steps

    for i in range(num_samples):
        x, _, y = prepare_data_sample(data, index=i, lookback=lookback,
                                      num_inputs=num_inputs,
                                      num_outputs=num_outputs,
                                      input_steps=input_steps,
                                      forecast_step=forecast_step,
                                      forecast_len=forecast_len,
                                      known_future_inputs=known_future_inputs,
                                      output_steps=output_steps)

        # Skip samples with NaNs in x or y
        if np.isnan(x).any() or np.isnan(y).any():
            continue

        yield x, y

gen = sample_generator(data, lookback, num_inputs)

for idx, (x, y) in enumerate(gen):
    print(idx, x.shape, y.shape)
0 (4, 5) (1, 1)
1 (4, 5) (1, 1)
2 (4, 5) (1, 1)
3 (4, 5) (1, 1)
4 (4, 5) (1, 1)
5 (4, 5) (1, 1)
6 (4, 5) (1, 1)
7 (4, 5) (1, 1)
8 (4, 5) (1, 1)
9 (4, 5) (1, 1)
10 (4, 5) (1, 1)
11 (4, 5) (1, 1)
12 (4, 5) (1, 1)
13 (4, 5) (1, 1)
14 (4, 5) (1, 1)
15 (4, 5) (1, 1)
16 (4, 5) (1, 1)
17 (4, 5) (1, 1)
18 (4, 5) (1, 1)
19 (4, 5) (1, 1)
20 (4, 5) (1, 1)
21 (4, 5) (1, 1)
22 (4, 5) (1, 1)
23 (4, 5) (1, 1)
24 (4, 5) (1, 1)
25 (4, 5) (1, 1)
26 (4, 5) (1, 1)
27 (4, 5) (1, 1)
28 (4, 5) (1, 1)
29 (4, 5) (1, 1)
30 (4, 5) (1, 1)
31 (4, 5) (1, 1)
32 (4, 5) (1, 1)
33 (4, 5) (1, 1)
34 (4, 5) (1, 1)
35 (4, 5) (1, 1)
36 (4, 5) (1, 1)
37 (4, 5) (1, 1)
38 (4, 5) (1, 1)
39 (4, 5) (1, 1)
40 (4, 5) (1, 1)
41 (4, 5) (1, 1)
42 (4, 5) (1, 1)
43 (4, 5) (1, 1)
44 (4, 5) (1, 1)
45 (4, 5) (1, 1)
46 (4, 5) (1, 1)
47 (4, 5) (1, 1)
48 (4, 5) (1, 1)
49 (4, 5) (1, 1)
50 (4, 5) (1, 1)
51 (4, 5) (1, 1)
52 (4, 5) (1, 1)
53 (4, 5) (1, 1)
54 (4, 5) (1, 1)
55 (4, 5) (1, 1)
56 (4, 5) (1, 1)
57 (4, 5) (1, 1)
58 (4, 5) (1, 1)
59 (4, 5) (1, 1)
60 (4, 5) (1, 1)
61 (4, 5) (1, 1)
62 (4, 5) (1, 1)
63 (4, 5) (1, 1)
64 (4, 5) (1, 1)
65 (4, 5) (1, 1)
66 (4, 5) (1, 1)
67 (4, 5) (1, 1)
68 (4, 5) (1, 1)
69 (4, 5) (1, 1)
70 (4, 5) (1, 1)
71 (4, 5) (1, 1)
72 (4, 5) (1, 1)
73 (4, 5) (1, 1)
74 (4, 5) (1, 1)
75 (4, 5) (1, 1)
76 (4, 5) (1, 1)
77 (4, 5) (1, 1)
78 (4, 5) (1, 1)
79 (4, 5) (1, 1)
80 (4, 5) (1, 1)
81 (4, 5) (1, 1)
82 (4, 5) (1, 1)
83 (4, 5) (1, 1)
84 (4, 5) (1, 1)
85 (4, 5) (1, 1)
86 (4, 5) (1, 1)
87 (4, 5) (1, 1)
88 (4, 5) (1, 1)
89 (4, 5) (1, 1)
90 (4, 5) (1, 1)
91 (4, 5) (1, 1)
92 (4, 5) (1, 1)
93 (4, 5) (1, 1)
94 (4, 5) (1, 1)
95 (4, 5) (1, 1)
96 (4, 5) (1, 1)
97 (4, 5) (1, 1)
98 (4, 5) (1, 1)
99 (4, 5) (1, 1)
100 (4, 5) (1, 1)
101 (4, 5) (1, 1)
102 (4, 5) (1, 1)
103 (4, 5) (1, 1)
104 (4, 5) (1, 1)
105 (4, 5) (1, 1)
106 (4, 5) (1, 1)
107 (4, 5) (1, 1)
108 (4, 5) (1, 1)
109 (4, 5) (1, 1)
110 (4, 5) (1, 1)
111 (4, 5) (1, 1)
112 (4, 5) (1, 1)
113 (4, 5) (1, 1)
114 (4, 5) (1, 1)
115 (4, 5) (1, 1)
116 (4, 5) (1, 1)
117 (4, 5) (1, 1)
118 (4, 5) (1, 1)
119 (4, 5) (1, 1)
120 (4, 5) (1, 1)
121 (4, 5) (1, 1)
122 (4, 5) (1, 1)
123 (4, 5) (1, 1)
124 (4, 5) (1, 1)
125 (4, 5) (1, 1)
126 (4, 5) (1, 1)
127 (4, 5) (1, 1)
128 (4, 5) (1, 1)
129 (4, 5) (1, 1)
130 (4, 5) (1, 1)
131 (4, 5) (1, 1)
132 (4, 5) (1, 1)
133 (4, 5) (1, 1)
134 (4, 5) (1, 1)
135 (4, 5) (1, 1)
136 (4, 5) (1, 1)
137 (4, 5) (1, 1)
138 (4, 5) (1, 1)
139 (4, 5) (1, 1)
140 (4, 5) (1, 1)
141 (4, 5) (1, 1)
142 (4, 5) (1, 1)
143 (4, 5) (1, 1)
144 (4, 5) (1, 1)
145 (4, 5) (1, 1)
146 (4, 5) (1, 1)
147 (4, 5) (1, 1)
148 (4, 5) (1, 1)
149 (4, 5) (1, 1)
150 (4, 5) (1, 1)
151 (4, 5) (1, 1)
152 (4, 5) (1, 1)
153 (4, 5) (1, 1)
154 (4, 5) (1, 1)
155 (4, 5) (1, 1)
156 (4, 5) (1, 1)
157 (4, 5) (1, 1)
158 (4, 5) (1, 1)
159 (4, 5) (1, 1)
160 (4, 5) (1, 1)
161 (4, 5) (1, 1)
162 (4, 5) (1, 1)
163 (4, 5) (1, 1)
164 (4, 5) (1, 1)
165 (4, 5) (1, 1)
166 (4, 5) (1, 1)
167 (4, 5) (1, 1)
168 (4, 5) (1, 1)
169 (4, 5) (1, 1)
170 (4, 5) (1, 1)
171 (4, 5) (1, 1)
172 (4, 5) (1, 1)
173 (4, 5) (1, 1)
174 (4, 5) (1, 1)
175 (4, 5) (1, 1)
176 (4, 5) (1, 1)
177 (4, 5) (1, 1)
178 (4, 5) (1, 1)
179 (4, 5) (1, 1)
180 (4, 5) (1, 1)
181 (4, 5) (1, 1)
182 (4, 5) (1, 1)
183 (4, 5) (1, 1)
184 (4, 5) (1, 1)
185 (4, 5) (1, 1)
186 (4, 5) (1, 1)
187 (4, 5) (1, 1)
188 (4, 5) (1, 1)
189 (4, 5) (1, 1)
190 (4, 5) (1, 1)
191 (4, 5) (1, 1)
192 (4, 5) (1, 1)
193 (4, 5) (1, 1)
194 (4, 5) (1, 1)
195 (4, 5) (1, 1)

Because we have drawn all samples from the generator, it is now exhausted, so iterating over it again yields nothing:

for idx, (x, y) in enumerate(gen):
    print(idx, x.shape, y.shape)
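This is the difference between a generator object and a generator function: the object returned by `sample_generator(...)` can be consumed only once, whereas calling the function again creates a fresh, independent iterator. That is also why `tf.data.Dataset.from_generator` takes the callable (plus its `args`) rather than an already-created generator. A minimal illustration with a hypothetical `count_up` generator:

```python
def count_up(n):
    """Hypothetical generator function yielding 0, 1, ..., n-1."""
    for i in range(n):
        yield i

gen = count_up(3)
first_pass = list(gen)
second_pass = list(gen)         # the same object is already exhausted
fresh_pass = list(count_up(3))  # calling the function again starts over

print(first_pass, second_pass, fresh_pass)  # [0, 1, 2] [] [0, 1, 2]
```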

Now we can build a TensorFlow Dataset from the generator function.

output_signature = (
    tf.TensorSpec(shape=(4, 5), dtype=tf.float32),  # shape and dtype for x
    tf.TensorSpec(shape=(1, 1), dtype=tf.float32)   # shape and dtype for y
)

dataset = tf.data.Dataset.from_generator(
    sample_generator,
    args=(data, lookback, num_inputs),
    output_signature=output_signature
)

dataset
<FlatMapDataset shapes: ((4, 5), (1, 1)), types: (tf.float32, tf.float32)>

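The dataset is not a plain generator but an iterable tf.data.Dataset that yields one (x, y) pair per iteration. Because from_generator calls sample_generator afresh each time the dataset is iterated, it can be iterated repeatedly, unlike gen above.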

for idx, (x,y) in enumerate(dataset):
    print(idx, type(x), type(y), x.shape, y.shape)
0 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
1 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
2 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
3 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
4 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
5 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
6 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
7 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
8 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
9 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
10 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
11 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
12 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
13 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
14 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
15 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
16 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
17 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
18 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
19 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
20 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
21 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
22 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
23 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
24 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
25 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
26 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
27 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
28 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
29 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
30 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
31 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
32 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
33 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
34 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
35 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
36 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
37 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
38 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
39 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
40 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
41 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
42 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
43 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
44 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
45 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
46 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
47 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
48 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
49 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
50 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
51 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
52 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
53 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
54 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
55 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
56 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
57 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
58 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
59 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
60 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
61 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
62 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
63 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
64 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
65 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
66 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
67 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
68 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
69 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
70 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
71 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
72 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
73 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
74 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
75 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
76 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
77 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
78 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
79 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
80 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
81 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
82 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
83 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
84 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
85 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
86 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
87 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
88 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
89 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
90 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
91 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
92 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
93 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
94 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
95 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
96 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
97 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
98 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
99 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
100 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
101 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
102 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
103 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
104 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
105 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
106 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
107 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
108 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
109 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
110 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
111 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
112 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
113 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
114 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
115 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
116 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
117 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
118 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
119 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
120 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
121 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
122 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
123 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
124 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
125 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
126 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
127 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
128 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
129 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
130 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
131 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
132 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
133 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
134 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
135 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
136 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
137 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
138 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
139 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
140 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
141 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
142 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
143 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
144 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
145 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
146 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
147 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
148 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
149 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
150 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
151 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
152 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
153 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
154 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
155 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
156 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
157 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
158 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
159 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
160 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
161 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
162 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
163 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
164 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
165 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
166 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
167 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
168 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
169 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
170 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
171 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
172 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
173 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
174 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
175 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
176 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
177 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
178 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
179 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
180 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
181 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
182 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
183 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
184 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
185 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
186 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
187 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
188 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
189 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
190 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
191 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
192 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
193 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
194 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
195 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)

Getting batches instead of single samples (x, y pairs) during iteration

dataset = tf.data.Dataset.from_generator(
    sample_generator,
    args=(data, lookback, num_inputs),
    output_signature=output_signature
)

batch_size = 32
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(tf.data.AUTOTUNE)

dataset
<PrefetchDataset shapes: ((None, 4, 5), (None, 1, 1)), types: (tf.float32, tf.float32)>

Now, when we iterate over the dataset, each iteration yields a batch of samples instead of a single (x, y) pair. The batch size is set by the batch_size parameter, and the final batch holds whatever samples are left over, which is why the last one below contains only 4.

for idx, (x,y) in enumerate(dataset):
    print(idx, type(x), type(y), x.shape, y.shape)
0 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (32, 4, 5) (32, 1, 1)
1 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (32, 4, 5) (32, 1, 1)
2 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (32, 4, 5) (32, 1, 1)
3 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (32, 4, 5) (32, 1, 1)
4 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (32, 4, 5) (32, 1, 1)
5 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (32, 4, 5) (32, 1, 1)
6 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 4, 5) (4, 1, 1)
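The batch sizes above come from integer division: 196 samples in batches of 32 give six full batches and one batch of 4. The pure-Python sketch below models only that arithmetic and does not call TensorFlow. `batch_sizes` is a hypothetical helper. Its `drop_remainder` flag mirrors the parameter of the same name on `tf.data.Dataset.batch`, which discards the final partial batch so that every batch has the same (static) size.

```python
from typing import List


def batch_sizes(num_samples: int, batch_size: int, drop_remainder: bool = False) -> List[int]:
    """Sizes of the batches obtained when num_samples items are grouped batch_size at a time."""
    full, rest = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    # the final partial batch is kept unless drop_remainder is set
    if rest and not drop_remainder:
        sizes.append(rest)
    return sizes


print(batch_sizes(196, 32))                        # [32, 32, 32, 32, 32, 32, 4]
print(batch_sizes(196, 32, drop_remainder=True))   # [32, 32, 32, 32, 32, 32]
sizes = batch_sizes(397_808, 1024)
print(len(sizes), sizes[-1])                       # 389 496
```

The last line anticipates the rainfall-runoff example below, where 397,808 samples in batches of 1024 give 389 batches (last index 388), the final one holding 496 samples.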

Let’s use a real-world example: rainfall-runoff data for 347 catchments (stations) in Colombia, loaded from the CAMELS_COL dataset with aqua_fetch's RainfallRunoff.

ds = RainfallRunoff('CAMELS_COL', verbosity=0)

static, dynamic = ds.fetch()

type(dynamic), len(dynamic)

(<class 'dict'>, 347)

dynamic is a dictionary whose keys are station IDs and whose values are pandas DataFrames, one per station, with time steps as rows.

dynamic['26247030'].shape
(13971, 6)

Total number of rows (time steps) across all DataFrames in dynamic:

sum(df.shape[0] for df in dynamic.values())
4079935

Total number of rows after dropping those with NaN in the last (target) column:

sum(df.dropna(subset=[df.columns[-1]]).shape[0] for df in dynamic.values())
4079935

Both totals are identical, so the last column contains no missing values.

Next, we write a sample_generator that loops over the stations listed in station_ids and yields one (x, y) sample at a time, skipping any sample that contains NaNs.

def sample_generator(
        station_ids,
        lookback: int,
        num_inputs: int,
        num_outputs=None,
        input_steps=1,
        forecast_step=0,
        forecast_len=1,
        known_future_inputs=False,
        output_steps=1,
):
    """Yield (x, y) samples one at a time from the stations in station_ids."""

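    for stn in station_ids:

        # args reach the generator as NumPy values, so string IDs arrive as bytes
        stn = stn.decode() if isinstance(stn, bytes) else stn

        data = dynamic[stn].values

        # number of windows to draw from this station's series
        for i in range(len(data) - lookback * input_steps + 1 - forecast_step - forecast_len * output_steps):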
            x, _, y = prepare_data_sample(data, index=i, lookback=lookback,
                                            num_inputs=num_inputs,
                                            num_outputs=num_outputs,
                                            input_steps=input_steps,
                                            forecast_step=forecast_step,
                                            forecast_len=forecast_len,
                                            known_future_inputs=known_future_inputs,
                                            output_steps=output_steps
                                            )

            # Skip samples with NaNs in x or y
            if np.isnan(x).any() or np.isnan(y).any():
                continue

            yield x, y

lookback = 365
num_inputs = dynamic['26247030'].shape[1] - 1

output_signature = (
    tf.TensorSpec(shape=(lookback, num_inputs), dtype=tf.float32),  # shape and dtype for x
    tf.TensorSpec(shape=(1, 1), dtype=tf.float32)   # shape and dtype for y
)

dataset = tf.data.Dataset.from_generator(
    sample_generator,
    args=(list(dynamic.keys())[0:34], lookback, num_inputs),
    output_signature=output_signature
)

dataset
<FlatMapDataset shapes: ((365, 5), (1, 1)), types: (tf.float32, tf.float32)>

Now we iterate over the dataset and time how long it takes to draw every sample from the first 34 stations. We use 34 stations (about a tenth of the 347) to keep the run time manageable.

start = time.time()
for idx, (x,y) in enumerate(dataset):
    pass
print(round(time.time() - start, 2), 'seconds taken')
print("index of last sample: ", idx)
print(x.shape, y.shape)
61.78 seconds taken
index of last sample:  397807
(365, 5) (1, 1)
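The sample count can be cross-checked against the range() bound in sample_generator. The hypothetical helper below simply evaluates that expression. With the default arguments it reduces to len(data) - lookback windows per station. The first 34 stations hold 410,218 rows in total (the shape printed when they are concatenated further down), so the generator can yield at most 410,218 - 34 × 365 = 397,808 samples. That equals the last index (397807) plus one, so no sample was skipped for NaNs in these stations.

```python
def num_windows(n_rows: int, lookback: int, input_steps: int = 1, forecast_step: int = 0,
                forecast_len: int = 1, output_steps: int = 1) -> int:
    """Iterations of the range() loop in sample_generator for a station with n_rows rows."""
    n = n_rows - lookback * input_steps + 1 - forecast_step - forecast_len * output_steps
    return max(n, 0)  # range() over a non-positive number yields nothing


print(num_windows(13_971, 365))  # 13606 for station '26247030' (13971 rows)

# With default arguments this is n_rows - lookback, so over 34 stations
# (410,218 rows in total, assuming each has more than 365 rows) the generator yields at most:
print(410_218 - 34 * 365)  # 397808
```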

Getting batches instead of single samples (x, y pairs) during iteration

dataset = tf.data.Dataset.from_generator(
    sample_generator,
    args=(list(dynamic.keys())[0:34], lookback, num_inputs),
    output_signature=output_signature
)

batch_size = 1024
dataset = dataset.take(1_000_000)  # cap at 1 million samples (no effect here: the 34 stations yield ~398k)
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(tf.data.AUTOTUNE)

dataset
<PrefetchDataset shapes: ((None, 365, 5), (None, 1, 1)), types: (tf.float32, tf.float32)>
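Because sample_generator walks through the stations one after another, shuffle(buffer_size=10000) only mixes samples that sit close together in that stream. The function below is a simplified pure-Python model of the buffer behaviour described in the tf.data.Dataset.shuffle documentation: fill a buffer, emit a random element, and put the next input in its slot. It is an illustration, not TensorFlow's implementation, and buffer_shuffle is a hypothetical name. With roughly 12,000 samples per station on average (410,218 rows over 34 stations), a 10,000-element buffer means early batches come almost entirely from the first one or two stations.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffer_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Simplified model of a fixed-size shuffle buffer (illustration only)."""
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)      # fill the buffer first
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]            # emit a random buffered element...
        buffer[idx] = item           # ...and put the next input in its slot
    rng.shuffle(buffer)              # input exhausted: drain what is left
    yield from buffer


# three "stations" of 10 samples each, shuffled with a buffer of 5
stream = [f"stn{s}_{i}" for s in range(3) for i in range(10)]
first_batch = list(buffer_shuffle(stream, buffer_size=5))[:8]
print(first_batch)  # every element is among the first 12 of the stream (stn0, start of stn1)
```

One simple remedy, under these assumptions, is to shuffle the order of station_ids before building the dataset, or to use a buffer larger than one station's worth of samples.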

As before, each iteration now yields a batch of up to batch_size (1024) samples rather than a single (x, y) pair. We time a full pass over the batched dataset:

start = time.time()
for idx, (x,y) in enumerate(dataset):
    pass
print(round(time.time() - start, 2), 'seconds taken')
print("index of last batch: ", idx)
print(x.shape, y.shape)
34.88 seconds taken
index of last batch:  388
(496, 365, 5) (496, 1, 1)
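Batching helps, but every sample is still produced by a Python loop. A different technique, not the one used in this tutorial, is to build all windows of a station at once with NumPy's sliding_window_view (available since NumPy 1.20). The sketch assumes the generator's default case: x holds the first num_inputs columns over lookback steps, and y is the last column at the final step of the window, consistent with the toy example earlier where the window [0, 1, 2, 3] had target 3. That is our reading of prepare_data_sample, not its actual code, and make_windows is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(data: np.ndarray, lookback: int, num_inputs: int):
    """Hypothetical vectorized version of sample_generator's default case for ONE station.

    Assumes x = first `num_inputs` columns over `lookback` steps and
    y = last column at the final step of each window.
    """
    # over axis 0 the view has shape (n - lookback + 1, n_cols, lookback);
    # transpose it to (num_windows, lookback, n_cols)
    windows = sliding_window_view(data, window_shape=lookback, axis=0).transpose(0, 2, 1)
    # drop the final window to match the generator's range() bound with default arguments
    windows = windows[:-1]
    x = windows[:, :, :num_inputs]
    y = windows[:, -1:, -1:]  # shape (num_windows, 1, 1)
    # keep only windows with no NaN in x or y, as the generator does
    keep = ~(np.isnan(x).any(axis=(1, 2)) | np.isnan(y).any(axis=(1, 2)))
    # boolean indexing copies the selected windows out of the strided view
    return x[keep].astype(np.float32), y[keep].astype(np.float32)


# toy check: 10 time steps, 6 columns (5 inputs + 1 target)
toy = np.arange(60, dtype=float).reshape(10, 6)
x, y = make_windows(toy, lookback=4, num_inputs=5)
print(x.shape, y.shape)  # (6, 4, 5) (6, 1, 1)
```

Per-station arrays could then be joined with np.concatenate and passed to tf.data.Dataset.from_tensor_slices((x, y)). The trade-off is memory: the 397,808 samples from 34 stations would occupy about 397,808 × 365 × 5 × 4 bytes ≈ 2.9 GB as float32, whereas the generator holds only one sample at a time.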

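Finally, we use tf.keras.utils.timeseries_dataset_from_array, which builds the windows with TensorFlow ops instead of a Python generator. Unlike sample_generator, it does not skip windows containing NaNs, and because the 34 stations are concatenated into one array, some windows straddle two stations.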

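# stack the first 34 stations into one DataFrame (input columns + target column)
data = pd.concat(list(dynamic.values())[0:34], axis=0)
print(data.shape)
dataset = tf.keras.utils.timeseries_dataset_from_array(
    data.iloc[:, 0:-1].values,
    # targets[i] belongs to the window starting at i; offsetting by lookback - 1
    # pairs each window with the target at its last time step, as sample_generator does
    targets=data.iloc[lookback - 1:, -1].values,
    sequence_length=lookback,
    batch_size=batch_size
)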

dataset
(410218, 6)

<BatchDataset shapes: ((None, None, 5), (None,)), types: (tf.float64, tf.float64)>
start = time.time()
for idx, (x,y) in enumerate(dataset):
    pass
print(round(time.time() - start, 2), 'seconds taken')

print("index of last batch: ", idx)
print(x.shape, y.shape)
22.67 seconds taken
index of last batch:  400
(254, 365, 5) (254,)
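This pipeline yields 400 × 1024 + 254 = 409,854 windows, 12,046 more than the generator's 397,808. The generator skipped no NaN samples here (its count equals 410,218 - 34 × 365), so the gap has two sources. First, the generator's range() bound leaves out the last window of each station (34 windows). Second, concatenation creates windows that straddle two neighbouring stations. The hypothetical pure-Python helper below counts the second kind: for k stations each longer than lookback, it is (k - 1)(lookback - 1) = 33 × 364 = 12,012.

```python
from typing import Sequence, Tuple


def window_counts(lengths: Sequence[int], lookback: int) -> Tuple[int, int, int]:
    """Windows of length `lookback` over concatenated vs. separate series."""
    concatenated = max(sum(lengths) - lookback + 1, 0)
    per_station = sum(max(n - lookback + 1, 0) for n in lengths)
    # windows that exist only because the series were glued together
    return concatenated, per_station, concatenated - per_station


print(window_counts([5, 5], lookback=3))  # (8, 6, 2)
```

To keep every window inside a single station, one option is to call timeseries_dataset_from_array on each station separately and join the per-station datasets, for example with Dataset.concatenate (followed by unbatch() and batch() if uniform batches are wanted).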

Total running time of the script: (2 minutes 17.893 seconds)
