Data Preparation for Time Series Prediction
This example demonstrates how to prepare data for time series prediction, especially for deep learning models such as LSTMs and other RNNs.
import time
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model
from aqua_fetch import RainfallRunoff
print("tf: ", tf.__version__)
print("np: ", np.__version__)
print('pd: ', pd.__version__)
from utils import prepare_data, prepare_data_sample
tf: 2.7.0
np: 1.21.6
pd: 2.0.3
First we create a simple dataset with 2000 rows and 1 column, i.e. a univariate time series with no covariates.
rows = 2000
cols = 1
data = np.arange(int(rows*cols)).reshape(-1,rows).transpose()
Below we print the first 10 rows, the shape of the dataset, and the last 10 rows to give an overview of the data structure.
print(data[0:10])
print('\n {} \n'.format(data.shape))
print(data[-10:])
[[0]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]]
(2000, 1)
[[1990]
[1991]
[1992]
[1993]
[1994]
[1995]
[1996]
[1997]
[1998]
[1999]]
x, _y, y = prepare_data(data, num_inputs=1, num_outputs=1, lookback=4)
print(x.shape, _y.shape, y.shape)
(1997, 4, 1) (1997, 3, 1) (1997, 1, 1)
Checking the first sample/example/data point
x[0]
array([[0],
[1],
[2],
[3]])
_y[0]
array([[0],
[1],
[2]])
y[0]
array([[3]])
Checking the second sample/example/data point
x[1]
array([[1],
[2],
[3],
[4]])
_y[1]
array([[1],
[2],
[3]])
y[1]
array([[4]])
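The windowing behind these shapes can be sketched in plain NumPy. This is a minimal illustration for the univariate case only: `sliding_windows` and `prev_y` are hypothetical names, and the code is not the actual `prepare_data` implementation from `utils`. It assumes that every sample is a window of `lookback` consecutive rows, that the target is the last row of the window, and that the previous targets are the first `lookback - 1` rows, which is consistent with the output shown above.

```python
import numpy as np


def sliding_windows(series, lookback):
    """Hypothetical helper: build overlapping windows from a (rows, 1) array."""
    n_samples = len(series) - lookback + 1
    # one window of `lookback` consecutive rows per sample
    x = np.stack([series[i:i + lookback] for i in range(n_samples)])
    prev_y = x[:, :-1]  # first lookback-1 steps of each window
    y = x[:, -1:]       # last step of each window (assumed target)
    return x, prev_y, y


series = np.arange(2000).reshape(-1, 1)
x, prev_y, y = sliding_windows(series, lookback=4)
print(x.shape, prev_y.shape, y.shape)  # (1997, 4, 1) (1997, 3, 1) (1997, 1, 1)
print(x[1].ravel(), y[1].ravel())      # [1 2 3 4] [4]
```

Each window starts one row after the previous one, so 2000 rows and a lookback of 4 give 2000 - 4 + 1 = 1997 samples.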
Now we create another dataset with 2000 rows but with 6 columns i.e. multivariate timeseries. Each column can represent a different feature or variable in the time series data. The dataset is filled with sequential integers for demonstration purposes.
rows = 2000
cols = 6
data = np.arange(int(rows*cols)).reshape(-1,rows).transpose()
print(data[0:10])
print('\n {} \n'.format(data.shape))
print(data[-10:])
[[ 0 2000 4000 6000 8000 10000]
[ 1 2001 4001 6001 8001 10001]
[ 2 2002 4002 6002 8002 10002]
[ 3 2003 4003 6003 8003 10003]
[ 4 2004 4004 6004 8004 10004]
[ 5 2005 4005 6005 8005 10005]
[ 6 2006 4006 6006 8006 10006]
[ 7 2007 4007 6007 8007 10007]
[ 8 2008 4008 6008 8008 10008]
[ 9 2009 4009 6009 8009 10009]]
(2000, 6)
[[ 1990 3990 5990 7990 9990 11990]
[ 1991 3991 5991 7991 9991 11991]
[ 1992 3992 5992 7992 9992 11992]
[ 1993 3993 5993 7993 9993 11993]
[ 1994 3994 5994 7994 9994 11994]
[ 1995 3995 5995 7995 9995 11995]
[ 1996 3996 5996 7996 9996 11996]
[ 1997 3997 5997 7997 9997 11997]
[ 1998 3998 5998 7998 9998 11998]
[ 1999 3999 5999 7999 9999 11999]]
If this were a multivariate time series with no covariates, we would use the same approach as before, i.e. set num_inputs equal to num_outputs.
x, _y, y = prepare_data(data, num_inputs=6, num_outputs=6, lookback=4)
print(x.shape, _y.shape, y.shape)
(1997, 4, 6) (1997, 3, 6) (1997, 6, 1)
Checking the first sample/example/data point
x[0]
array([[ 0, 2000, 4000, 6000, 8000, 10000],
[ 1, 2001, 4001, 6001, 8001, 10001],
[ 2, 2002, 4002, 6002, 8002, 10002],
[ 3, 2003, 4003, 6003, 8003, 10003]])
_y[0]
array([[ 0, 2000, 4000, 6000, 8000, 10000],
[ 1, 2001, 4001, 6001, 8001, 10001],
[ 2, 2002, 4002, 6002, 8002, 10002]])
y[0]
array([[ 3],
[ 2003],
[ 4003],
[ 6003],
[ 8003],
[10003]])
Checking the second sample/example/data point
x[1]
array([[ 1, 2001, 4001, 6001, 8001, 10001],
[ 2, 2002, 4002, 6002, 8002, 10002],
[ 3, 2003, 4003, 6003, 8003, 10003],
[ 4, 2004, 4004, 6004, 8004, 10004]])
_y[1]
array([[ 1, 2001, 4001, 6001, 8001, 10001],
[ 2, 2002, 4002, 6002, 8002, 10002],
[ 3, 2003, 4003, 6003, 8003, 10003]])
y[1]
array([[ 4],
[ 2004],
[ 4004],
[ 6004],
[ 8004],
[10004]])
However, if this were a multivariate time series with covariates, i.e. one timeseries column is our target variable and the others are input features, we would need to adjust the data preparation accordingly.
x, _y, y = prepare_data(data, num_inputs=5, lookback=4)
print(x.shape, _y.shape, y.shape)
(1997, 4, 5) (1997, 3, 1) (1997, 1, 1)
x[0]
array([[ 0, 2000, 4000, 6000, 8000],
[ 1, 2001, 4001, 6001, 8001],
[ 2, 2002, 4002, 6002, 8002],
[ 3, 2003, 4003, 6003, 8003]])
_y[0]
array([[10000],
[10001],
[10002]])
y[0]
array([[10003]])
x[1]
array([[ 1, 2001, 4001, 6001, 8001],
[ 2, 2002, 4002, 6002, 8002],
[ 3, 2003, 4003, 6003, 8003],
[ 4, 2004, 4004, 6004, 8004]])
_y[1]
array([[10001],
[10002],
[10003]])
y[1]
array([[10004]])
Consider the case where the number of input features is 4 and the number of output features is 2.
x, _y, y = prepare_data(data, num_inputs=4, lookback=4)
print(x.shape, _y.shape, y.shape)
(1997, 4, 4) (1997, 3, 2) (1997, 2, 1)
x[0]
array([[ 0, 2000, 4000, 6000],
[ 1, 2001, 4001, 6001],
[ 2, 2002, 4002, 6002],
[ 3, 2003, 4003, 6003]])
_y[0]
array([[ 8000, 10000],
[ 8001, 10001],
[ 8002, 10002]])
y[0]
array([[ 8003],
[10003]])
x[1]
array([[ 1, 2001, 4001, 6001],
[ 2, 2002, 4002, 6002],
[ 3, 2003, 4003, 6003],
[ 4, 2004, 4004, 6004]])
_y[1]
array([[ 8001, 10001],
[ 8002, 10002],
[ 8003, 10003]])
y[1]
array([[ 8004],
[10004]])
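The column split used in the covariate cases can be sketched as follows. This is an illustration under assumptions, not the `prepare_data` code itself: `window_with_covariates` is a hypothetical name, the first `num_inputs` columns are treated as inputs, and all remaining columns are treated as targets, which matches the arrays printed above.

```python
import numpy as np


def window_with_covariates(data, num_inputs, lookback):
    """Hypothetical helper: first num_inputs columns are inputs, the rest targets."""
    inputs, targets = data[:, :num_inputs], data[:, num_inputs:]
    n_samples = len(data) - lookback + 1
    x = np.stack([inputs[i:i + lookback] for i in range(n_samples)])
    # previous target values: the first lookback-1 steps of each window
    prev_y = np.stack([targets[i:i + lookback - 1] for i in range(n_samples)])
    # target at the last step of the window, shaped (num_outputs, 1)
    y = np.stack([targets[i + lookback - 1].reshape(-1, 1) for i in range(n_samples)])
    return x, prev_y, y


data = np.arange(2000 * 6).reshape(-1, 2000).transpose()
x5, prev_y5, y5 = window_with_covariates(data, num_inputs=5, lookback=4)
x4, prev_y4, y4 = window_with_covariates(data, num_inputs=4, lookback=4)
print(x5.shape, prev_y5.shape, y5.shape)  # (1997, 4, 5) (1997, 3, 1) (1997, 1, 1)
print(x4.shape, prev_y4.shape, y4.shape)  # (1997, 4, 4) (1997, 3, 2) (1997, 2, 1)
```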
nowcasting vs forecasting
If forecast_step is greater than 0, we are predicting into the future (forecasting). With forecast_step=1, the input window ends at timestep t and the target is taken at timestep t+1.
x, _y, y = prepare_data(data, num_inputs=5, lookback=4, forecast_step=1)
print(x.shape, _y.shape, y.shape)
(1996, 4, 5) (1996, 3, 1) (1996, 1, 1)
First sample
x[0]
array([[ 0, 2000, 4000, 6000, 8000],
[ 1, 2001, 4001, 6001, 8001],
[ 2, 2002, 4002, 6002, 8002],
[ 3, 2003, 4003, 6003, 8003]])
_y[0]
array([[10000],
[10001],
[10002]])
y[0]
array([[10004]])
Second sample
x[1]
array([[ 1, 2001, 4001, 6001, 8001],
[ 2, 2002, 4002, 6002, 8002],
[ 3, 2003, 4003, 6003, 8003],
[ 4, 2004, 4004, 6004, 8004]])
_y[1]
array([[10001],
[10002],
[10003]])
y[1]
array([[10005]])
If we want to forecast multiple timesteps into the future, we also set forecast_len.
x, _y, y = prepare_data(data, num_inputs=5, lookback=4, forecast_step=1, forecast_len=2)
print(x.shape, _y.shape, y.shape)
(1995, 4, 5) (1995, 3, 1) (1995, 1, 2)
x[0]
array([[ 0, 2000, 4000, 6000, 8000],
[ 1, 2001, 4001, 6001, 8001],
[ 2, 2002, 4002, 6002, 8002],
[ 3, 2003, 4003, 6003, 8003]])
_y[0]
array([[10000],
[10001],
[10002]])
y[0]
array([[10004, 10005]])
x[1]
array([[ 1, 2001, 4001, 6001, 8001],
[ 2, 2002, 4002, 6002, 8002],
[ 3, 2003, 4003, 6003, 8003],
[ 4, 2004, 4004, 6004, 8004]])
_y[1]
array([[10001],
[10002],
[10003]])
y[1]
array([[10005, 10006]])
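If forecast_step is 0, the first target is taken at the same timestep as the last input, i.e. we use the inputs up to the current timestep to predict the output at the current timestep (nowcasting). With forecast_len=2, the target covers timesteps t and t+1.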
x, _y, y = prepare_data(data, num_inputs=5, lookback=4, forecast_step=0, forecast_len=2)
print(x.shape, _y.shape, y.shape)
(1996, 4, 5) (1996, 3, 1) (1996, 1, 2)
x[0]
array([[ 0, 2000, 4000, 6000, 8000],
[ 1, 2001, 4001, 6001, 8001],
[ 2, 2002, 4002, 6002, 8002],
[ 3, 2003, 4003, 6003, 8003]])
_y[0]
array([[10000],
[10001],
[10002]])
y[0]
array([[10003, 10004]])
x[1]
array([[ 1, 2001, 4001, 6001, 8001],
[ 2, 2002, 4002, 6002, 8002],
[ 3, 2003, 4003, 6003, 8003],
[ 4, 2004, 4004, 6004, 8004]])
_y[1]
array([[10001],
[10002],
[10003]])
y[1]
array([[10004, 10005]])
x, _y, y = prepare_data(data, num_inputs=5, lookback=1, forecast_step=0)
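The relationship between `forecast_step`, `forecast_len` and the rows used as targets can be written down as a small calculation. The helpers below (`target_indices`, `num_samples`) are hypothetical and only inferred from the shapes and values printed above for `input_steps=1`, `output_steps=1` and no known future inputs; they are not taken from the `prepare_data` source.

```python
import numpy as np


def target_indices(index, lookback, forecast_step=0, forecast_len=1):
    """Rows of the target column used for sample `index` (assumed rule)."""
    last_input = index + lookback - 1      # last row of the input window
    start = last_input + forecast_step     # 0 -> nowcast, >0 -> forecast
    return np.arange(start, start + forecast_len)


def num_samples(rows, lookback, forecast_step=0, forecast_len=1):
    """Number of complete samples under the same assumed rule."""
    return rows - lookback + 1 - forecast_step - (forecast_len - 1)


print(target_indices(0, lookback=4, forecast_step=1, forecast_len=2))  # [4 5]
print(target_indices(0, lookback=4, forecast_step=0, forecast_len=2))  # [3 4]
print(num_samples(2000, 4),
      num_samples(2000, 4, forecast_step=1),
      num_samples(2000, 4, forecast_step=1, forecast_len=2),
      num_samples(2000, 4, forecast_step=0, forecast_len=2))  # 1997 1996 1995 1996
```

Since the target column in this dataset holds 10000 + row number, target rows 4 and 5 correspond to the values 10004 and 10005.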
changing input_steps
x, _y, y = prepare_data(data, num_inputs=5, lookback=4, input_steps=2)
print(x.shape, _y.shape, y.shape)
(1993, 4, 5) (1993, 3, 1) (1993, 1, 1)
x[0]
array([[ 0, 2000, 4000, 6000, 8000],
[ 2, 2002, 4002, 6002, 8002],
[ 4, 2004, 4004, 6004, 8004],
[ 6, 2006, 4006, 6006, 8006]])
_y[0]
array([[10000],
[10002],
[10004]])
y[0]
array([[10006]])
x[1]
array([[ 1, 2001, 4001, 6001, 8001],
[ 3, 2003, 4003, 6003, 8003],
[ 5, 2005, 4005, 6005, 8005],
[ 7, 2007, 4007, 6007, 8007]])
_y[1]
array([[10001],
[10003],
[10005]])
y[1]
array([[10007]])
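The effect of `input_steps` on a single window can be illustrated with a strided slice. This is a sketch, assuming that `input_steps` is simply the stride between consecutive rows of a window; `strided_window` is a hypothetical helper. It reproduces the rows shown in x[0] and x[1] above, but it does not attempt to reproduce how `prepare_data` handles the end of the series, so the total number of samples is not derived here.

```python
import numpy as np


def strided_window(inputs, index, lookback, input_steps=1):
    """Hypothetical helper: take `lookback` rows, `input_steps` rows apart."""
    stop = index + lookback * input_steps
    return inputs[index:stop:input_steps]


data = np.arange(2000 * 6).reshape(-1, 2000).transpose()
inputs = data[:, :5]  # first five columns as input features
w0 = strided_window(inputs, index=0, lookback=4, input_steps=2)
w1 = strided_window(inputs, index=1, lookback=4, input_steps=2)
print(w0[:, 0])  # [0 2 4 6]
print(w1[:, 0])  # [1 3 5 7]
```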
changing output_steps
x, _y, y = prepare_data(data, num_inputs=5, lookback=4, output_steps=2)
print(x.shape, _y.shape, y.shape)
(1996, 4, 5) (1996, 3, 1) (1996, 1, 1)
x[0]
array([[ 0, 2000, 4000, 6000, 8000],
[ 1, 2001, 4001, 6001, 8001],
[ 2, 2002, 4002, 6002, 8002],
[ 3, 2003, 4003, 6003, 8003]])
_y[0]
array([[10000],
[10001],
[10002]])
y[0]
array([[10003]])
x[1]
array([[ 1, 2001, 4001, 6001, 8001],
[ 2, 2002, 4002, 6002, 8002],
[ 3, 2003, 4003, 6003, 8003],
[ 4, 2004, 4004, 6004, 8004]])
_y[1]
array([[10001],
[10002],
[10003]])
y[1]
array([[10004]])
using known future inputs
x, _y, y = prepare_data(data,
num_inputs=5,
lookback=4,
forecast_step=1,
forecast_len=4,
known_future_inputs=True)
print(x.shape, _y.shape, y.shape)
(1989, 8, 5) (1989, 7, 1) (1989, 1, 4)
x[0]
array([[ 0, 2000, 4000, 6000, 8000],
[ 1, 2001, 4001, 6001, 8001],
[ 2, 2002, 4002, 6002, 8002],
[ 3, 2003, 4003, 6003, 8003],
[ 4, 2004, 4004, 6004, 8004],
[ 5, 2005, 4005, 6005, 8005],
[ 6, 2006, 4006, 6006, 8006],
[ 7, 2007, 4007, 6007, 8007]])
y[0]
array([[10004, 10005, 10006, 10007]])
x[1]
array([[ 1, 2001, 4001, 6001, 8001],
[ 2, 2002, 4002, 6002, 8002],
[ 3, 2003, 4003, 6003, 8003],
[ 4, 2004, 4004, 6004, 8004],
[ 5, 2005, 4005, 6005, 8005],
[ 6, 2006, 4006, 6006, 8006],
[ 7, 2007, 4007, 6007, 8007],
[ 8, 2008, 4008, 6008, 8008]])
y[1]
array([[10005, 10006, 10007, 10008]])
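With known future inputs, the input window is extended to also cover the forecast horizon. The sketch below illustrates this for the configuration shown above only (forecast_step=1 with unit strides); `known_future_sample` is a hypothetical helper whose slicing rule is inferred from the printed arrays rather than from the `prepare_data` source.

```python
import numpy as np


def known_future_sample(data, index, num_inputs, lookback, forecast_len):
    """Hypothetical helper, assuming forecast_step=1 and unit strides."""
    end = index + lookback + forecast_len
    # inputs cover the lookback window AND the forecast horizon
    x = data[index:end, :num_inputs]
    # targets cover only the forecast horizon, shaped (num_outputs, forecast_len)
    y = data[index + lookback:end, num_inputs:].T
    return x, y


data = np.arange(2000 * 6).reshape(-1, 2000).transpose()
xs, ys = known_future_sample(data, index=0, num_inputs=5, lookback=4, forecast_len=4)
print(xs.shape, ys.shape)  # (8, 5) (1, 4)
print(ys)                  # [[10004 10005 10006 10007]]
```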
using known future inputs with forecast_step=2, input_steps=2 and output_steps=2
x, _y, y = prepare_data(data,
num_inputs=5,
lookback=4,
forecast_len=4,
forecast_step=2,
input_steps=2,
output_steps=2,
known_future_inputs=True)
print(x.shape, _y.shape, y.shape)
(1976, 8, 5) (1976, 7, 1) (1976, 1, 4)
x[0]
array([[ 0, 2000, 4000, 6000, 8000],
[ 2, 2002, 4002, 6002, 8002],
[ 4, 2004, 4004, 6004, 8004],
[ 6, 2006, 4006, 6006, 8006],
[ 8, 2008, 4008, 6008, 8008],
[ 10, 2010, 4010, 6010, 8010],
[ 12, 2012, 4012, 6012, 8012],
[ 14, 2014, 4014, 6014, 8014]])
y[0]
array([[10008, 10010, 10012, 10014]])
x[1]
array([[ 1, 2001, 4001, 6001, 8001],
[ 3, 2003, 4003, 6003, 8003],
[ 5, 2005, 4005, 6005, 8005],
[ 7, 2007, 4007, 6007, 8007],
[ 9, 2009, 4009, 6009, 8009],
[ 11, 2011, 4011, 6011, 8011],
[ 13, 2013, 4013, 6013, 8013],
[ 15, 2015, 4015, 6015, 8015]])
y[1]
array([[10009, 10011, 10013, 10015]])
Handling missing values
Consider the case where missing values are present in the output/target variable/feature
data = np.arange(int(rows*cols)).reshape(-1,rows).transpose()
rng = np.random.default_rng(seed=313) # for reproducibility
# create a random mask for the last column
mask = rng.integers(0, 2, size=data[:, -1].shape).astype(bool)
# introduce NaNs in the last column
data = data.astype(float)
data[mask, -1] = np.nan
print(data[0:10])
print('\n {} \n'.format(data.shape))
print(data[-10:])
[[ 0. 2000. 4000. 6000. 8000. 10000.]
[ 1. 2001. 4001. 6001. 8001. nan]
[ 2. 2002. 4002. 6002. 8002. 10002.]
[ 3. 2003. 4003. 6003. 8003. 10003.]
[ 4. 2004. 4004. 6004. 8004. 10004.]
[ 5. 2005. 4005. 6005. 8005. nan]
[ 6. 2006. 4006. 6006. 8006. nan]
[ 7. 2007. 4007. 6007. 8007. 10007.]
[ 8. 2008. 4008. 6008. 8008. nan]
[ 9. 2009. 4009. 6009. 8009. nan]]
(2000, 6)
[[ 1990. 3990. 5990. 7990. 9990. 11990.]
[ 1991. 3991. 5991. 7991. 9991. 11991.]
[ 1992. 3992. 5992. 7992. 9992. nan]
[ 1993. 3993. 5993. 7993. 9993. nan]
[ 1994. 3994. 5994. 7994. 9994. 11994.]
[ 1995. 3995. 5995. 7995. 9995. 11995.]
[ 1996. 3996. 5996. 7996. 9996. 11996.]
[ 1997. 3997. 5997. 7997. 9997. 11997.]
[ 1998. 3998. 5998. 7998. 9998. 11998.]
[ 1999. 3999. 5999. 7999. 9999. 11999.]]
x, _y, y = prepare_data(data, num_inputs=5, lookback=4)
print(x.shape, _y.shape, y.shape)
(1997, 4, 5) (1997, 3, 1) (1997, 1, 1)
y[0]
array([[10003.]])
y[1]
array([[10004.]])
y[2]
array([[nan]])
y[3]
array([[nan]])
y[4], y[5], y[6]
(array([[10007.]]), array([[nan]]), array([[nan]]))
Now we should remove all samples with NaN in the output (y). This reduces the number of samples.
nan_idx_y = np.isnan(y).any(axis=(1, 2))
non_nan_idx_y = np.invert(nan_idx_y)
x = x[non_nan_idx_y]
_y = _y[non_nan_idx_y]
y = y[non_nan_idx_y]
print(x.shape, _y.shape, y.shape)
(955, 4, 5) (955, 3, 1) (955, 1, 1)
Now consider the case where missing values are present in the input features.
data = np.arange(int(rows*cols)).reshape(-1,rows).transpose()
rng = np.random.default_rng(seed=313) # for reproducibility
# put missing at random positions in the input data
mask = rng.integers(0, 50, size=data[:, :-1].shape).astype(bool)
data = data.astype(float)
data[:, :-1][~mask] = np.nan
print(data[0:10])
print('\n {} \n'.format(data.shape))
print(data[-10:])
[[ 0. 2000. 4000. 6000. 8000. 10000.]
[ 1. 2001. 4001. 6001. 8001. 10001.]
[ 2. 2002. 4002. 6002. 8002. 10002.]
[ 3. 2003. nan 6003. 8003. 10003.]
[ 4. 2004. 4004. 6004. 8004. 10004.]
[ 5. 2005. 4005. 6005. 8005. 10005.]
[ 6. 2006. 4006. 6006. 8006. 10006.]
[ 7. 2007. 4007. 6007. 8007. 10007.]
[ 8. 2008. 4008. 6008. nan 10008.]
[ 9. 2009. 4009. 6009. 8009. 10009.]]
(2000, 6)
[[ 1990. 3990. 5990. 7990. 9990. 11990.]
[ 1991. 3991. 5991. 7991. 9991. 11991.]
[ 1992. 3992. nan 7992. 9992. 11992.]
[ 1993. 3993. 5993. 7993. 9993. 11993.]
[ 1994. 3994. 5994. 7994. 9994. 11994.]
[ 1995. 3995. 5995. 7995. 9995. 11995.]
[ 1996. 3996. 5996. 7996. 9996. 11996.]
[ 1997. 3997. 5997. 7997. 9997. 11997.]
[ 1998. 3998. 5998. 7998. 9998. 11998.]
[ 1999. 3999. 5999. 7999. 9999. 11999.]]
x, _y, y = prepare_data(data, num_inputs=5, lookback=5)
print(x.shape, _y.shape, y.shape)
x[-3]
(1996, 5, 5) (1996, 4, 1) (1996, 1, 1)
array([[1993., 3993., 5993., 7993., 9993.],
[1994., 3994., 5994., 7994., 9994.],
[1995., 3995., 5995., 7995., 9995.],
[1996., 3996., 5996., 7996., 9996.],
[1997., 3997., 5997., 7997., 9997.]])
x[-4]
array([[1992., 3992., nan, 7992., 9992.],
[1993., 3993., 5993., 7993., 9993.],
[1994., 3994., 5994., 7994., 9994.],
[1995., 3995., 5995., 7995., 9995.],
[1996., 3996., 5996., 7996., 9996.]])
y[-4]
array([[11996.]])
We should definitely remove all examples with NaN in the input (x)
nan_idx_x = np.isnan(x).any(axis=(1, 2))
non_nan_idx_x = np.invert(nan_idx_x)
x = x[non_nan_idx_x]
_y = _y[non_nan_idx_x]
y = y[non_nan_idx_x]
print(x.shape, _y.shape, y.shape)
(1188, 5, 5) (1188, 4, 1) (1188, 1, 1)
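The two filtering steps above can be combined into one helper that drops a sample when any of the given arrays contains a NaN for it. This is a minimal sketch with a hypothetical function name (`drop_nan_samples`); whether the previous-target array should also be checked depends on whether the model actually consumes it, which is an assumption made here for illustration.

```python
import numpy as np


def drop_nan_samples(*arrays):
    """Hypothetical helper: keep only samples that are NaN-free in every array."""
    keep = np.ones(len(arrays[0]), dtype=bool)
    for arr in arrays:
        # flatten everything except the sample axis, then look for NaNs
        keep &= ~np.isnan(arr).reshape(len(arr), -1).any(axis=1)
    return tuple(arr[keep] for arr in arrays)


# toy arrays with three samples: NaN in x for sample 1, NaN in y for sample 2
x_toy = np.array([[[1.0], [2.0]], [[np.nan], [3.0]], [[4.0], [5.0]]])
prev_y_toy = np.array([[[1.0]], [[2.0]], [[3.0]]])
y_toy = np.array([[[2.0]], [[3.0]], [[np.nan]]])

x_c, prev_y_c, y_c = drop_nan_samples(x_toy, prev_y_toy, y_toy)
print(x_c.shape, prev_y_c.shape, y_c.shape)  # (1, 2, 1) (1, 1, 1) (1, 1, 1)
```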
making batches
A batch is a group of samples, i.e. (x, y) pairs. The concept of a batch is important in deep learning because neural networks are usually not trained on all the data at once. Instead, we divide the whole dataset into batches, feed a single batch to the neural network, update the network with it, and then feed the next batch.
lookback = 4
num_inputs = 5
data = np.arange(int(rows*cols)).reshape(-1,rows).transpose()
x, _y, y = prepare_data(data, num_inputs=num_inputs, lookback=lookback)
print(x.shape, _y.shape, y.shape)
(1997, 4, 5) (1997, 3, 1) (1997, 1, 1)
Consider the following example of training an LSTM on a dataset of ~2000 samples.
inputs = Input(shape=(lookback, num_inputs))
lstm = LSTM(32)(inputs)
output = Dense(1)(lstm)
model = Model(inputs=inputs, outputs=output)
model.compile(optimizer='adam', loss='mse')
model.fit(x, y, epochs=2, batch_size=128)
Epoch 1/2
1/16 [>.............................] - ETA: 12s - loss: 121331936.0000
16/16 [==============================] - 1s 1ms/step - loss: 121338976.0000
Epoch 2/2
1/16 [>.............................] - ETA: 0s - loss: 121454472.0000
16/16 [==============================] - 0s 1ms/step - loss: 121330960.0000
<keras.callbacks.History object at 0x7062be3657c0>
We see that when we trained the model on the whole dataset, i.e. 1997 samples, each epoch consisted of 16 batches. This is because we set the batch size to 128.
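The batch count follows from a ceiling division, since the last batch is allowed to be smaller than the others. A small worked calculation (the variable names are only for illustration):

```python
import math

num_samples = 1997
batch_size = 128

# the final, partial batch still counts as one step
steps_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size
print(steps_per_epoch, last_batch_size)  # 16 77
```

So each epoch consists of 15 full batches of 128 samples and one final batch of 77 samples.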
pred = model.predict(x)
using generator
In the previous example, we had 1997 samples, each of shape (4, 5), and x contained all of them. Since this is a small dataset, all the samples fit in memory. In the real world, however, we may have large datasets with, e.g., millions of samples, which cannot all fit in memory, especially when each sample is itself large.
In such cases, we can use a data generator to load and preprocess the data in batches ourselves. We prepare only those samples that are required at the moment, so at any given time x does not consist of all the samples but only those needed for the current batch.
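The idea of building only the current batch can be sketched in plain NumPy. This is an illustration under assumptions: `lazy_batches` is a hypothetical function, the target is taken at the last row of each window (forecast_step=0), and the first `num_inputs` columns are inputs while the remaining columns are targets.

```python
import numpy as np


def lazy_batches(data, lookback, num_inputs, batch_size):
    """Hypothetical generator: materialize windows for one batch at a time."""
    n_samples = len(data) - lookback + 1
    for start in range(0, n_samples, batch_size):
        stop = min(start + batch_size, n_samples)
        # only the windows of the current batch are held in memory
        x = np.stack([data[i:i + lookback, :num_inputs]
                      for i in range(start, stop)])
        y = np.stack([data[i + lookback - 1, num_inputs:].reshape(-1, 1)
                      for i in range(start, stop)])
        yield x, y


data = np.arange(200 * 6).reshape(-1, 200).transpose()
batch_shapes = [(x.shape, y.shape)
                for x, y in lazy_batches(data, lookback=4, num_inputs=5, batch_size=64)]
for shapes in batch_shapes:
    print(shapes)
# ((64, 4, 5), (64, 1, 1)) three times, then ((5, 4, 5), (5, 1, 1))
```

The raw array still has to be accessible, but the much larger windowed array, in which every row is repeated up to `lookback` times, is never built in full.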
cols = 6
rows = 200
lookback = 4
num_inputs = 5
data = np.arange(int(rows*cols)).reshape(-1,rows).transpose()
x0, _, y0 = prepare_data_sample(data, index=0, lookback=lookback, num_inputs=num_inputs)
x0
array([[ 0, 200, 400, 600, 800],
[ 1, 201, 401, 601, 801],
[ 2, 202, 402, 602, 802],
[ 3, 203, 403, 603, 803]])
The function prepare_data_sample returns a single sample/example/data point at a time using the index parameter to specify which sample to return.
y0
array([[1003]])
So if we want to get the second sample/example/data point, we can call the function with index=1
x1, _, y1 = prepare_data_sample(data, index=1, lookback=lookback, num_inputs=num_inputs)
x1
array([[ 1, 201, 401, 601, 801],
[ 2, 202, 402, 602, 802],
[ 3, 203, 403, 603, 803],
[ 4, 204, 404, 604, 804]])
y1
array([[1004]])
Similarly, if we want to get the fifth sample/example/data point, we can call the function with index=4
x4, _, y4 = prepare_data_sample(data, index=4, lookback=lookback, num_inputs=num_inputs)
x4
array([[ 4, 204, 404, 604, 804],
[ 5, 205, 405, 605, 805],
[ 6, 206, 406, 606, 806],
[ 7, 207, 407, 607, 807]])
y4
array([[1007]])
Now we can create a generator function that yields samples from the dataset.
def sample_generator(data: np.ndarray,
                     lookback, num_inputs, num_outputs=None, input_steps=1,
                     forecast_step=0, forecast_len=1, known_future_inputs=False,
                     output_steps=1):
    """Yield one (x, y) sample at a time, skipping samples that contain NaNs."""
    for i in range(len(data) - lookback * input_steps + 1 - forecast_step - forecast_len * output_steps):
        x, _, y = prepare_data_sample(data, index=i, lookback=lookback,
                                      num_inputs=num_inputs,
                                      num_outputs=num_outputs,
                                      input_steps=input_steps,
                                      forecast_step=forecast_step,
                                      forecast_len=forecast_len,
                                      known_future_inputs=known_future_inputs,
                                      output_steps=output_steps
                                      )
        # Skip samples with NaNs in x or y
        if np.isnan(x).any() or np.isnan(y).any():
            continue
        yield x, y
gen = sample_generator(data, lookback, num_inputs)
for idx, (x, y) in enumerate(gen):
    print(idx, x.shape, y.shape)
0 (4, 5) (1, 1)
1 (4, 5) (1, 1)
2 (4, 5) (1, 1)
3 (4, 5) (1, 1)
4 (4, 5) (1, 1)
5 (4, 5) (1, 1)
6 (4, 5) (1, 1)
7 (4, 5) (1, 1)
8 (4, 5) (1, 1)
9 (4, 5) (1, 1)
10 (4, 5) (1, 1)
11 (4, 5) (1, 1)
12 (4, 5) (1, 1)
13 (4, 5) (1, 1)
14 (4, 5) (1, 1)
15 (4, 5) (1, 1)
16 (4, 5) (1, 1)
17 (4, 5) (1, 1)
18 (4, 5) (1, 1)
19 (4, 5) (1, 1)
20 (4, 5) (1, 1)
21 (4, 5) (1, 1)
22 (4, 5) (1, 1)
23 (4, 5) (1, 1)
24 (4, 5) (1, 1)
25 (4, 5) (1, 1)
26 (4, 5) (1, 1)
27 (4, 5) (1, 1)
28 (4, 5) (1, 1)
29 (4, 5) (1, 1)
30 (4, 5) (1, 1)
31 (4, 5) (1, 1)
32 (4, 5) (1, 1)
33 (4, 5) (1, 1)
34 (4, 5) (1, 1)
35 (4, 5) (1, 1)
36 (4, 5) (1, 1)
37 (4, 5) (1, 1)
38 (4, 5) (1, 1)
39 (4, 5) (1, 1)
40 (4, 5) (1, 1)
41 (4, 5) (1, 1)
42 (4, 5) (1, 1)
43 (4, 5) (1, 1)
44 (4, 5) (1, 1)
45 (4, 5) (1, 1)
46 (4, 5) (1, 1)
47 (4, 5) (1, 1)
48 (4, 5) (1, 1)
49 (4, 5) (1, 1)
50 (4, 5) (1, 1)
51 (4, 5) (1, 1)
52 (4, 5) (1, 1)
53 (4, 5) (1, 1)
54 (4, 5) (1, 1)
55 (4, 5) (1, 1)
56 (4, 5) (1, 1)
57 (4, 5) (1, 1)
58 (4, 5) (1, 1)
59 (4, 5) (1, 1)
60 (4, 5) (1, 1)
61 (4, 5) (1, 1)
62 (4, 5) (1, 1)
63 (4, 5) (1, 1)
64 (4, 5) (1, 1)
65 (4, 5) (1, 1)
66 (4, 5) (1, 1)
67 (4, 5) (1, 1)
68 (4, 5) (1, 1)
69 (4, 5) (1, 1)
70 (4, 5) (1, 1)
71 (4, 5) (1, 1)
72 (4, 5) (1, 1)
73 (4, 5) (1, 1)
74 (4, 5) (1, 1)
75 (4, 5) (1, 1)
76 (4, 5) (1, 1)
77 (4, 5) (1, 1)
78 (4, 5) (1, 1)
79 (4, 5) (1, 1)
80 (4, 5) (1, 1)
81 (4, 5) (1, 1)
82 (4, 5) (1, 1)
83 (4, 5) (1, 1)
84 (4, 5) (1, 1)
85 (4, 5) (1, 1)
86 (4, 5) (1, 1)
87 (4, 5) (1, 1)
88 (4, 5) (1, 1)
89 (4, 5) (1, 1)
90 (4, 5) (1, 1)
91 (4, 5) (1, 1)
92 (4, 5) (1, 1)
93 (4, 5) (1, 1)
94 (4, 5) (1, 1)
95 (4, 5) (1, 1)
96 (4, 5) (1, 1)
97 (4, 5) (1, 1)
98 (4, 5) (1, 1)
99 (4, 5) (1, 1)
100 (4, 5) (1, 1)
101 (4, 5) (1, 1)
102 (4, 5) (1, 1)
103 (4, 5) (1, 1)
104 (4, 5) (1, 1)
105 (4, 5) (1, 1)
106 (4, 5) (1, 1)
107 (4, 5) (1, 1)
108 (4, 5) (1, 1)
109 (4, 5) (1, 1)
110 (4, 5) (1, 1)
111 (4, 5) (1, 1)
112 (4, 5) (1, 1)
113 (4, 5) (1, 1)
114 (4, 5) (1, 1)
115 (4, 5) (1, 1)
116 (4, 5) (1, 1)
117 (4, 5) (1, 1)
118 (4, 5) (1, 1)
119 (4, 5) (1, 1)
120 (4, 5) (1, 1)
121 (4, 5) (1, 1)
122 (4, 5) (1, 1)
123 (4, 5) (1, 1)
124 (4, 5) (1, 1)
125 (4, 5) (1, 1)
126 (4, 5) (1, 1)
127 (4, 5) (1, 1)
128 (4, 5) (1, 1)
129 (4, 5) (1, 1)
130 (4, 5) (1, 1)
131 (4, 5) (1, 1)
132 (4, 5) (1, 1)
133 (4, 5) (1, 1)
134 (4, 5) (1, 1)
135 (4, 5) (1, 1)
136 (4, 5) (1, 1)
137 (4, 5) (1, 1)
138 (4, 5) (1, 1)
139 (4, 5) (1, 1)
140 (4, 5) (1, 1)
141 (4, 5) (1, 1)
142 (4, 5) (1, 1)
143 (4, 5) (1, 1)
144 (4, 5) (1, 1)
145 (4, 5) (1, 1)
146 (4, 5) (1, 1)
147 (4, 5) (1, 1)
148 (4, 5) (1, 1)
149 (4, 5) (1, 1)
150 (4, 5) (1, 1)
151 (4, 5) (1, 1)
152 (4, 5) (1, 1)
153 (4, 5) (1, 1)
154 (4, 5) (1, 1)
155 (4, 5) (1, 1)
156 (4, 5) (1, 1)
157 (4, 5) (1, 1)
158 (4, 5) (1, 1)
159 (4, 5) (1, 1)
160 (4, 5) (1, 1)
161 (4, 5) (1, 1)
162 (4, 5) (1, 1)
163 (4, 5) (1, 1)
164 (4, 5) (1, 1)
165 (4, 5) (1, 1)
166 (4, 5) (1, 1)
167 (4, 5) (1, 1)
168 (4, 5) (1, 1)
169 (4, 5) (1, 1)
170 (4, 5) (1, 1)
171 (4, 5) (1, 1)
172 (4, 5) (1, 1)
173 (4, 5) (1, 1)
174 (4, 5) (1, 1)
175 (4, 5) (1, 1)
176 (4, 5) (1, 1)
177 (4, 5) (1, 1)
178 (4, 5) (1, 1)
179 (4, 5) (1, 1)
180 (4, 5) (1, 1)
181 (4, 5) (1, 1)
182 (4, 5) (1, 1)
183 (4, 5) (1, 1)
184 (4, 5) (1, 1)
185 (4, 5) (1, 1)
186 (4, 5) (1, 1)
187 (4, 5) (1, 1)
188 (4, 5) (1, 1)
189 (4, 5) (1, 1)
190 (4, 5) (1, 1)
191 (4, 5) (1, 1)
192 (4, 5) (1, 1)
193 (4, 5) (1, 1)
194 (4, 5) (1, 1)
195 (4, 5) (1, 1)
Since we have drawn all the samples from the generator, it is now exhausted and we do not get any more samples from it. The loop below therefore prints nothing.
for idx, (x, y) in enumerate(gen):
    print(idx, x.shape, y.shape)
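Generator exhaustion is general Python behaviour and can be reproduced with a toy generator. In the sketch below, `count_up` is a hypothetical stand-in for the sample generator: a second pass over the same generator object yields nothing, while calling the generator function again gives a fresh generator.

```python
def count_up(n):
    """Toy generator standing in for a sample generator."""
    for i in range(n):
        yield i


gen = count_up(3)
first_pass = list(gen)           # consumes the generator
second_pass = list(gen)          # same object, already exhausted
fresh_pass = list(count_up(3))   # a new generator starts from the beginning
print(first_pass, second_pass, fresh_pass)  # [0, 1, 2] [] [0, 1, 2]
```

This is why code that needs several passes over the data, such as multiple training epochs, is usually given the generator function and its arguments rather than a generator object.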
Now we can build a TensorFlow Dataset (tf.data.Dataset) from the generator function.
output_signature = (
tf.TensorSpec(shape=(4, 5), dtype=tf.float32), # shape and dtype for x
tf.TensorSpec(shape=(1, 1), dtype=tf.float32) # shape and dtype for y
)
dataset = tf.data.Dataset.from_generator(
sample_generator,
args=(data, lookback, num_inputs),
output_signature=output_signature
)
dataset
<FlatMapDataset shapes: ((4, 5), (1, 1)), types: (tf.float32, tf.float32)>
The dataset is an iterable that yields a single sample, i.e. one (x, y) pair, at each iteration.
for idx, (x, y) in enumerate(dataset):
    print(idx, type(x), type(y), x.shape, y.shape)
0 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
1 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
2 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
3 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
4 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
5 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
6 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
7 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
8 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
9 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
10 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
11 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
12 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
13 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
14 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
15 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
16 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
17 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
18 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
19 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
20 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
21 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
22 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
23 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
24 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
25 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
26 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
27 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
28 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
29 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
30 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
31 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
32 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
33 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
34 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
35 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
36 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
37 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
38 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
39 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
40 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
41 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
42 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
43 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
44 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
45 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
46 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
47 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
48 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
49 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
50 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
51 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
52 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
53 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
54 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
55 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
56 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
57 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
58 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
59 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
60 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
61 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
62 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
63 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
64 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
65 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
66 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
67 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
68 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
69 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
70 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
71 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
72 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
73 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
74 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
75 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
76 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
77 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
78 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
79 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
80 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
81 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
82 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
83 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
84 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
85 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
86 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
87 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
88 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
89 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
90 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
91 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
92 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
93 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
94 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
95 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
96 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
97 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
98 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
99 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
100 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
101 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
102 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
103 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
104 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
105 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
106 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
107 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
108 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
109 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
110 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
111 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
112 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
113 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
114 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
115 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
116 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
117 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
118 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
119 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
120 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
121 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
122 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
123 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
124 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
125 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
126 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
127 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
128 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
129 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
130 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
131 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
132 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
133 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
134 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
135 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
136 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
137 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
138 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
139 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
140 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
141 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
142 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
143 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
144 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
145 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
146 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
147 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
148 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
149 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
150 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
151 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
152 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
153 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
154 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
155 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
156 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
157 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
158 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
159 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
160 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
161 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
162 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
163 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
164 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
165 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
166 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
167 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
168 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
169 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
170 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
171 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
172 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
173 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
174 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
175 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
176 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
177 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
178 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
179 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
180 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
181 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
182 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
183 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
184 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
185 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
186 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
187 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
188 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
189 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
190 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
191 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
192 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
193 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
194 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
195 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 5) (1, 1)
Next, we get batches instead of single samples (x, y pairs) during iteration.
dataset = tf.data.Dataset.from_generator(
    sample_generator,
    args=(data, lookback, num_inputs),
    output_signature=output_signature
)
batch_size = 32
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(tf.data.AUTOTUNE)
dataset
<PrefetchDataset shapes: ((None, 4, 5), (None, 1, 1)), types: (tf.float32, tf.float32)>
Now each iteration over the dataset yields a batch of samples rather than a single (x, y) pair. The batch size is set by the batch_size parameter; the final batch is smaller when the number of samples is not a multiple of batch_size.
for idx, (x,y) in enumerate(dataset):
    print(idx, type(x), type(y), x.shape, y.shape)
0 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (32, 4, 5) (32, 1, 1)
1 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (32, 4, 5) (32, 1, 1)
2 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (32, 4, 5) (32, 1, 1)
3 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (32, 4, 5) (32, 1, 1)
4 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (32, 4, 5) (32, 1, 1)
5 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (32, 4, 5) (32, 1, 1)
6 <class 'tensorflow.python.framework.ops.EagerTensor'> <class 'tensorflow.python.framework.ops.EagerTensor'> (4, 4, 5) (4, 1, 1)
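The smaller final batch above is just the remainder of dividing the sample count by the batch size. A minimal sketch of that arithmetic follows; batch_layout is a hypothetical helper name, and it assumes the default behaviour of Dataset.batch, which keeps the partial last batch (passing drop_remainder=True would discard it).

```python
def batch_layout(num_samples: int, batch_size: int):
    """Return (number of batches, size of the last batch) when the remainder is kept."""
    full, remainder = divmod(num_samples, batch_size)
    if remainder == 0:
        # every batch is full (or there are no batches at all)
        return full, (batch_size if full else 0)
    return full + 1, remainder


# 196 samples (indices 0..195 in the per-sample loop) in batches of 32
print(batch_layout(196, 32))  # (7, 4)
```

Seven batches with indices 0 to 6, the last one holding 4 samples, which agrees with the shapes printed above.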
Let’s use a real-world example. We get rainfall-runoff data for several hundred catchments/stations from Colombia.
ds = RainfallRunoff('CAMELS_COL', verbosity=0)
static, dynamic = ds.fetch()
type(dynamic), len(dynamic)
/home/docs/checkouts/readthedocs.org/user_builds/ml-tutorials/envs/latest/lib/python3.9/site-packages/aqua_fetch/rr/utils.py:126: UserWarning: netCDF4 module is not installed. Please install it to save data in netcdf format
warnings.warn(msg, UserWarning)
(<class 'dict'>, 347)
dynamic is a dictionary whose keys are station IDs and whose values are DataFrames, one per station.
dynamic['26247030'].shape
(13971, 6)
Get the total number of rows across all DataFrames in dynamic.
sum(df.shape[0] for df in dynamic.values())
4079935
Get the total number of rows after dropping rows with NaN in the last column.
sum(df.dropna(subset=[df.columns[-1]]).shape[0] for df in dynamic.values())
4079935
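Both totals are identical here, which means the last column of these DataFrames has no missing values. On data that does have gaps in the target column the two counts differ. The toy sketch below illustrates this; the station names and values are made up for illustration and are not part of the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `dynamic`: station id -> DataFrame (values are invented)
toy = {
    "stn_a": pd.DataFrame({"pcp": [1.0, 2.0, 3.0], "q": [0.1, np.nan, 0.3]}),
    "stn_b": pd.DataFrame({"pcp": [4.0, np.nan], "q": [0.4, 0.5]}),
}

# Total rows, regardless of missing values
total_rows = sum(df.shape[0] for df in toy.values())

# Rows that survive after dropping NaN in the last column only
rows_with_target = sum(
    df.dropna(subset=[df.columns[-1]]).shape[0] for df in toy.values()
)

print(total_rows, rows_with_target)  # 5 4
```

Note that the NaN in the pcp column of stn_b does not reduce the count, because only the last column is checked.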
Now we define sample_generator so that it yields samples only from the stations listed in station_ids.
def sample_generator(
        station_ids,
        lookback:int,
        num_inputs:int,
        num_outputs=None, input_steps=1, forecast_step=0, forecast_len=1,
        known_future_inputs=False, output_steps=1):
    for stn in station_ids:
        stn = stn.decode() if isinstance(stn, bytes) else stn
        data = dynamic[stn].values
        for i in range(len(data) - lookback * input_steps + 1 - forecast_step - forecast_len * output_steps):
            x, _, y = prepare_data_sample(data, index=i, lookback=lookback,
                                          num_inputs=num_inputs,
                                          num_outputs=num_outputs,
                                          input_steps=input_steps,
                                          forecast_step=forecast_step,
                                          forecast_len=forecast_len,
                                          known_future_inputs=known_future_inputs,
                                          output_steps=output_steps
                                          )
            # Skip samples with NaNs in x or y
            if np.isnan(x).any() or np.isnan(y).any():
                continue
            yield x, y
lookback = 365
num_inputs = dynamic['26247030'].shape[1] - 1
output_signature = (
    tf.TensorSpec(shape=(lookback, num_inputs), dtype=tf.float32),  # shape and dtype for x
    tf.TensorSpec(shape=(1, 1), dtype=tf.float32)  # shape and dtype for y
)
dataset = tf.data.Dataset.from_generator(
    sample_generator,
    args=(list(dynamic.keys())[0:34], lookback, num_inputs),
    output_signature=output_signature
)
dataset
<FlatMapDataset shapes: ((365, 5), (1, 1)), types: (tf.float32, tf.float32)>
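The NaN-skipping idea inside sample_generator can be tried in isolation on a small array. The sketch below is a simplified stand-in, not the prepare_data_sample helper: nan_free_windows is a hypothetical name, and it assumes a single-step target taken from the last column at the final row of each window. It may also produce a slightly different number of windows per series than the generator above.

```python
import numpy as np


def nan_free_windows(data: np.ndarray, lookback: int, num_inputs: int):
    """Yield (x, y) pairs, skipping any window that contains a NaN.

    Assumption for illustration: x is `lookback` rows of the first
    `num_inputs` columns, y is the last column at the window's last row.
    """
    for i in range(len(data) - lookback + 1):
        window = data[i:i + lookback]
        x = window[:, :num_inputs]      # shape (lookback, num_inputs)
        y = window[-1:, -1:]            # shape (1, 1)
        if np.isnan(x).any() or np.isnan(y).any():
            continue
        yield x, y


# 8 time steps, 2 input columns + 1 target column, one missing input value
toy = np.arange(24, dtype=float).reshape(8, 3)
toy[4, 0] = np.nan

samples = list(nan_free_windows(toy, lookback=3, num_inputs=2))
print(len(samples))  # 3
```

Of the six possible windows, the three that include row 4 are dropped, so only three samples are yielded.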
Now we iterate over the dataset and measure the time taken to get all the samples from 34 stations. We chose 34 because it is a manageable number for our example.
start = time.time()
for idx, (x,y) in enumerate(dataset):
    pass
print(round(time.time() - start, 2), 'seconds taken')
print("index of last sample: ", idx)
print(x.shape, y.shape)
61.78 seconds taken
index of last sample: 397807
(365, 5) (1, 1)
Again, we get batches instead of single samples (x, y pairs) during iteration.
dataset = tf.data.Dataset.from_generator(
    sample_generator,
    args=(list(dynamic.keys())[0:34], lookback, num_inputs),
    output_signature=output_signature
)
batch_size = 1024
dataset = dataset.take(1_000_000) # Limit to 1 million samples
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(tf.data.AUTOTUNE)
dataset
<PrefetchDataset shapes: ((None, 365, 5), (None, 1, 1)), types: (tf.float32, tf.float32)>
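A shuffle buffer of 10,000 elements is much smaller than the number of samples produced by 34 stations, so the shuffling here is only local: a sample can move forward in the stream, but it cannot be emitted before the buffer has reached it. The pure-Python sketch below illustrates the general idea of a fixed-size shuffle buffer. It is not TensorFlow's implementation, and buffered_shuffle is a hypothetical name.

```python
import random


def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in an order a fixed-size shuffle buffer could produce."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # still filling the buffer
            continue
        j = rng.randrange(buffer_size)
        yield buffer[j]                  # emit one randomly chosen buffered item
        buffer[j] = item                 # refill its slot with the new item
    rng.shuffle(buffer)                  # input exhausted: drain the remainder
    yield from buffer


out = list(buffered_shuffle(range(20), buffer_size=5))
print(out)
```

With this scheme the element emitted at position k always has an original index below k + buffer_size, and a buffer of size 1 leaves the order unchanged. A buffer at least as large as the dataset gives a full shuffle.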
As before, each iteration now yields a batch of up to batch_size samples rather than a single (x, y) pair. The 34 stations produce fewer than one million samples (397,808 in the run above), so take(1_000_000) does not truncate the data here.
start = time.time()
for idx, (x,y) in enumerate(dataset):
    pass
print(round(time.time() - start, 2), 'seconds taken')
print("index of last batch: ", idx)
print(x.shape, y.shape)
34.88 seconds taken
index of last batch: 388
(496, 365, 5) (496, 1, 1)
Using the tf.keras utility function timeseries_dataset_from_array, which is highly optimized.
data = pd.concat([val for val in list(dynamic.values())[0:34]], axis=0)
print(data.shape)
dataset = tf.keras.utils.timeseries_dataset_from_array(
    data.iloc[:, 0:-1].values,
    targets=data.iloc[:, -1].values,
    sequence_length=lookback,
    batch_size=batch_size
)
dataset
(410218, 6)
<BatchDataset shapes: ((None, None, 5), (None,)), types: (tf.float64, tf.float64)>
start = time.time()
for idx, (x,y) in enumerate(dataset):
    pass
print(round(time.time() - start, 2), 'seconds taken')
print("index of last batch: ", idx)
print(x.shape, y.shape)
22.67 seconds taken
index of last batch: 400
(254, 365, 5) (254,)
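The batch count and final batch size above can be reproduced by hand from the number of rows and the window length. Two details are worth keeping in mind when comparing this route with the generator-based one. First, the Keras documentation states that targets[i] is paired with the window that starts at index i, so with an unshifted target array each y is the value at the first row of its window; shifting the targets by sequence_length - 1 pairs each window with its last row instead. Second, windows drawn from the concatenated frame can straddle two stations. The NumPy sketch below illustrates the counting and the pairing; window_count and the toy arrays are illustrative names, not part of the original example.

```python
import numpy as np


def window_count(num_rows: int, sequence_length: int) -> int:
    """Number of stride-1 windows of `sequence_length` rows in `num_rows` rows."""
    return max(num_rows - sequence_length + 1, 0)


# Counts for the concatenated data of 34 stations
n = window_count(410218, 365)
full, last = divmod(n, 1024)
print(n, full, last)  # 409854 400 254

# Toy pairing: the window starting at row i is paired with targets[i]
features = np.arange(12, dtype=float).reshape(6, 2)
targets = np.arange(6, dtype=float) * 10
L = 3
pairs = [(features[i:i + L], targets[i])
         for i in range(window_count(len(features), L))]

# Shifted targets: entry i now belongs to the last row of window i
aligned = targets[L - 1:]
print(pairs[0][1], aligned[0])  # 0.0 20.0
```

400 full batches plus one batch of 254 samples gives a last batch index of 400, which agrees with the output above.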
Total running time of the script: (2 minutes 17.893 seconds)