1. What is the purpose of having a recurrent kernel (a sort of internal loop) in an RNN?

RNNs are primarily applied to sequential datasets such as time series and natural language. The key property of such datasets is that there is a state at each time step, which means we can't shuffle the data during training, since shuffling would break the stateful structure of the data. The model architecture should therefore align its training and inference steps to learn the data appropriately. The recurrent kernel is what makes this possible: at every time step, the layer combines the current input with the hidden state from the previous step through a shared weight matrix, forming an internal loop that carries state across the sequence.
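To make the recurrent kernel concrete, here is a minimal NumPy sketch of one layer's forward pass; the tanh activation and the bias $b_h$ are assumptions, and the weight names match the dimension table below.

```python
import numpy as np

# Minimal sketch of a single RNN layer's forward pass (tanh activation
# and bias b_h are assumptions; shapes follow the dimension table below).
def rnn_forward(X, W_xh, W_hh, b_h):
    batch, num_steps, _ = X.shape
    h = np.zeros((batch, W_hh.shape[0]))   # hidden state, shape (batch, hidden)
    for t in range(num_steps):             # the internal loop over time steps
        # The recurrent kernel W_hh feeds the previous state back in;
        # this is what carries information forward through the sequence.
        h = np.tanh(X[:, t, :] @ W_xh + h @ W_hh + b_h)
    return h                               # final state, shape (batch, hidden)
```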

  1. What is a stateful RNN? How is it different?

  2. What is a TimeDistributed layer in Keras?

  3. Dimensions in an RNN layer. Weights: $W_{xh}$ for the inputs, $W_{hh}$ for the hidden state of the previous time step, and $W_{hq}$ for the outputs.

| Tensor | Shape |
| --- | --- |
| Input | $(batch, num\_steps, feature\_dimensions)$ |
| $W_{xh}$ | $(feature\_dimensions, hidden)$ |
| Hidden | $(batch, hidden)$ |
| $W_{hh}$ | $(hidden, hidden)$ |
| $W_{hq}$ | $(hidden, output)$ |
| Output | $(batch, output)$ |
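Given these shapes, the number of trainable parameters follows directly. The quick check below assumes bias vectors $b_h$ and $b_q$, which the table does not list, and uses arbitrary example sizes.

```python
# Parameter count implied by the shapes above (example sizes);
# the bias vectors b_h and b_q are an assumption.
d, h, q = 10, 32, 5                       # feature_dimensions, hidden, output
n_params = d * h + h * h + h + h * q + q  # W_xh + W_hh + b_h + W_hq + b_q
print(n_params)                           # 320 + 1024 + 32 + 160 + 5 = 1541
```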
  4. How can an RNN handle variable-length sequences?

  5. How many parameters does an RNN need to learn?

  6. How can we improve training time?

  7. How can we improve the convergence of an RNN model?
  8. Bidirectional RNNs

Simply put, a bidirectional RNN is a stack of two identical RNNs, each learning the sequence in exactly opposite directions. The outputs of the two RNNs are then concatenated to form the main output. The advantage is that the model learns the context of the input sequence from both directions, which helps it generate more accurate predictions. Note that a bidirectional RNN outputs twice as many scores as its underlying RNN's output size, as the sketch below shows.
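A minimal Keras sketch of the idea; the layer type (SimpleRNN), unit count, and input shape are illustrative assumptions.

```python
import tensorflow as tf

# Two SimpleRNN layers of 32 units read the sequence in opposite
# directions; concatenating their per-step outputs yields 64 features,
# i.e. double the underlying RNN's output size.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 8)),      # (num_steps, feature_dimensions)
    tf.keras.layers.Bidirectional(
        tf.keras.layers.SimpleRNN(32, return_sequences=True),
        merge_mode="concat",              # default: concatenate both directions
    ),
])
model.summary()                           # output shape: (None, None, 64)
```

With `merge_mode="sum"` the two directions are added instead, which keeps the output size equal to that of the underlying RNN.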