https://github.com/fchollet/keras/issues/4149
Also refer to the "Traditional LSTM" section of the Wikipedia page, where every quantity in the equations is a vector or a matrix. I guess this is why an LSTM layer is considered a "cell": a fully connected bunch of nodes.
https://en.wikipedia.org/wiki/Long_short-term_memory#History
Therefore, the input vector size can differ from the hidden layer size, while the output vector size is always equal to the hidden layer size.
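To make the size relationship concrete, here is a minimal sketch of one step of a traditional LSTM (no peepholes) in plain Python. It is not how Keras implements it. The helper names (`make_params`, `lstm_step`), the sizes (3 and 5), and the random initialisation are assumptions chosen for illustration. Each gate has a weight matrix of shape hidden x input acting on the input and one of shape hidden x hidden acting on the previous output, which is why the two sizes are independent.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_params(input_size, hidden_size, seed=0):
    """Hypothetical helper: one (W, U, b) triple per gate.

    Gates: f = forget, i = input, o = output, c = candidate cell state.
    """
    rng = random.Random(seed)
    return {
        gate: (
            random_matrix(hidden_size, input_size, rng),   # W: hidden x input
            random_matrix(hidden_size, hidden_size, rng),  # U: hidden x hidden
            [0.0] * hidden_size,                           # b: hidden
        )
        for gate in ("f", "i", "o", "c")
    }


def lstm_step(x, h_prev, c_prev, params):
    """One time step: returns the new output h and cell state c."""

    def pre_activation(gate):
        W, U, b = params[gate]
        return [wx + uh + bk
                for wx, uh, bk in zip(matvec(W, x), matvec(U, h_prev), b)]

    f = [sigmoid(z) for z in pre_activation("f")]
    i = [sigmoid(z) for z in pre_activation("i")]
    o = [sigmoid(z) for z in pre_activation("o")]
    c_tilde = [math.tanh(z) for z in pre_activation("c")]

    # Element-wise updates: every vector here has length hidden_size.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


input_size, hidden_size = 3, 5          # deliberately different
params = make_params(input_size, hidden_size)
x = [1.0, 0.5, -0.5]                    # one input vector of length 3
h0 = [0.0] * hidden_size
c0 = [0.0] * hidden_size
h1, c1 = lstm_step(x, h0, c0, params)
print(len(x), len(h1), len(c1))         # 3 5 5
```

The input has 3 entries, but the output `h1` and the cell state `c1` both have 5, matching the hidden layer size regardless of the input size.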