If the hidden layer is too small relative to the input, learning saturates early: the loss plateaus at a relatively high value.
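In the linear case the plateau can be stated exactly. Assume a linear AE (no activations) with n input units and a hidden layer of k units, trained with squared reconstruction error on N centered samples x_j. By the Eckart–Young theorem, no choice of encoder W_e and decoder W_d can push the loss below the sum of the covariance eigenvalues that the k-unit bottleneck cannot keep. Nonlinear AEs are not bound by this formula exactly, so treat it as an illustration of why the floor stays high, not as a prediction for any particular network.

```latex
\Sigma = \frac{1}{N}\sum_{j=1}^{N} x_j x_j^\top,
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ \text{the eigenvalues of } \Sigma

\min_{W_e \in \mathbb{R}^{k \times n},\; W_d \in \mathbb{R}^{n \times k}}
  \frac{1}{N}\sum_{j=1}^{N} \lVert x_j - W_d W_e x_j \rVert^2
  \;=\; \sum_{i=k+1}^{n} \lambda_i
```

When k is much smaller than n, every direction of variance past the k-th adds to this floor, which is one way to read the early saturation.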
As the input size grows, the ideal hidden/input size ratio also grows.
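To build AEs with a large input size, train them layer by layer, from the outermost (largest) layers inward to the central bottleneck layer. For example, to train an n1 - n2 - n3 - n4 - n5 network (where n5 = n1 because the AE reconstructs its input, and n4 = n2), first train n1 - n2 - n5, then train n2 - n3 - n4 on the n2-layer codes produced by the first encoder, and finally train n1 - n2 - n3 - n4 - n5 as a whole.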
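The layer-by-layer schedule above can be sketched in Python as follows. This is a minimal sketch, assuming a symmetric AE (n5 = n1, n4 = n2); `layerwise_stages`, `greedy_pretrain`, and the `train_shallow` callback are hypothetical names, and the callback stands in for whatever framework actually fits each shallow AE.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical callback, not a real library API: trains a shallow AE of shape
# (n_in, n_hidden, n_out) on `data` and returns its trained encoder n_in -> n_hidden.
TrainShallow = Callable[[Sequence[Vector], Tuple[int, int, int]], Callable[[Vector], Vector]]


def layerwise_stages(sizes: Sequence[int]) -> List[Tuple[int, ...]]:
    """Shallow AEs to train, outermost first, followed by the full network."""
    sizes = list(sizes)
    # A symmetric AE has an odd layer count and mirrors itself: n1 = n5, n2 = n4, ...
    if len(sizes) < 3 or len(sizes) % 2 == 0 or sizes != sizes[::-1]:
        raise ValueError("expected a symmetric AE such as n1-n2-n3-n2-n1")
    # Stage i pairs encoder layer i -> i+1 with its mirror decoder layer.
    stages = [(sizes[i], sizes[i + 1], sizes[-(i + 1)]) for i in range(len(sizes) // 2)]
    stages.append(tuple(sizes))  # last stage: train the whole stack end to end
    return stages


def greedy_pretrain(data: Sequence[Vector], sizes: Sequence[int], train_shallow: TrainShallow):
    """Train the shallow stages in order; each one trains on the previous stage's codes.

    Returns the trained encoders (outermost first) and the bottleneck codes. The
    final whole-network stage is left to the caller, initialized from these stages.
    """
    encoders = []
    inputs = [list(x) for x in data]
    for shape in layerwise_stages(sizes)[:-1]:
        encode = train_shallow(inputs, shape)
        encoders.append(encode)
        inputs = [encode(x) for x in inputs]  # codes become the next stage's data
    return encoders, inputs
```

For example, `layerwise_stages([784, 256, 32, 256, 784])` returns `[(784, 256, 784), (256, 32, 256), (784, 256, 32, 256, 784)]`, matching the n1 - n2 - n5, n2 - n3 - n4, whole-network order described above (the sizes are made up for illustration). Keeping the trainer as a callback leaves the choice of framework, loss, and optimizer open.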
[Figure: AE_satcost.png]