Neural networks can learn complicated representations fairly easily. However, there are some tasks where new data (or categories of data) is constantly changing. For example, you may train a network to recognize pictures of 8 different types of cats. But in the future, you may want to change that to 12 breeds. If the network has to keep learning new data over time, it is called a continual learning problem.
What is Continual Learning
Continual Learning just means being able to learn continuously over time. The data arrives in sequences over time, and the algorithm has to learn to be able predict that new data. Usually, techniques like transfer learning are used, where the model is trained on previous data, and some features are used from that model to learn new data. This is usually done to reduce the time required to train models from scratch. It is also used when the new data is sparse.
Introducing Dynamically Expandable Networks
At a very high level, the idea of Expanding Networks is very logical. Train a model, and if it cannot predict very well, increase its capacity to learn. If a new task arrives that is vastly different from an existing task, extract whatever useful information you can from the old model and train a new model.
Let's give an example of three methods to make such a structure possible.
1) Selective retraining — Find the neurons that are relevant to the new task and retain them.
2) Dynamic Network Expansion — If the model is unable to learn from step 1 (i.e. the loss is above a threshold value), increase the capacity of the model by adding more neurons.
3) Network Split/Duplication — If some new models’ units have begun to change drastically, duplicate those weights, and retrain those duplicates, while keeping the old weights fixed.
The simplest way to train a new model would be to train the entire model every time a new task arrives. However, because deep neural networks can get very large, this method will become very expensive.
Dynamic Network Expansion
Selective retraining works for tasks that are highly relevant from older tasks. But when newer tasks have fairly different distributions, it will begin to fail. The authors use another technique to ensure that newer data can be represented by increasing the capacity of the network. They do so by adding additional neurons.
There is a common problem in transfer learning called semantic drift, or catastrophic forgetting, where the model slowly shifts its weights so much that it forgets about the original tasks better to duplicate the neurons if they shift beyond a certain range. If the value of a neuron changes beyond a certain value, a copy of the neuron is made, and a split occurs, and that duplicate unit is added as a copy to that same layer.