This article is excerpted from an introductory book on deep learning with PyTorch. The book begins with an introduction to artificial intelligence, covers the basic theory of machine learning and deep learning, and shows how to build models with the PyTorch framework.
For humans, things seen before leave memories in the mind. Those memories fade over time, but people can often recall them when reminded. In neural network research, work on giving models memory began very early. John Hopfield proposed the Hopfield network in 1982, but because of its practical difficulties it had no good application scenario at the time and was gradually forgotten. The rise of deep learning has led researchers to revisit the recurrent neural network (RNN), which has achieved great success in areas such as sequence problems and natural language processing.
This article starts with the basic structure of the recurrent neural network and introduces the application of RNNs in natural language processing and their PyTorch implementation.
Recurrent neural network
The previous chapter introduced the convolutional neural network, which is comparable to human vision. It has no ability to remember, however, so it can only handle a specific visual task and cannot handle new tasks based on earlier memories. So is memory necessary for a network? For some problems it clearly is, for example, inferring what happens at the next moment in a movie. The current scene alone is not enough here; the inference also depends on the plot that came before. Traditional neural network structures cannot handle such problems, which depend on the past as well as the present, so a network model with memory is indispensable.
The recurrent neural network is built on the idea of a memory model. The network is expected to remember features that appeared earlier and to infer later results from them. Because the overall network structure keeps looping, it is called a recurrent neural network.
Basic structure of the recurrent neural network
The basic structure of a recurrent neural network is very simple: the output of the network is stored in a memory unit, and the memory unit enters the network together with the next input. Taking a simple two-layer network as an example, we extend it into a recurrent neural network, as illustrated in Figure 1.
As Figure 1 shows, the network takes the memory unit as an additional input. The network not only outputs a result but also saves that result to the memory unit. Figure 1 is a schematic diagram of the simplest recurrent neural network processing one input.
Changing the order of the input sequence changes the output of the network. This is because the contents of the memory unit change when the order of the sequence changes, which affects the final output.
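The save-and-reuse behavior of the memory unit can be sketched in a few lines of plain Python. This is a minimal illustration, not the book's implementation: the scalar weights W_IN, W_MEM, and BIAS and the function rnn_step are hypothetical, and tanh is assumed as the activation.

```python
import math

# Hypothetical fixed weights, chosen only for illustration.
W_IN, W_MEM, BIAS = 0.5, 0.8, 0.0

def rnn_step(x, memory):
    """Combine the current input with the stored memory and return the output."""
    return math.tanh(W_IN * x + W_MEM * memory + BIAS)

memory = 0.0                         # the memory unit starts empty
output = rnn_step(1.0, memory)       # the network sees the input and the memory
memory = output                      # the output is saved back to the memory unit
next_output = rnn_step(1.0, memory)  # same input, but the memory has changed

print(output, next_output)
```

The two calls receive the same input, yet the second result differs because the memory unit now holds the first output.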
Figure 1 Passing a data point into the network
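The order sensitivity described above can be checked with a small sketch. The weights and the helper run_sequence are hypothetical, and the scalar tanh unit is an assumption made to keep the example short.

```python
import math

# Hypothetical weights for a scalar recurrent unit.
W_IN, W_MEM = 0.5, 0.8

def run_sequence(sequence):
    """Feed the data points in order and return the final output."""
    memory = 0.0
    for x in sequence:
        # Each step mixes the new input with what the memory unit holds.
        memory = math.tanh(W_IN * x + W_MEM * memory)
    return memory

forward = run_sequence([1.0, 2.0, 3.0])
backward = run_sequence([3.0, 2.0, 1.0])

print(forward, backward)
```

Both sequences contain the same elements, but the memory unit passes through different values along the way, so the final outputs differ.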
Figure 1 shows one data point of a sequence entering the network. How, then, is the entire sequence passed to the network? Each data point in the sequence is passed into the network in turn, as shown in Figure 2.
Figure 2 Passing the entire sequence into the network
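In PyTorch, passing a whole sequence through a simple recurrent layer might look like the sketch below. The sizes (5 data points, 4 input features, 3 hidden units) are arbitrary choices for illustration, and torch.nn.RNN stands in for the network in the figure.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Sizes are arbitrary illustrative choices.
rnn = nn.RNN(input_size=4, hidden_size=3)

# Default layout: (sequence length, batch size, input features).
sequence = torch.randn(5, 1, 4)

# The layer steps through the data points in order and returns
# one output per data point plus the final state of the memory.
outputs, final_memory = rnn(sequence)

print(outputs.shape)       # shape (5, 1, 3): one output per data point
print(final_memory.shape)  # shape (1, 1, 3): memory after the last data point
```

For a single-layer, unidirectional network the last output and the final memory hold the same values.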
No matter how long the sequence is, it can be fed into the network step by step to produce results. Readers may have a question at this point: does each network in Figure 2 have its own independent weights? To answer this, first note that the number of networks drawn in Figure 2 would differ for sequences of different lengths. If each had its own weights, the number of parameters would change with the sequence length, which is not plausible for a single network structure.
In fact, the concept of parameter sharing is used again here. Although three networks are drawn in Figure 2, they are all the same network, whose output depends on the input and the memory unit. This can be represented by Figure 3.
As shown in Figure 3, the left side is the actual data flow of the recurrent neural network, and the right side is the result of unrolling it. The network has a loop in its structure, which is the origin of the name recurrent neural network. This structure also gives it a natural advantage in processing sequence data, because the network itself has a sequential structure. This is the essential structure shared by all recurrent neural networks.
Figure 3 Network input and memory unit
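Parameter sharing can be made concrete by applying one cell repeatedly. The sketch below assumes torch.nn.RNNCell as the shared network. The helper run and all sizes are hypothetical.

```python
import torch
from torch import nn

torch.manual_seed(0)

# One cell, reused at every time step.
cell = nn.RNNCell(input_size=4, hidden_size=3)

def run(cell, sequence):
    """Unroll the same cell over a sequence of shape (seq_len, 1, input_size)."""
    memory = torch.zeros(1, cell.hidden_size)
    for x_t in sequence:            # x_t has shape (1, input_size)
        memory = cell(x_t, memory)  # same weights at every step
    return memory

# The parameter count is fixed by the cell, not by the sequence length.
n_params = sum(p.numel() for p in cell.parameters())

short_result = run(cell, torch.randn(3, 1, 4))
long_result = run(cell, torch.randn(50, 1, 4))

print(n_params, short_result.shape, long_result.shape)
```

The loop is the unrolled view on the right of the figure, while the single cell object is the folded view on the left.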
A recurrent neural network can also have a deep structure with multiple layers, as shown in Figure 4.
Figure 4 Deep network structure
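A stacked recurrent network can be sketched with the num_layers argument of torch.nn.RNN. Two layers and the sizes below are arbitrary illustrative choices.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Two recurrent layers stacked: the outputs of the first layer
# become the input sequence of the second.
deep_rnn = nn.RNN(input_size=4, hidden_size=3, num_layers=2)

sequence = torch.randn(5, 1, 4)  # (sequence length, batch size, input features)
outputs, final_memory = deep_rnn(sequence)

print(outputs.shape)       # shape (5, 1, 3): outputs of the top layer
print(final_memory.shape)  # shape (2, 1, 3): one final memory per layer
```

Each layer keeps its own memory, so the final memory tensor has one entry per layer.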
You can see that the network is unidirectional, which means it can only use information from one side of the sequence. Sometimes information from one side is not enough: information from both sides matters for the prediction, as with speech signals. In that case, a recurrent neural network structure that sees the information on both sides is needed. You do not need to set up two separate recurrent neural networks that read the sequence from the left and from the right; the task can be accomplished with a bidirectional recurrent neural network, as shown in Figure 5.
Figure 5 Bidirectional recurrent neural network
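A bidirectional layer can be sketched with the bidirectional flag of torch.nn.RNN. The sizes are arbitrary, and the slicing below assumes the layout PyTorch uses for the output, with the forward features first and the backward features second.

```python
import torch
from torch import nn

torch.manual_seed(0)

birnn = nn.RNN(input_size=4, hidden_size=3, bidirectional=True)

sequence = torch.randn(5, 1, 4)  # (sequence length, batch size, input features)
outputs, final_memory = birnn(sequence)

# Each output concatenates a left-to-right and a right-to-left reading.
forward_part = outputs[:, :, :3]
backward_part = outputs[:, :, 3:]

print(outputs.shape)       # shape (5, 1, 6): 3 forward + 3 backward features
print(final_memory.shape)  # shape (2, 1, 3): one final memory per direction
```

The forward reading finishes at the last data point and the backward reading finishes at the first, so every position in the output carries information from both sides of the sequence.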