Multi-attribute Recognition, the Key to a Universal Neural Network
Jinxin Wei1, Qunying Ren2
1Vocational School of Juancheng, Juancheng 274600 China
2 Bureau of Emergency Management of Juancheng County, Juancheng 274600 China
Abstract: To achieve multi-attribute recognition of objects, I redesign the MNIST dataset, changing the color, size, and location of the digits and modifying the labels accordingly. The deep neural network used is a common convolutional neural network. The tests show that a single neural network can recognize multiple attributes so long as the attribute differences between objects can be represented by functions. The concrete network (generation network) can generate outputs that the input rarely contained, from the attributes the network has learned; its generalization ability is good because the network is a continuous function. A further test shows that one neural network can do image recognition, speech recognition, natural language processing, and other tasks so long as the corresponding input nodes, output nodes, and additional parameters are added to the network. The network is universal so long as it can process different inputs. By proof, a fully connected network can do what convolutional and recurrent neural networks do, so the fully connected network is the universal network. The phenomenon of synesthesia is the result of multi-input and multi-output. Connection in mind can be realized through the universal network by feeding the output back into the input. Connection in mind is the key to creativity; synesthesia is its assistant.
keywords: Computer vision, multi-attribute, deep neural network, multi-dimension, data processing, universal neural network, parallel processing, speech recognition, natural language processing, synesthesia, connection in mind
Redesign of MNIST
There are many multi-label learning approaches that use many labels and many networks. I instead design a single label and a single network to solve the multi-attribute problem.
Because no existing dataset fits this task, I redesign the MNIST dataset. The visual attributes of an object recognized by humans are color, size, location, shape, texture, quantity, and pattern, so we choose color, size, location, and shape as examples. Since MNIST already provides the shape attribute, we only need to add color, size, and location. First, we change the color. Because colors on a computer are mixed from red, green, and blue, we change each digit's color to red, green, or blue: the grayscale pixel data is assigned to the red, green, or blue channel, and the other two channels are zero. The background is all 255. Second, we change the size: the image is shrunk to 18×18 and the pixels are pasted onto a white 28×28 background. When the pixels are pasted onto the upper part of the background, the location is changed. Next, we change the label, using a form similar to that used in classification. Classes 0-9 are one-hot encoded; for example, 0100000000 represents 1. Because there are three colors, red is 100, green is 010, and blue is 001, occupying label indices 10-12. Because there are two sizes, big and small, the codes are 01 and 10, at indices 13-14. Because there are two locations, up and middle, the codes are 10 and 01, at indices 15-16. This completes the dataset processing. The order of the label is number, color, size, location; for example, 01000000001000101 represents a big middle red 1. Why use one-hot encoding? Because each class has one output, the multi-attribute output can be generated by regression, and each output lies between 0 and 1, which is similar to data normalization.
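The transformation above can be sketched in a few numpy functions. This is a minimal illustration, not the authors' released code: the exact paste offsets and the bit positions for size/location are inferred from the label example 01000000001000101 (big middle red 1) given in the text.

```python
import numpy as np

COLOR_CHANNEL = {"red": 0, "green": 1, "blue": 2}

def recolor(gray, color):
    """Assign the 28x28 grayscale digit to one RGB channel;
    the other two channels are zero, as described in the paper."""
    rgb = np.zeros(gray.shape + (3,), dtype=np.uint8)
    rgb[..., COLOR_CHANNEL[color]] = gray
    return rgb

def shrink_and_place(rgb, up):
    """Nearest-neighbor shrink to 18x18, then paste onto a 28x28
    all-255 background, either at the top (up) or centered (middle).
    The column offset 5 (centering) is an assumption."""
    idx = (np.arange(18) * rgb.shape[0]) // 18   # nearest-neighbor indices
    small = rgb[np.ix_(idx, idx)]
    canvas = np.full((28, 28, 3), 255, dtype=np.uint8)
    row = 0 if up else 5                          # 5 centers 18 px in 28 px
    canvas[row:row + 18, 5:5 + 18] = small
    return canvas

def make_label(digit, color, big, up):
    """17-bit label: digits at 0-9, color at 10-12, size at 13-14,
    location at 15-16. Size bit 14 = big and location bit 16 = middle
    are inferred from the paper's example '01000000001000101'."""
    label = np.zeros(17, dtype=np.float32)
    label[digit] = 1.0
    label[10 + COLOR_CHANNEL[color]] = 1.0
    label[14 if big else 13] = 1.0
    label[15 if up else 16] = 1.0
    return label
```

For example, `make_label(1, "red", big=True, up=False)` reproduces the 17-bit string from the text.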
Test Design
Now we design the network. The experiment is implemented with the TensorFlow framework. The regression network has 3 convolutional layers and 2 fully connected layers, with no max-pooling or padding, because those would cause loss during generation; the activation function is leaky ReLU [2]. The generation network is the inverse function [4] of the regression network. You can read another paper of mine, 'A Functionally Separate Autoencoder', which describes the details of generation from label to concrete information. When training the regression network, the loss function is mean squared error, the optimizer is Adam [3], and the metric is accuracy [3]. I take np.argmax() [1] of prediction[:,0:10], [:,10:13], [:,13:15], and [:,15:17] separately, and process the real labels the same way. The regression values become class indices through this step, and the accuracy can be calculated by comparing them. When training the generation network, the loss function is mean squared error, the optimizer is Adam, the metrics are mean absolute error and cosine similarity, the input is the multi-attribute label, and the output is the image. The results are shown in Fig. 1.1 and Fig. 1.2.