Image classification using deep learning
ABSTRACT
Nowadays image recognition technology is widely used, and plays a very important role in various fields. Deep learning technology uses multilayer structure to analyze and deal with image features, which can improve the performance of image recognition. The popular models of deep learning contains Attention Models, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and other improved methods. The applications of image recognition based on deep learning technology including image classification, facial recognition, image search, object detection, pedestrian detection, video analysis. In this project we are dealing with the image classification using deep learning. We try to use Alexnet, Resnet and Many different architecture with convolutional neural networks for this purpose. The image classification is a classical problem of image processing, computer vision and machine learning fields. In this project we study the image classification using deep learning. We use CNN architecture which means convolutional neural networks for this purpose. We use 2000 dogs and cats images for training the model and 1000 cats and dogs images are for validation purpose. These data are selected from the Google Cloud database for the classification purpose. We cropped the images for various portion areas and conducted experiments. The results show the effectiveness of deep learning based image classification using CNN.
INTRODUCTION
Image recognition is the ability of a system or software to identify objects, people, places, and actions in images. Here we’re going to follow some steps to get computer to recognize the given image whether cat or dog by the user. To get computer to predict we use Deep Learning Technology to get prediction from the computer by the help of Artificial Intelligence.
For reading image data we use Convolutional Neural Network ( CNN ) and Pass the information to the Artificial Neural Network ( ANN ) to get train the model by giving set of information about the image. A convolution neural network is a kind of deep neural network with a convolution structure. Except for the traditional image processing methods, the convolution neural network has achieved good image processing tasks and has strong generalization ability. This network is characterized by local perception, weight sharing, and pooling, which can effectively reduce the number of network parameters and quickly capture the deep features of the input image.
This architecture can train the existing dog and cat image samples, learn the features of data synthesis, build a network structure with stronger expression ability, and automatically mine the feature engineering of data, which can quickly identify the given images. Solve the problem of image recognition by the system.
You can download the dataset by clicking below link
https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip
WHAT IS CONVOLUTIONAL NEURAL NETWORK ?
A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm which can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image and be able to differentiate one from the other.
The pre-processing required in a ConvNet is much lower as compared to other classification algorithms. While in primitive methods filters are hand-engineered, with enough training, ConvNets have the ability to learn these filters/characteristics
Here Image is 5 x 5 Matrix which has all information about the image. Then we’re applying Filter for extracting feature about the image. The filter is randomly generated matrix. The feature is also called output.
Lets say the image size is 5 x 5 then the filter size is 3 x 3. Now, the output size will be
Output Size = ( N – F ) + 1
* N is Image size
* F is Filter size
Now, lets do computation for extracting feature. Lets say a is image matrix, b is filter matrix and c is Feature matrix.
For feature c11 = ( a11 * b11 ) + ( a12 * b12 ) + ( a13 * b13 ) + ( a21 * b21 ) + ( a22 * b22 ) + ( a23 * b23 ) + ( a31 * b31 ) + ( a32 * b32 ) + ( a33 * b33 )
POOLING
In general terms pooling refers to a small portion, so here we take a small portion of the input and try to take the average value referred to as average pooling or take a maximum value termed as max pooling, so by doing pooling on an image we are not taking out all the values we are taking a summarized value over all the values present !!! Below Image we’re using max-pooling for getting maximum value from the Output (or) Feature Matrix. For getting max-pooled value we need to use again filter. Here in this case we’re using 2 x 2 Filter with Stride of 2. Lets talk about stride, Here the output matrix or feature is of dimensions 4 × 4 with a filter of 2×2 and a stride of 2 so every time we skip two columns and convolute.
After going through the above process, we have successfully enabled the model to understand the features. Moving on, we are going to flatten the final output and feed it to a regular Neural Network for classification purposes.
CLASSIFICATION – FULLY CONNECTED LAYER ( FC LAYER )
Now that we have converted our input image into a suitable form for our Multi-Level Perceptron, we shall flatten the image into a column vector. The flattened output is fed to a feedforward neural network and backpropagation applied to every iteration of training. Over a series of epochs, the model is able to distinguish between dominating and certain low-level features in images and classify them using the Softmax Classification technique.
Here I showed the example architecture of how Image Data is passed to Neural Networks
OUR SYSTEM ARCHITECTURE:
PROJECT REPOSITORY LINK :
CONCLUSION
Image recognition can really help you with digital marketing. By integrating the application’s programing interface to your text-based analytics platforms, you will be able to offer visual insights to your customers without the expensive product creation that uses logo detection. Image recognition can also help you monitor ROI and protect your brand. You will be able to track how a sponsorship is doing with image and logo detection and this will help you determine how much revenue you will get in return. Therefore, integrating an image recognition application programing interface is an easy way of giving your customers the best service.





Comments
Post a Comment