Solving a simple classification problem using a neural network
Neural networks are a computational approach based on a large collection of neural units (also known as artificial neurons), loosely modeling the way a biological brain solves problems with large clusters of biological neurons connected by axons. One application of neural networks is solving classification problems. Suppose you have a set of sample data that consists of two classes and you want to classify it; a neural network can do this. The network we use for this problem is a multilayer perceptron (MLP), which is trained with a supervised technique called backpropagation. If you need a tutorial on neural networks and want to understand how they work, there are lots of tutorials out there, for example: This one.
In this tutorial we are going to write Python code to implement our network and solve the problem. The data set we are using can be downloaded from HERE.
Explanation of dataset
The sample data looks like the table below:
| # | Feature 1 (X) | Feature 2 (Y) | Class (Label) |
As you can see, each record has two features, called X and Y, and a class label which is 0 or 1. Our data set contains 100 samples. Now we are going to plot the data using Python's matplotlib library, as below:
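    # Python 3.5 code for plotting the dataset
    from matplotlib import pyplot as plt
    import numpy as np

    # Load the samples; columns 0 and 1 are the features, column 2 is the label
    my_data = np.genfromtxt('data.txt')

    # Split the rows by class label
    class1 = my_data[my_data[:, 2] == 1]
    class2 = my_data[my_data[:, 2] == 0]

    # Plot label 1 as blue crosses and label 0 as red crosses
    plt.plot(class1[:, 0], class1[:, 1], 'bx', class2[:, 0], class2[:, 1], 'rx')
    plt.show()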
And the result is:
There is no linear way (a single line) to separate these data: you cannot draw one straight line that separates the classes. So what is the solution? The solution is to draw multiple lines and use some logic to divide the space into two classes. Look at the image below: I drew three imaginary lines that divide the space into four parts. Now, using AND logic, we can say that all the samples on the right side of line #3 AND the samples between line #1 and line #2 belong to class 1 (label 1, shown as blue crosses). All the samples between line #2 and line #3, and the samples on the left side of line #1, belong to class 2 (red crosses).
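To make the "several lines plus logic" idea concrete, here is a minimal sketch. The three lines and the region they carve out are made up for illustration; they are not the lines drawn in the figure, and the names `LINES`, `side_of_lines` and `classify` are hypothetical.

```python
import numpy as np

# Hypothetical lines, each stored as (a, b, c) for a*x + b*y + c = 0.
# These coefficients are assumptions, not the lines from the figure.
LINES = np.array([
    [1.0, 0.0, -1.0],   # line 1: x = 1
    [1.0, 0.0, -3.0],   # line 2: x = 3
    [0.0, 1.0, -2.0],   # line 3: y = 2
])

def side_of_lines(points):
    """Entry [i, j] is True when point i lies on the positive side of line j."""
    points = np.asarray(points, dtype=float)
    return points @ LINES[:, :2].T + LINES[:, 2] > 0

def classify(points):
    """Label 1 for points between line 1 and line 2 AND above line 3, else 0."""
    s = side_of_lines(points)
    in_region = s[:, 0] & ~s[:, 1] & s[:, 2]
    return in_region.astype(int)

print(classify([[2, 3], [0, 3], [2, 1], [4, 3]]))  # -> [1 0 0 0]
```

No single line can produce this labeling, but three simple "which side am I on?" tests combined with AND can.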
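THIS IS EXACTLY WHAT NEURAL NETWORKS DO. It has been proved that a three-layer neural network (input layer, hidden layer and output layer) with enough hidden units can approximate any continuous function on a bounded input region as closely as we like, which is why it can handle non-linear problems like this one. The hidden layer creates these lines and the output layer decides which side of each line belongs to which class.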
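The division of labor between the two layers can be sketched with a tiny hand-wired network. The weights below are chosen by hand for three made-up lines (x = 1, x = 3, y = 2), not learned and not taken from the figure. A step activation is used only to keep the logic visible; a network trained with backpropagation would normally use a smooth activation such as the sigmoid.

```python
import numpy as np

def step(z):
    """Threshold activation: 1.0 where z > 0, else 0.0."""
    return (z > 0).astype(float)

# Hidden layer: one unit per hypothetical line (hand-picked weights).
W_hidden = np.array([
    [ 1.0, 0.0],   # unit 0 fires when x > 1
    [-1.0, 0.0],   # unit 1 fires when x < 3
    [ 0.0, 1.0],   # unit 2 fires when y > 2
])
b_hidden = np.array([-1.0, 3.0, -2.0])

# Output unit: an AND gate, it fires only when all three hidden units fire.
w_out = np.array([1.0, 1.0, 1.0])
b_out = -2.5

def forward(points):
    points = np.asarray(points, dtype=float)
    hidden = step(points @ W_hidden.T + b_hidden)     # which side of each line
    return step(hidden @ w_out + b_out).astype(int)   # combine the sides

print(forward([[2, 3], [0, 3], [2, 1], [4, 3]]))  # -> [1 0 0 0]
```

Training replaces the hand-picked numbers: backpropagation adjusts the hidden weights (where the lines sit) and the output weights (how the sides are combined) to fit the labeled samples.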
Now we are going to implement a multilayer perceptron in Python to solve this problem. Here is the Python code and Matlab code download. (I am using the Anaconda distribution for Python 3.5, since it includes all the necessary libraries, including numpy, scipy and matplotlib.)
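For readers who want to see the moving parts before opening the download, here is a minimal sketch of an MLP with one hidden layer trained by backpropagation in plain NumPy. It is not the downloadable code. It assumes a tanh hidden layer, a sigmoid output and a cross-entropy loss, and it trains on a synthetic stand-in for data.txt (label 1 when x*y > 0). The function names and default parameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_toy_data(n=100, seed=0):
    """Synthetic stand-in for data.txt: 2 features in [-1, 1], label 1 when x*y > 0."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)
    return X, y

def train_mlp(X, y, n_hidden=8, lr=0.5, epochs=5000, seed=0):
    """Full-batch gradient descent on the mean cross-entropy loss."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0, size=(n_hidden, 1))
    b2 = np.zeros(1)
    n = X.shape[0]
    eps = 1e-12  # keeps log() away from zero
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(X @ W1 + b1)          # hidden activations, shape (n, n_hidden)
        p = sigmoid(h @ W2 + b2)          # predicted probability of label 1
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        losses.append(loss)
        # Backward pass (chain rule, layer by layer)
        d_out = (p - y) / n               # gradient w.r.t. the output pre-activation
        dW2 = h.T @ d_out
        db2 = d_out.sum(axis=0)
        d_hidden = (d_out @ W2.T) * (1 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
        dW1 = X.T @ d_hidden
        db1 = d_hidden.sum(axis=0)
        # Gradient descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses

def predict(params, X):
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)
    return (sigmoid(h @ W2 + b2) > 0.5).astype(int).ravel()

X, y = make_toy_data()
params, losses = train_mlp(X, y)
accuracy = np.mean(predict(params, X) == y.ravel())
print("first loss:", losses[0], "last loss:", losses[-1], "train accuracy:", accuracy)
```

To try it on the real data, the first two columns of the array loaded from data.txt would take the place of `X` and the third column, reshaped to a column vector, would take the place of `y`. The number of hidden units, the learning rate and the number of epochs are the obvious parameters to experiment with.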
These are the results with various parameters: