Solving MNIST Dataset Classification Using a Neural Network

In this article I am going to classify the handwritten digits from the MNIST dataset. The dataset contains 60,000 training images and 10,000 test images, each 28x28 pixels. It is distributed as four files that you have to download in order to use it: train-images-idx3-ubyte.gz, train-labels-idx1-ubyte.gz, t10k-images-idx3-ubyte.gz, and t10k-labels-idx1-ubyte.gz. More information about the dataset can be found here.

The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.


First of all, we need to know how to read these files. Each file is in a simple binary (IDX) format: a big-endian 4-byte magic number (2051 for image files, 2049 for label files), followed by the item count and, for image files, the row and column counts, then the raw data bytes. I am going to read the files and plot some of the images. Here is a simple script that reads all the images into memory:


	import sys
	import numpy as np
	import matplotlib.pyplot as plt
	from struct import unpack
	# open the image and label files in binary mode
	f = open('train_image.mnist', 'rb')
	fl = open('train_label.mnist', 'rb')
	# the first four bytes of each file hold a big-endian magic number
	magic_label,  = unpack('>I', fl.read(4))
	magic,  = unpack('>I', f.read(4))
	if (magic == 2051):
		print('Reading magic number...OK')
	else:
		print('Wrong magic number...exit')
		sys.exit(1)
	#end if
	if (magic_label == 2049):
		print('Reading magic label number...OK')
	else:
		print('Wrong magic label number...exit')
		sys.exit(1)
	#end if
	# the label count equals the image count, so discard it
	dump = fl.read(4)
	number_of_image,  = unpack('>I', f.read(4))
	print ("Number of training image : {}".format(number_of_image))
	number_of_rows,  = unpack('>I', f.read(4))
	number_of_cols,  = unpack('>I', f.read(4))
	print ("Images are {} by {} pixels".format(number_of_rows, number_of_cols))
	print('Reading All images into memory', end='')
	sys.stdout.flush()
	image_set = np.zeros((number_of_image, number_of_rows, number_of_cols), dtype=np.float64)
	image_label = np.zeros((number_of_image))
	read_count = 0
	for i in range(0, number_of_image):
		for j in range(0, number_of_rows):
			for k in range(0, number_of_cols):
				# each pixel is one unsigned byte; scale it to [0, 1]
				val,  = unpack('B', f.read(1))
				image_set[i, j, k] = val/255.0
			#end for k
		#end for j
		image_label[i],  = unpack('B', fl.read(1))
		# optionally dump each sample as space-separated pixel values followed
		# by the label encoded as four binary digits (for a text training file):
		#data = ' '.join(map(str, np.ravel(image_set[i])))
		#q = ' '.join("{0:b}".format(int(image_label[i])).zfill(4))
		#fminst.write(data + ' ' + q + "\n")
		read_count += 1
		# print a progress dot every 100 images
		if (read_count % 100 == 0):
			print('.', end='')
			sys.stdout.flush()
	#end for i
	print("DONE!!!")
	f.close()
	fl.close()
			
Now we have all the images in the image_set variable and their corresponding labels in the image_label variable. To plot a sample we use this code:

	# for example, to show the image at index 5 of the dataset
	# (plt is the alias for matplotlib.pyplot)
	plt.imshow(image_set[5, :, :], cmap='gray')
	plt.show()
		
These are some samples I plotted:

As you can see, some of the samples are hard to read even for a human (what is the digit in the second row, third column below?).
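
To plot such a grid yourself with the true labels as titles (the 3-by-3 layout here is just an illustration, not necessarily the exact figure above), you can do something like this:

	# plot a 3x3 grid of samples with their labels as titles
	# (the sample indices here are arbitrary)
	fig, axes = plt.subplots(3, 3)
	for idx, ax in enumerate(axes.flat):
		ax.imshow(image_set[idx], cmap='gray')
		ax.set_title(str(int(image_label[idx])))
		ax.axis('off')
	#end for
	plt.show()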

I am not going to preprocess the images, since the main purpose of this dataset is to let you try classification and learning techniques with minimal effort spent on preprocessing and formatting. You can download the Python code I've written using neural networks and try it yourself. I used one hidden layer with 200 units, a learning rate of 0.8, 50,000 samples for training and 10,000 samples for validation. The activation function for each unit is the sigmoid function, and training stops once the network makes fewer than 500 errors on the 50,000 training samples.
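For reference, here is a minimal sketch of such a network in plain numpy. It is not my downloadable code: the weight initialization, the one-hot target encoding, the squared-error loss, and the omission of bias terms are all assumptions made for illustration.

	# minimal one-hidden-layer MLP sketch (illustrative assumptions throughout)
	import numpy as np

	def sigmoid(x):
		return 1.0 / (1.0 + np.exp(-x))

	rng = np.random.default_rng(0)
	n_in, n_hidden, n_out = 28 * 28, 200, 10
	lr = 0.8
	W1 = rng.uniform(-0.1, 0.1, (n_hidden, n_in))    # input -> hidden weights
	W2 = rng.uniform(-0.1, 0.1, (n_out, n_hidden))   # hidden -> output weights

	X = image_set[:50000].reshape(50000, n_in)           # training inputs
	T = np.eye(n_out)[image_label[:50000].astype(int)]   # one-hot targets

	for epoch in range(15):
		errors = 0
		for x, t in zip(X, T):
			h = sigmoid(W1 @ x)                  # hidden activations
			y = sigmoid(W2 @ h)                  # output activations
			errors += int(np.argmax(y) != np.argmax(t))
			# backpropagation; the sigmoid derivative is s * (1 - s)
			d_out = (y - t) * y * (1.0 - y)
			d_hid = (W2.T @ d_out) * h * (1.0 - h)
			W2 -= lr * np.outer(d_out, h)
			W1 -= lr * np.outer(d_hid, x)
		#end for
		print('Running {} Epoch with {} error(s)'.format(epoch + 1, errors))
		if errors < 500:   # the stopping criterion described above
			break
	#end for

With these settings, the result of my run is as follows: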

	Reading magic number...OK
	Reading magic label number...OK
	Number of training image : 60000
	Images are 28 by 28 pixels
	Reading All images into memory..........................................................DONE!!!
	starting train routine....
	..................................................Running 1 Epoch with 864 error(s)
	..................................................Running 2 Epoch with 703 error(s)
	..................................................Running 3 Epoch with 699 error(s)
	..................................................Running 4 Epoch with 674 error(s)
	..................................................Running 5 Epoch with 710 error(s)
	..................................................Running 6 Epoch with 628 error(s)
	..................................................Running 7 Epoch with 609 error(s)
	..................................................Running 8 Epoch with 570 error(s)
	..................................................Running 9 Epoch with 544 error(s)
	..................................................Running 10 Epoch with 556 error(s)
	..................................................Running 11 Epoch with 582 error(s)
	..................................................Running 12 Epoch with 516 error(s)
	..................................................Running 13 Epoch with 516 error(s)
	..................................................Running 14 Epoch with 549 error(s)
	..................................................Running 15 Epoch with 459 error(s)
	After running 15 Epoch, finished with 459 error(s)
	Now starting to test the weights on 10000 test data...
	Finish test with 775 error out of 10000 test sample.
			

As you can see, the final test gives 775 errors on the 10,000 test samples, which is about a 7.75% error rate (92.25% accuracy). I have tested my code several times, and this is almost the best result you can get from a simple multilayer perceptron with one hidden layer and no preprocessing. Next time I am going to use the k-nearest neighbors algorithm.