- [Jabril Voiceover] Oh, this is it, perfect. I think these extra layers are gonna make it so much better. Oh yeah, increasing the size of this layer was a really good idea. All right, okay, I can't wait any longer. It's time to test it. (John Green Bot whirs) (John Green Bot beeps) - [John Green Bot] Jabril, Jabril, I wrote a novel. - Whoa, John Green Bot, you did what? - [John Green Bot] I wrote a novel. - A novel? Let me see this. Wow, John Green Bot, this is pretty sloppy. We need to work on your handwriting. (reflective orchestral music) (John Green Bot whirs) (John Green Bot whirs) Hold up, hold up. You wrote one letter per page? This is impossible to read. John Green Bot, we've got to get your novel an audience, so let's digitize this using machine learning. But first, there's something else we have to test. (upbeat rock music) (person whistles) Welcome back to Crash Course AI. I'm your host Jabril, and today, we'll be doing something a little different. This is the first time we're trying a hands-on lab on Crash Course, so we'll tackle our project together and program a neural network to recognize handwritten letters. All right, John Green Bot, we'll get back to you when we got somethin'. (John Green Bot whirs) (John Green Bot beeps) We'll be writing all of our code using a language called Python and a tool called Google Colaboratory. You can see the code we're about to go over in your browser from the link we put in the description, and you can follow along with me in this video. In these Colaboratory files, there's some regular text explaining what we're trying to do and pieces of code that we can run by pushing the play button. These pieces of code build on each other, so keep in mind that we have to run them in order from top to bottom, otherwise we might get an error. To actually run the code and experiment with changing it, you have to either click Open in playground at the top of the page or open the File menu and click Save a copy to Drive. 
And just FYI, you'll need a Google account for this. Remember, our goal is to program a neural network to recognize handwritten letters and convert them to typed text. Even though this stack of papers is unreadable to me, we can work with it, and it could actually make our project a little easier. Usually with a project like this, we'd have to write code to figure out where one letter ends and another begins because handwriting can be messy and uneven. That's called the segmentation problem. But because John Green Bot wrote his novel like this, the letters are already segmented, and we can just focus on recognizing the letter on each page. By the way, avoiding the segmentation problem is also why official forms sometimes have little boxes for each letter, instead of just a line for writing your name. Even though we don't have to worry about segmentation, recognizing handwritten letters and converting them to typed text is still tricky. Every handwritten J looks a little different, so we need to program our neural network to recognize a pattern instead of memorizing a specific shape. But before we do this, let's think about what we need to get there. Neural networks need a lot of labeled data to learn what each letter generally looks like. So step one is, find or create a labeled dataset to train our neural network. And this involves splitting our dataset into the training set and the testing set. The training set is used to train the neural network, and the testing set is data that's kept hidden from the neural network during training so it can be used to check the network's accuracy. Next is step two, create a neural network. We'll actually need to configure an AI with an input layer, some number of hidden layers, and the ability to output a number corresponding to its letter prediction. In step three, we'll train, test, and tweak our code until we feel that it's accurate enough. 
And finally in step four, we'll scan John Green Bot's handwritten pages and use our newly trained neural network to convert them into typed text. All right, let's get started. Step one, creating a labeled dataset can be a huge and expensive challenge, especially if I have to handwrite and label thousands of images of letters by myself. Luckily, there's already a dataset that we can use, the Extended Modified National Institute of Standards and Technology dataset, or EMNIST for short. This dataset has tens of thousands of labeled images of handwritten letters and numbers generated from US Census forms. Some of the handwriting is relatively neat and some, not so much. We're gonna use the EMNIST letters chunk of the dataset, which has 145,600 images of letters, because we're only recognizing letters in John Green Bot's book, not numbers. This code here will give our program access to this dataset, also called importing it. So now, we need to make sure to keep our training and testing datasets separate so that when we test for accuracy, our AI has never seen the testing images before. So now in our code at step 1.2, let's call the first 60,000 labeled images, train, and the next 10,000 labeled images, test. These images of letters are 28 by 28 pixels, and each pixel is a grayscale value between zero and 255. To normalize each pixel value and make them easier for the neural network to process, we'll divide each value by 255. That will give us a number between zero and one for each pixel in each image. Performing a transformation like this to make the data easier to process is a machine-learning method called preprocessing. By the way, we'll need different preprocessing steps for different types of data. All right, it may take a few seconds to download and process all the images, so while that's happening, I want to clarify that EMNIST is a luxury. There aren't many already-existing datasets where you have this much labeled data to use.
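The split and the divide-by-255 normalization described in step 1.2 can be sketched in a few lines of plain Python. This is only an illustration, not the notebook's code: the helper names `normalize_image` and `split_dataset` are hypothetical, and seven tiny 2 by 2 grids stand in for the real 28 by 28 EMNIST images.

```python
# Minimal sketch with stand-in data; the real lab downloads EMNIST instead.
# `normalize_image` and `split_dataset` are hypothetical helper names.

def normalize_image(image):
    """Scale every grayscale pixel from the 0-255 range into 0.0-1.0."""
    return [[pixel / 255 for pixel in row] for row in image]


def split_dataset(images, labels, n_train, n_test):
    """Use the first n_train examples for training and the next n_test for testing."""
    train = (images[:n_train], labels[:n_train])
    test = (images[n_train:n_train + n_test], labels[n_train:n_train + n_test])
    return train, test


# Seven fake 2x2 "images" stand in for the 28x28 EMNIST letters.
images = [[[0, 255], [51, 102]] for _ in range(7)]
labels = [1, 2, 3, 4, 5, 6, 7]

normalized = [normalize_image(image) for image in images]
(train_x, train_y), (test_x, test_y) = split_dataset(normalized, labels, 5, 2)

print(normalized[0])               # every value now lies between 0 and 1
print(len(train_x), len(test_x))   # sizes of the two separate sets
```

With the real data, the same call would use `n_train=60000` and `n_test=10000`, so the testing images never overlap with the training images.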
In general, if we try and solve other problems, we have to think hard about how to collect and label data for training and testing our networks. Data collection is a very important step to training a good neural network. In this case, though, we've got plenty to use in both sets. Okay, let's write a little piece of code to make sure that we imported our dataset correctly. This line lets us display an image, and we'll also display the label using the print command. See, this letter is labeled as a Y. We can display a different example by changing this index number, which tells our program which letter image in the EMNIST dataset to pull. Let's look at the image indexed at 1,200. This is labeled as a W. These are already-labeled images. There's no neural network making any decisions yet, but this is a labeled dataset, so we're done with the first step. Step two, now that we have our dataset, we need to actually build a neural network, but we don't need to reinvent the wheel here. We're going to stick to a multi-layer perceptron neural network, or MLP for short, which is the kind we've focused on in the Neural Networks and Deep Learning episodes. There are already some tools in Python called libraries that we can use to make the network. We're going to use a library called sklearn, which is short for scikit-learn. We'll import that so we have access to it. Sklearn includes a bunch of different machine-learning algorithms, and we'll be using its multi-layer perceptron algorithm in this lab. So, our neural network is gonna have images of handwritten letters as inputs. Each image from the EMNIST is 28 by 28 pixels, and each of these pixels will be represented by a single input neuron, so we'll have 784 input neurons in total. Depending on how dark a particular pixel is, it will have a grayscale value between zero and one, thanks to the processing we did earlier. The size of our output layer depends on the number of label types that we want our neural network to guess. 
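To make the count of 784 input neurons concrete, here is a small sketch of how a 28 by 28 grid becomes one flat list of inputs, one value per pixel. The function name `flatten_image` is hypothetical, and the all-dark test image is made up for illustration.

```python
# Minimal sketch: one input value per pixel, read row by row.
# `flatten_image` is a hypothetical helper, not part of the lab notebook.

def flatten_image(image):
    """Turn a 2-D grid of pixels into one flat list, row by row."""
    return [pixel for row in image for pixel in row]


# A made-up 28x28 image that is dark everywhere except one bright pixel.
image = [[0.0] * 28 for _ in range(28)]
image[3][5] = 1.0  # row 3, column 5

inputs = flatten_image(image)

print(len(inputs))        # 28 * 28 input values
print(inputs.index(1.0))  # position of the bright pixel: 3 * 28 + 5
```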
Since we're trying to guess letters and there are 26 letters in the English alphabet, we'll have 26 output neurons. We don't actually have to tell the network this, though. It will figure this out on its own from the labels in the training set. For the structure of the hidden layers, we'll just start experimenting to see what works. We can always change it later, so we'll try a single hidden layer containing 50 neurons. Over the span of one epoch of training this neural network, each of the 60,000 images in the training dataset will be processed by the input neurons. The hidden layer neurons will randomly pick some aspect of each image to focus on. And the output neurons will hold the best guess as to whether each image is a particular letter. You'll see that the code in our Colab notebook calls this an iteration. In the specific algorithm we're using, an iteration and an epoch are the same thing. After each of the 60,000 images is processed, the network will compare its guess to the actual label and update weights and biases to give a better guess for the next image. And after multiple epochs of the same training dataset, the neural network's prediction should keep getting better thanks to those updated weights and biases. We'll just go with 20 epochs for now. We've captured all that in a single line of code in step 2.1, which creates a neural network with a single hidden layer with 50 neurons that will be trained over 20 epochs. This is why libraries can be so useful. We're accessing decades of research with just one line of code. But keep in mind, there are cons to using libraries like this as well. We don't have a lot of control over what's happening under the hood here. When solving most problems, we'll want to do a mix of using existing libraries and writing our own AI algorithms, so we would need a lot more than just one line of code. For this lab, though, step two is done.
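For reference, a network like the one described in step 2.1 could be set up with scikit-learn's `MLPClassifier` roughly as follows. This is a sketch under assumptions, not the notebook's exact line: only the single hidden layer of 50 neurons and the 20 iterations come from the lab, while the random stand-in data, the three made-up classes, the `random_state` value, and the warning filter are choices made here so the example runs quickly on its own.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Stand-in data (an assumption): 60 random "images" of 784 pixels each,
# spread evenly over 3 made-up classes instead of the 26 letters.
rng = np.random.RandomState(0)
n_samples, n_pixels, n_classes = 60, 784, 3
X = rng.rand(n_samples, n_pixels)
y = np.arange(n_samples) % n_classes

# One hidden layer of 50 neurons, trained for at most 20 iterations (epochs).
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=20, random_state=1)

# Stopping after 20 iterations usually triggers a ConvergenceWarning,
# which is expected here and silenced to keep the output readable.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    mlp.fit(X, y)

print(mlp.coefs_[0].shape)  # weights from the input layer to the hidden layer
print(mlp.coefs_[1].shape)  # weights from the hidden layer to the output layer
print(mlp.n_iter_)          # number of iterations actually run
```

Note that the output layer size is never passed in. The classifier infers it from the distinct labels it sees during `fit`, which is why the second weight matrix ends up with one column per class.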
Step three, next, we want to actually train our network over those 20 epochs and see how well it guesses the letters in the training and testing datasets, with this one line of code in step 3.1. For every epoch, our program prints a number called the error of the loss function. This basically represents how wrong the network was overall. We want to see this number going down with each epoch. The number that we really care about is how well the network does on the testing dataset, which shows how good our network is at dealing with data it's never seen before. And we have 84% correct. Now, that's not bad considering we only trained for 20 epochs, but we still want to improve it. To see where the network made most of its mistakes, we can create a confusion matrix, which we made in step 3.2. The color of each cell in the confusion matrix represents the number of elements in that cell, and a brighter color means more elements. The rows are the correct values, and the columns are the predicted values, and the numbers on the axes represent the 26 letters in the alphabet. So, zero is A and one is B, et cetera, et cetera. So, cell zero, zero represents the number of times that our network correctly predicted that an A is an A. It's good to see a bright diagonal line because those are all the correct values. But other bright cells are mislabeled, so we should check if there are any patterns. For example, I and L may be easy to confuse, so let's look at some cases where that happened. We can also try other types of errors, like every time our network guesses that a U is a V. 37 times. To see if we can improve our accuracy, we can program a slightly different neural network. More epochs, more hidden layers, and more neurons in the hidden layers could all help, but the trade-off is that things will be a bit slower. We can play around with the structure here to see what happens. 
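The confusion matrix from step 3.2 is simple enough to build by hand, which can help when reading it. The sketch below uses plain Python and a handful of made-up true and predicted letters rather than real network output, and the function names are hypothetical. Rows are the correct letters and columns are the predicted letters, with index zero standing for A.

```python
import string

LETTERS = string.ascii_uppercase  # index 0 is "A", index 1 is "B", and so on


def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count how often each true label (row) was predicted as each label (column)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[truth][guess] += 1
    return matrix


def accuracy(matrix):
    """Fraction of predictions that fall on the diagonal, i.e. are correct."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up example: eight letters and what a network might have guessed.
true_letters = "AAUUUVIL"
predicted_letters = "AAUVVVLL"
true_labels = [LETTERS.index(c) for c in true_letters]
predicted_labels = [LETTERS.index(c) for c in predicted_letters]

matrix = confusion_matrix(true_labels, predicted_labels, 26)
a, i, l = LETTERS.index("A"), LETTERS.index("I"), LETTERS.index("L")
u, v = LETTERS.index("U"), LETTERS.index("V")

print(matrix[a][a])      # times an A was correctly predicted as an A
print(matrix[u][v])      # times a U was mistaken for a V
print(matrix[i][l])      # times an I was mistaken for an L
print(accuracy(matrix))  # share of all guesses that were correct
```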
For now, let's try creating a neural network that has five hidden layers of a hundred neurons each, and we'll train it over 50 epochs. It'll take a few minutes to run. (reflective orchestral music) (tape squeaks) Now we've got better accuracy rates on our testing dataset. We got 88% correct instead of 84%, and that's an improvement. Over time, we can develop an intuition about how to structure neural networks to achieve better results. See if you can create a network that has a higher accuracy than ours on the testing dataset. But for now, we're gonna move forward with this trained network. Step four, this final step is our moment of truth. We're gonna use our trained neural network to try and read John Green Bot's novel, so let's dig into this stack of papers. First, we gotta get our data in the right format by scanning all these papers. (sighs) (reflective orchestral music) (tape squeaks) And done. And because we're using Google Colab, we need to get them online. We're storing them in a GitHub repository which we coded to import into our Colaboratory notebook. But as you can see, those scanned images are huge, so we've also done a bit of preprocessing on them to avoid having to download and compute over so much data. We've changed the size of every image to 128 by 128 pixels. The other thing you may notice is that the EMNIST dataset uses a dark background with light strokes, but our original scans have a white background with dark strokes. So we also went ahead and inverted the colors to be consistent with EMNIST. All right, so now, back to the Colab notebook. So this code right here in step 4.1 will pull the modified letters from GitHub. Now, we'll read them into an array and display one of them just to make sure we're able to import them correctly. This looks pretty good, clearer than the EMNIST data, actually. But back to the point why we're doing this in the first place, let's see if we can process John Green Bot's story now. 
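The color inversion mentioned above is a one-line transformation per pixel. As a rough sketch, assuming 8-bit grayscale values from zero to 255 and a hypothetical helper named `invert_colors`, it could look like this, with a tiny made-up 3 by 3 scan in place of a real page.

```python
# Minimal sketch: flip dark-on-white scans to light-on-dark, like EMNIST.
# `invert_colors` is a hypothetical helper; the scan below is made up.

def invert_colors(image, max_value=255):
    """Replace each pixel with its opposite brightness."""
    return [[max_value - pixel for pixel in row] for row in image]


# White background (255) with a dark vertical stroke down the middle.
scan = [
    [255, 255, 255],
    [255, 0, 255],
    [255, 40, 255],
]

inverted = invert_colors(scan)
for row in inverted:
    print(row)  # background is now 0, the stroke is now bright
```

Inverting twice returns the original image, which makes for a quick sanity check.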
(computer beeps) Uh, this is not making any sense, so we're doing something wrong. First off, John Green Bot's story had some empty spaces between words. We never actually trained our model on empty spaces, just the 26 letters, so it wouldn't be able to detect these. But blank pages should be easy to detect. After all, unlike handwritten letters, all blank images should be exactly the same. So we'll just check each image to see if it's a blank space, and if it is, we'll add a space to our story. This looks better. There are separate words, and I can tell that the first word is the, but not much beyond that. Something else isn't going right here. Well, even though the letters on the pages that were scanned look clear to my human eyes, the images were really big compared to the handwritten samples that were used to train EMNIST. We resized them, but that doesn't seem to be enough. To help our neural network digitize these letters, we should try processing these images in the same way that EMNIST did. Let's do a little detective work to figure out how the EMNIST dataset was processed so our images are more similar to the training dataset and our program's accuracy will hopefully get better. Hmm, "Further information on the dataset contents "and the conversion process can be found in the paper." We're not gonna go through the paper, but we'll link it in the description if you want to learn more. Basically, I made the following additions to the code. We're applying some filters to the image to soften the letter edges, centering each letter in the square image, and resizing each one to be 28 by 28 pixels. As part of this code, we're also displaying one letter from these extra-processed images to do another check. Even though to my eyes, the letters look less clear now, they do look much more similar to the letters in the EMNIST dataset, which is good for our neural network. 
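The blank-page check described above can be sketched as follows. This assumes the pages have already been inverted and normalized, so the background is close to zero, and it adds a small tolerance for scanner noise, which is an assumption not mentioned in the lab. The names `is_blank`, `pages_to_text`, and `stand_in_predict` are hypothetical, and the stand-in predictor simply marks where the trained network's guess would go.

```python
# Minimal sketch: blank pages become spaces, everything else goes to the network.
# All names here are hypothetical; `tolerance` is an added assumption.

def is_blank(image, background=0.0, tolerance=0.05):
    """Return True when every pixel is within `tolerance` of the background."""
    return all(
        abs(pixel - background) <= tolerance
        for row in image
        for pixel in row
    )


def pages_to_text(pages, predict_letter):
    """Build the story one page at a time, inserting a space for blank pages."""
    characters = []
    for page in pages:
        if is_blank(page):
            characters.append(" ")
        else:
            characters.append(predict_letter(page))
    return "".join(characters)


def stand_in_predict(page):
    """Placeholder for the trained network: always answers with 'x'."""
    return "x"


letter_page = [[0.0, 0.9], [0.8, 0.0]]   # has bright strokes
blank_page = [[0.0, 0.01], [0.02, 0.0]]  # only faint scanner noise

pages = [letter_page, letter_page, letter_page, blank_page, letter_page, letter_page]
print(pages_to_text(pages, stand_in_predict))
```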
The edges of the letters are kind of fuzzy and they're centered in the square, so let's try processing this story one more time. Keep in mind, though, that with an 88% accurate model, we expect to get about one in eight letters wrong in the story. John Green Bot, are you ready? (John Green Bot whirs) (John Green Bot beeps) All right, let's see what we were talking about. (suspenseful drum music) "The Fault in Our Power Supplies. "I fell in love the way your battery dies, "slowly and then all at once." Quite poetic, John Green Bot. (John Green Bot beeps) Okay, it's not perfect, but it was pretty easy to figure out with context and by knowing which letters might be mistaken for each other. Regardless, thanks, John Green Bot, for giving us a little taste of your first novel. And thank you for following along in our first Crash Course lab. Let us know in the comments how you think you could improve the code and tell us if you use it in any of your own projects. Now, this kind of supervised machine learning is a big component of the AI revolution, but it's not the only one. In later videos, we'll be looking at other types of machine learning, including unsupervised and reinforcement learning, to see what we can do even without a giant labeled dataset. See ya then.