How to Make an Image Classifier – Intro to Deep Learning #6


How do we classify things? We consider people to be experts in a field if they've mastered classification. Doctors can classify a good blood sample from a bad one. Photographers can classify whether their latest shot was beautiful or not. Musicians can classify what sounds good, and what doesn't, in a piece of music. The ability to classify well takes many hours of training. We get it wrong over and over again, until eventually we get it right. But with a quality data set, deep learning can classify just as well as, if not better than, we can. We'll use it as a tool to improve our craft, whatever it is. And if the job is monotonous, it'll do it for us. When we reach the point where we aren't forced to do something we don't want to just to survive, we'll flourish like never before. And that's the world we're aiming for.

Hello, world, it's Siraj. And today, we're going to build an image classifier from scratch, to classify cats and dogs. Finally, we get to work with images. I'm feeling hype enough to do the Macarena. [MUSIC]
So, how does image classification work? Well, there were a bunch of different attempts in the 80s and early 90s, and all of them tried a similar approach: think about the features that make up an image, and hand-code detectors for each of them. But there is so much variety out there. No two apples look exactly the same, so the results were always terrible. This was considered a task only we humans could do. But in '98, a researcher named Yann LeCun introduced a model called a Convolutional Neural Network, capable of classifying characters with 99% accuracy, which broke every record. And unlike the hand-coded approaches, the CNN learned its features by itself. In 2012, it was used by another researcher, Alex Krizhevsky, at the yearly ImageNet competition, which is basically the annual Olympics of computer vision. And it was able to classify thousands of images with a new record accuracy for the time of 85%. Since then, CNNs have been adopted by Google to identify photos in search, and by Facebook for automatic tagging. Basically, they are very hot right now. But where did the idea for CNNs come from? [MUSIC]
We'll first want to download our image data set from Kaggle, with 1,024 pictures of dogs and cats, each in its own folder. We'll be using the Keras deep learning library for this demo, which is a high-level wrapper that runs on top of TensorFlow. It makes building models really intuitive, since we can define each layer as its own line of code. First things first, we'll initialize variables for our training and validation data.
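Here's a minimal sketch of that data setup, assuming the standalone Keras 2 API and that the Kaggle images have been unpacked into data/train and data/validation folders with one subfolder per class (cats and dogs). The folder names, the 32-by-32 image size, and the batch size are illustrative assumptions, not values taken from the video.

from keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values from 0-255 down to 0-1 before feeding them to the network.
train_datagen = ImageDataGenerator(rescale=1. / 255)
validation_datagen = ImageDataGenerator(rescale=1. / 255)

# Read images straight from class-labeled folders, e.g. data/train/cats and data/train/dogs.
train_generator = train_datagen.flow_from_directory(
    'data/train',            # assumed folder layout
    target_size=(32, 32),    # resize every image to 32x32 pixels
    batch_size=16,
    class_mode='binary')     # two classes: cat or dog

validation_generator = validation_datagen.flow_from_directory(
    'data/validation',
    target_size=(32, 32),
    batch_size=16,
    class_mode='binary')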
Then we're ready to build our model. We'll initialize the type of model using the Sequential function, which allows us to build a linear stack of layers, so we treat each layer as an object that feeds data to the next one. It's like a conga line, kind of. The alternative would be a graph model, which would allow for multiple separate inputs and outputs, but we're using a simpler example.
Next, we'll add our first layer, the convolutional layer. The first layer of a CNN is always the convolutional layer. The input is going to be a 32 by 32 by 3 array of pixel values, where the 3 refers to the RGB channels. Each of the numbers in this array is given a value from 0 to 255, which describes the pixel intensity at that point. The idea is that, given this as an input, our CNN will describe the probability of it being of a certain class.
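In code, that first layer might look something like the sketch below. The filter count (32) and filter size (3 by 3) are illustrative choices; the video doesn't pin down exact hyperparameters.

from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()                        # a linear stack of layers, the conga line from above
model.add(Conv2D(32,                        # learn 32 different filters
                 (3, 3),                    # each filter is a 3x3 window of weights
                 input_shape=(32, 32, 3)))  # 32x32 RGB input images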
We can imagine the convolutional layer as a flashlight shining over the top left of the image. The flashlight slides across all the areas of the input image. The flashlight is our filter, and the region it shines over is the receptive field. Our filter is also an array of numbers, and these numbers are the weights at a particular layer. We can think of a filter as a feature identifier. As our filter slides, or convolves, around the input, it multiplies its values with the pixel values in the image. These are called element-wise multiplications. The multiplications from each region are then summed up, and after we've covered all parts of the image, we're left with a feature map. This will help us find not buried treasure, but a prediction, which is even better. Since our weights are randomly initialized, our filter won't start off being able to detect any specific feature. But during training, our CNN will learn values for its filters. So this first one will learn to detect a low-level feature, like curves. If we place this filter on a part of the image with a curve, the resulting value from the multiplication and summation is a big number. But if we place it on a different part of the image, without a curve, the resulting value is zero. This is how filters detect features.
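Here's a tiny NumPy illustration of that multiply-and-sum idea. The 5-by-5 patch and the diagonal "curve" filter are made-up numbers for the example, not anything from the actual data set.

import numpy as np

# A small 5x5 grayscale patch (pixel intensities 0-255) containing a diagonal edge,
# and a 3x3 filter whose weights match that diagonal pattern.
patch = np.array([
    [0,   0,   0,  30,  0],
    [0,   0,  30,  50,  0],
    [0,  30,  50,   0,  0],
    [30, 50,   0,   0,  0],
    [50,  0,   0,   0,  0],
])
filt = np.array([
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
])

# Slide the filter over every 3x3 region: element-wise multiply, then sum.
feature_map = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        region = patch[i:i + 3, j:j + 3]
        feature_map[i, j] = np.sum(region * filt)

print(feature_map)  # big numbers where the filter's pattern lines up with the image, small ones elsewhere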
We'll next pass this feature map through an activation layer called ReLU, or rectified linear unit. ReLU is probably the name of some alien, but it's also a non-linear operation that replaces all the negative pixel values in the feature map with zero. We could use other functions, but ReLU tends to perform better in most situations. This layer increases the non-linear properties of our model, which means our neural net will be able to learn more complex functions than just linear regression.
After that, we'll initialize our max pooling layer. Pooling reduces the dimensionality of each feature map but retains the most important information, which reduces the computational complexity of our network. There are different types, but in our case we'll use max pooling, which takes the largest element from the rectified feature map within a window we define, then slides this window over each region of the feature map, taking the max values.
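Continuing the model sketch from above, the ReLU activation and max pooling steps might look like this; the 2-by-2 pool size is an assumption, just the most common choice.

from keras.layers import Activation, MaxPooling2D

model.add(Activation('relu'))              # zero out negative values in the feature maps
model.add(MaxPooling2D(pool_size=(2, 2)))  # keep only the max value in each 2x2 window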
So a classic CNN architecture looks like this: three convolutional blocks, followed by a fully connected layer. We've initialized the first three layers, so we can basically just repeat this process twice more. The output feature map is fed into the next convolutional layer, and the filters in that layer will learn to detect more abstract features, like paws and doge.
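Repeating the block twice more could look like the sketch below, still growing the same model object. The filter counts (32, then 64) are borrowed from common Keras examples, not specified in the video.

from keras.layers import Conv2D, Activation, MaxPooling2D

# Second convolutional block.
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# Third convolutional block, with more filters for the more abstract features.
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))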
One technique we'll use to prevent overfitting, that point when our model isn't able to predict labels for novel data, is called dropout. A dropout layer drops out a random set of activations in that layer by setting them to zero as data flows through it. To prepare our data for the dropout, we'll first flatten the feature map into one dimension. Then we'll want to initialize a fully connected layer with the dense function, and apply ReLU to it. After dropout, we'll initialize one more fully connected layer. This will output an n-dimensional vector, where n is the number of classes we have, so here it would be two. And by applying a sigmoid to it, it will convert the data to probabilities for each class.
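A sketch of that top of the network, continuing the same model: the 64-unit dense layer and the 0.5 dropout rate are assumptions, and a single sigmoid unit stands in for the two classes here, since one probability (dog vs. cat) is the usual Keras convention for binary classification.

from keras.layers import Activation, Dropout, Flatten, Dense

model.add(Flatten())              # flatten the 3D feature maps into a 1D vector
model.add(Dense(64))              # fully connected layer
model.add(Activation('relu'))
model.add(Dropout(0.5))           # randomly zero out half the activations during training
model.add(Dense(1))               # one output covering the two classes
model.add(Activation('sigmoid'))  # squash the output into a 0-1 probability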
So how does our network learn? Well, we'll want to minimize a loss function, which measures the difference between the predicted output and the target output. To do this, we'll take the derivative of the loss with respect to the weights in each layer, starting from the last, to compute the direction we want our network to update in. We'll propagate our loss backwards through each layer, then update our weight values for each filter so they change in the direction of the gradient that will minimize our loss.
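Keras handles all of this for us, but here's a toy, self-contained illustration of that update rule on a single weight; the numbers are arbitrary.

# Gradient descent on one weight w for the loss L(w) = (w*x - y)**2.
x, y = 2.0, 6.0          # one training example: input 2, target 6
w = 0.0                  # arbitrarily initialized weight
learning_rate = 0.05

for step in range(50):
    prediction = w * x
    grad = 2 * (prediction - y) * x   # derivative of the loss with respect to w
    w -= learning_rate * grad         # step against the gradient to lower the loss

print(w)  # approaches 3.0, the weight that minimizes the loss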
We then configure the learning process by using the compile method, where we'll define our loss as binary cross-entropy, which is the preferred loss function for binary classification problems; then our optimizer, rmsprop, which will perform gradient descent; and a list of metrics, which we'll set to accuracy, since this is a classification problem.
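As a minimal sketch, that compile step looks like this:

model.compile(loss='binary_crossentropy',  # loss for two-class problems
              optimizer='rmsprop',         # gradient-descent-style optimizer
              metrics=['accuracy'])        # report accuracy during training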
Lastly, we'll write out our fit function to train the model, giving it parameters for the training and validation data, as well as a number of epochs to run. And let's save our weights, so we can use our trained model later.
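A sketch of the training and saving step, using the generators from earlier; the step counts, epoch count, and weights filename are all assumptions for illustration.

model.fit_generator(
    train_generator,
    steps_per_epoch=64,        # batches per epoch, e.g. ~1,024 images / batch size 16
    epochs=30,
    validation_data=validation_generator,
    validation_steps=16)

model.save_weights('cats_vs_dogs.h5')  # hypothetical filename for the saved weights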
Overall accuracy comes out to be about 70%, similar to my attention span. And if we feed our model a new picture of a dog or cat, it will predict its label relatively accurately.
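Making a prediction on a new image might look like the sketch below; the file path is hypothetical, and the image has to be resized and rescaled the same way as the training data.

import numpy as np
from keras.preprocessing import image

img = image.load_img('new_pet_photo.jpg', target_size=(32, 32))  # hypothetical path
x = image.img_to_array(img) / 255.   # same 0-1 rescaling as the training data
x = np.expand_dims(x, axis=0)        # the model expects a batch dimension

# With the assumed folder layout, flow_from_directory assigns classes alphabetically: cats=0, dogs=1.
probability = model.predict(x)[0][0]
print('dog' if probability > 0.5 else 'cat')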
We could definitely improve our prediction, though, by either using more pictures, or by augmenting an existing pre-trained network with our own network, which is considered transfer learning.
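As one hedged example of that transfer-learning idea, we could bolt a small classifier on top of a pre-trained VGG16 network from keras.applications. The 150-by-150 input size, the frozen base, and the layer sizes below are illustrative choices, not something prescribed in the video.

from keras.applications import VGG16
from keras.models import Sequential
from keras.layers import Flatten, Dense

# Load VGG16's convolutional layers pre-trained on ImageNet, without its classifier head.
base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))
for layer in base.layers:
    layer.trainable = False   # freeze the pre-trained filters; only train our new layers

transfer_model = Sequential()
transfer_model.add(base)                           # reuse the pre-trained feature extractor
transfer_model.add(Flatten())
transfer_model.add(Dense(64, activation='relu'))
transfer_model.add(Dense(1, activation='sigmoid')) # same binary output as before

transfer_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])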
So to break it down: convolutional neural networks are inspired by the human visual cortex, and offer state-of-the-art performance in image classification. CNNs learn filters at each convolutional layer that act as increasingly abstract feature detectors. And with Keras and TensorFlow, you can build your own pretty easily.

The winner of the coding challenge from the last video is Charles David-Blot. He used TensorFlow to build a deep net capable of predicting whether or not someone would get a match after training on a data set, and he had a pretty sweet data visualization of his results. Wizard of the Week. And the runner-up is Dalai Mingat, with clean, organized, and documented code. The coding challenge for this video is to create an image classifier for two types of animals; instructions are in the README. Post your GitHub link in the comments, and I'll announce the winner next Friday. Please subscribe if you want to see more videos like this, check out this related video, and for now, I've gotta upload my mind. So, thanks for watching!
