
Machine learning

    [Project] pyTsetlinMachine released. High-level Tsetlin Machine Python API with fast C-extensions.

    I have made a Python library for the Tsetlin Machine. You can now set up, train and evaluate Tsetlin Machines in just three lines of code. I have used C extensions for speed, wrapped in Python. Currently, the Multi-class and Convolutional Tsetlin Machines are available. The Regression Tsetlin Machine follows soon. Will also add more demos and support functions (e.g. binarization).
    https://github.com/cair/pyTsetlinMachine
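    For a sense of the intended workflow, the three-line usage might look roughly like the sketch below (the class name and hyperparameter values are taken from the repository README as I understand it, so treat them as assumptions and check the repo before use):

    # sketch of the three-line workflow; X_train, Y_train, X_test, Y_test are
    # assumed to be binarized NumPy feature arrays and integer class labels
    from pyTsetlinMachine.tm import MultiClassTsetlinMachine
    # arguments: number of clauses, threshold T, specificity s (illustrative values)
    tm = MultiClassTsetlinMachine(800, 40, 5.0)
    tm.fit(X_train, Y_train, epochs=50)
    print('Accuracy:', (tm.predict(X_test) == Y_test).mean())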

    https://i.redd.it/s04lp2nyhc431.png

    submitted by /u/olegranmo

    [R] Learning to Route in Similarity Graphs

    Learning to Route in Similarity Graphs (arxiv)

    The paper improves Similarity Graphs for large-scale Nearest Neighbor Search by training an agent to efficiently navigate the graph with deep imitation learning. Put simply, these guys train the search engine to better navigate the graph of all images so as to find the nearest neighbours. Basically Deep Imitation Learning meets Graph Convolutional Networks meets Web/Image Search and other fancy large-scale applications.
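    For intuition, classic similarity-graph search is a greedy walk: starting from an entry vertex, repeatedly hop to whichever neighbour is closest to the query and stop when no neighbour improves. A minimal sketch of that baseline routine (names and data structures here are illustrative, not the paper's code):

    import numpy as np

    # vectors: array of item embeddings; neighbours: dict mapping vertex -> adjacent vertices
    def greedy_route(query, vectors, neighbours, start):
    	current = start
    	while True:
    		# consider the current vertex and its neighbours, pick the one closest to the query
    		candidates = [current] + list(neighbours[current])
    		best = min(candidates, key=lambda v: np.linalg.norm(vectors[v] - query))
    		if best == current:
    			return current  # no neighbour is closer: stop here
    		current = best      # otherwise move to the better neighbour

    Roughly speaking, the paper replaces this hand-crafted "move to the closest neighbour" rule with a routing policy trained by imitation, so the walk finds the true nearest neighbour more reliably.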

    Toy example. Each node represents one data point (e.g. an image). Given the query "q", the algorithm navigates the graph from the "start" vertex to find the nearest neighbour "gt" for the query. The yellow path follows the original search procedure; the orange path corresponds to the learned agent.

    Read the paper (arxiv), browse the code (github), or talk to the authors at ICML right about now if you're attending 🙂

    (source: saw the paper at icml, acquainted with the authors)

    submitted by /u/justheuristic

    [D] Gavin Miller: Adobe Research | Artificial Intelligence Podcast

    Gavin Miller is the Head of Adobe Research. For over 30 years, Adobe has empowered artists, designers, and creative minds from all professions working in the digital medium with software such as Photoshop, Illustrator, Premiere, After Effects, InDesign, and Audition for working with images, video, and audio. Adobe Research is working to define the future evolution of these products in a way that makes the life of creatives easier, automates the tedious tasks, and gives them more and more time to operate in the idea space instead of pixel space. This is where the cutting-edge deep learning methods of the past decade can shine more than perhaps any other application. Gavin is the embodiment of combining tech and creativity. Outside of Adobe Research, he writes poetry and builds robots.

    Video: https://www.youtube.com/watch?v=q0mokx-iiws

    https://i.redd.it/9adlhbz7yk331.png

    Outline:

    0:00 - Introduction

    1:11 - Poetry & crossover to creative work

    6:35 - Turning one medium into another

    7:45 - Creative process in both the space pixels and ideas

    10:00 - Improving workflow in Adobe tools with AI

    14:31 - Taking ideas from prototype to product

    16:22 - Learning how to use Adobe tools

    21:13 - Applications of deep learning

    28:46 - Improving user experience from data

    34:30 - Augmented reality and virtual reality

    39:57 - Resistance to change

    43:40 - Poem - Today I Left My Phone at Home

    44:17 - Illusion of beauty in digital space

    49:17 - Secret to a thriving research lab

    55:27 - Future ideas in Adobe Research

    58:13 - Robotics and animation in the physical world

    1:08:01 - Poem - Cast My Ashes Wide and Far

    submitted by /u/UltraMarathonMan

    How to Develop a Face Recognition System Using FaceNet in Keras

    Face recognition is a computer vision task of identifying and verifying a person based on a photograph of their face.

    FaceNet is a face recognition system developed in 2015 by researchers at Google that achieved then state-of-the-art results on a range of face recognition benchmark datasets. The FaceNet system can be used broadly thanks to multiple third-party open source implementations of the model and the availability of pre-trained models.

    The FaceNet system can be used to extract high-quality features from faces, called face embeddings, that can then be used to train a face identification system.

    In this tutorial, you will discover how to develop a face recognition system using FaceNet and an SVM classifier to identify people from photographs.

    After completing this tutorial, you will know:

    • About the FaceNet face recognition system developed by Google, and open source implementations and pre-trained models.
    • How to prepare a face dataset, first extracting faces via a face detection system and then extracting face features via face embeddings.
    • How to fit, evaluate, and demonstrate an SVM model to predict identities from face embeddings.

    Let’s get started.

    How to Develop a Face Recognition System Using FaceNet in Keras and an SVM Classifier

    Photo by Peter Valverde, some rights reserved.

    Tutorial Overview

    This tutorial is divided into five parts; they are:

    1. Face Recognition
    2. FaceNet Model
    3. How to Load a FaceNet Model in Keras
    4. How to Detect Faces for Face Recognition
    5. How to Develop a Face Classification System

    Face Recognition

    Face recognition is the general task of identifying and verifying people from photographs of their face.

    The 2011 book on face recognition titled “Handbook of Face Recognition” describes two main modes for face recognition, as:

    • Face Verification. A one-to-one mapping of a given face against a known identity (e.g. is this the person?).
    • Face Identification. A one-to-many mapping for a given face against a database of known faces (e.g. who is this person?).

    A face recognition system is expected to identify faces present in images and videos automatically. It can operate in either or both of two modes: (1) face verification (or authentication), and (2) face identification (or recognition).

    — Page 1, Handbook of Face Recognition. 2011.

    We will focus on the face identification task in this tutorial.
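    To make the distinction concrete, here is a small sketch in terms of the face embeddings introduced in the next section (the threshold and helper names are illustrative only):

    from numpy import argmin
    from numpy.linalg import norm

    # face verification: one-to-one, is this face the claimed identity?
    def verify(candidate_embedding, known_embedding, threshold=0.5):
    	return norm(candidate_embedding - known_embedding) <= threshold

    # face identification: one-to-many, which known identity is this face?
    def identify(candidate_embedding, gallery_embeddings, gallery_names):
    	distances = [norm(candidate_embedding - e) for e in gallery_embeddings]
    	return gallery_names[argmin(distances)]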

    FaceNet Model

    FaceNet is a face recognition system that was described by Florian Schroff, et al. at Google in their 2015 paper titled “FaceNet: A Unified Embedding for Face Recognition and Clustering.”

    It is a system that, given a picture of a face, will extract high-quality features from the face and predict a 128-element vector representation of these features, called a face embedding.

    FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity.

    — FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015.

    The model is a deep convolutional neural network trained via a triplet loss function that encourages vectors for the same identity to become more similar (smaller distance), whereas vectors for different identities are expected to become less similar (larger distance). The focus on training a model to create embeddings directly (rather than extracting them from an intermediate layer of a model) was an important innovation in this work.

    Our method uses a deep convolutional network trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches.

    — FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015.
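    For intuition, the triplet loss described above compares an anchor face to a positive example (same identity) and a negative example (different identity), and only incurs a penalty when the negative is not at least a margin further away than the positive. A minimal NumPy sketch for a single triple (the margin value is illustrative):

    from numpy import maximum

    # triplet loss for one (anchor, positive, negative) triple of embedding vectors
    def triplet_loss(anchor, positive, negative, margin=0.2):
    	pos_dist = ((anchor - positive) ** 2).sum()   # squared distance to same identity
    	neg_dist = ((anchor - negative) ** 2).sum()   # squared distance to different identity
    	return maximum(0.0, pos_dist - neg_dist + margin)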

    These face embeddings were then used as the basis for training classifier systems on standard face recognition benchmark datasets, achieving then-state-of-the-art results.

    Our system cuts the error rate in comparison to the best published result by 30% …

    — FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015.

    The paper also explores other uses of the embeddings, such as clustering to group like-faces based on their extracted features.

    It is a robust and effective face recognition system, and the general nature of the extracted face embeddings lends the approach to a range of applications.

    How to Load a FaceNet Model in Keras

    There are a number of projects that provide tools to train FaceNet-based models and make use of pre-trained models.

    Perhaps the most prominent is OpenFace, which provides FaceNet models built and trained using the PyTorch deep learning framework. There is a port of OpenFace to Keras, called Keras OpenFace, but at the time of writing the models appear to require Python 2, which is quite limiting.

    Another prominent project is FaceNet by David Sandberg, which provides FaceNet models built and trained using TensorFlow. The project looks mature, although at the time of writing it does not provide a library-based installation or a clean API. Usefully, David's project provides a number of high-performing pre-trained FaceNet models, and there are a number of projects that port or convert these models for use in Keras.

    A notable example is Keras FaceNet by Hiroki Taniai. His project provides a script for converting the Inception ResNet v1 model from TensorFlow to Keras. He also provides a pre-trained Keras model ready for use.

    In this tutorial, we will use the pre-trained Keras FaceNet model provided by Hiroki Taniai. It was trained on the MS-Celeb-1M dataset and expects input images to be in color, to have their pixel values whitened (standardized across all three channels), and to have a square shape of 160×160 pixels.

    The model can be downloaded from here:

    • Keras FaceNet Pre-Trained Model (88 megabytes)

    Download the model file and place it in your current working directory with the filename ‘facenet_keras.h5‘.

    We can load the model directly in Keras using the load_model() function; for example:

    # example of loading the keras facenet model
    from keras.models import load_model
    # load the model
    model = load_model('facenet_keras.h5')
    # summarize input and output shape
    print(model.inputs)
    print(model.outputs)

    Running the example loads the model and prints the shape of the input and output tensors.

    We can see that the model indeed expects square color images as input with the shape 160×160, and will output a face embedding as a 128 element vector.

    # [<tf.Tensor 'input_1:0' shape=(?, 160, 160, 3) dtype=float32>]
    # [<tf.Tensor 'Bottleneck_BatchNorm/cond/Merge:0' shape=(?, 128) dtype=float32>]

    Now that we have a FaceNet model, we can explore using it.

    How to Detect Faces for Face Recognition

    Before we can perform face recognition, we need to detect faces.

    Face detection is the process of automatically locating faces in a photograph and localizing them by drawing a bounding box around their extent.

    In this tutorial, we will also use the Multi-Task Cascaded Convolutional Neural Network, or MTCNN, for face detection, e.g. finding and extracting faces from photos. This is a state-of-the-art deep learning model for face detection, described in the 2016 paper titled “Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks.”

    We will use the implementation provided by IvΓ‘n de Paz Centeno in the ipazc/mtcnn project. This can also be installed via pip as follows:

    sudo pip install mtcnn

    We can confirm that the library was installed correctly by importing the library and printing the version; for example:

    # confirm mtcnn was installed correctly
    import mtcnn
    # print version
    print(mtcnn.__version__)

    Running the example prints the current version of the library.

    0.0.8

    We can use the mtcnn library to create a face detector and extract faces for use with the FaceNet face recognition model in the subsequent sections.

    The first step is to load an image as a NumPy array, which we can achieve using the PIL library and the open() function. We will also convert the image to RGB, just in case the image has an alpha channel or is black and white.

    # load image from file
    image = Image.open(filename)
    # convert to RGB, if needed
    image = image.convert('RGB')
    # convert to array
    pixels = asarray(image)

    Next, we can create an MTCNN face detector class and use it to detect all faces in the loaded photograph.

    # create the detector, using default weights
    detector = MTCNN()
    # detect faces in the image
    results = detector.detect_faces(pixels)

    The result is a list of bounding boxes, where each bounding box defines a corner of the box as well as its width and height.

    If we assume there is only one face in the photo for our experiments, we can determine the pixel coordinates of the bounding box as follows. Sometimes the library will return a negative pixel index, and I think this is a bug. We can fix this by taking the absolute value of the coordinates.

    # extract the bounding box from the first face
    x1, y1, width, height = results[0]['box']
    # bug fix
    x1, y1 = abs(x1), abs(y1)
    x2, y2 = x1 + width, y1 + height

    We can use these coordinates to extract the face.

    # extract the face
    face = pixels[y1:y2, x1:x2]

    We can then use the PIL library to resize this small image of the face to the required size; specifically, the model expects square input faces with the shape 160×160.

    # resize pixels to the model size
    image = Image.fromarray(face)
    image = image.resize((160, 160))
    face_array = asarray(image)

    Tying all of this together, the extract_face() function below will load a photograph from the given filename and return the extracted face. It assumes that the photo contains one face and will return the first face detected.

    # function for face detection with mtcnn
    from PIL import Image
    from numpy import asarray
    from mtcnn.mtcnn import MTCNN
    
    # extract a single face from a given photograph
    def extract_face(filename, required_size=(160, 160)):
    	# load image from file
    	image = Image.open(filename)
    	# convert to RGB, if needed
    	image = image.convert('RGB')
    	# convert to array
    	pixels = asarray(image)
    	# create the detector, using default weights
    	detector = MTCNN()
    	# detect faces in the image
    	results = detector.detect_faces(pixels)
    	# extract the bounding box from the first face
    	x1, y1, width, height = results[0]['box']
    	# bug fix
    	x1, y1 = abs(x1), abs(y1)
    	x2, y2 = x1 + width, y1 + height
    	# extract the face
    	face = pixels[y1:y2, x1:x2]
    	# resize pixels to the model size
    	image = Image.fromarray(face)
    	image = image.resize(required_size)
    	face_array = asarray(image)
    	return face_array
    
    # load the photo and extract the face
    pixels = extract_face('...')

    We can use this function in the next section to extract faces as needed, which can then be provided as input to the FaceNet model.

    How to Develop a Face Classification System

    In this section, we will develop a face classification system to predict the identity of a given face.

    The model will be trained and tested using the ‘5 Celebrity Faces Dataset‘ that contains many photographs of five different celebrities.

    We will use an MTCNN model for face detection, the FaceNet model to create a face embedding for each detected face, and then a Linear Support Vector Machine (SVM) classifier to predict the identity of a given face.

    5 Celebrity Faces Dataset

    The 5 Celebrity Faces Dataset is a small dataset that contains photographs of celebrities.

    It includes photos of: Ben Affleck, Elton John, Jerry Seinfeld, Madonna, and Mindy Kaling.

    The dataset was prepared and made available by Dan Becker and provided for free download on Kaggle. Note, a Kaggle account is required to download the dataset.

    • 5 Celebrity Faces Dataset, Kaggle.

    Download the dataset (this may require a Kaggle login), data.zip (2.5 megabytes), and unzip it in your local directory with the folder name ‘5-celebrity-faces-dataset‘.

    You should now have a directory with the following structure (note, there are spelling mistakes in some directory names, and they were left as-is in this example):

    5-celebrity-faces-dataset
    β”œβ”€β”€ train
    β”‚   β”œβ”€β”€ ben_afflek
    β”‚   β”œβ”€β”€ elton_john
    β”‚   β”œβ”€β”€ jerry_seinfeld
    β”‚   β”œβ”€β”€ madonna
    β”‚   └── mindy_kaling
    └── val
        β”œβ”€β”€ ben_afflek
        β”œβ”€β”€ elton_john
        β”œβ”€β”€ jerry_seinfeld
        β”œβ”€β”€ madonna
        └── mindy_kaling

    We can see that there is a training dataset and a validation or test dataset.

    Looking at some of the photos in the directories, we can see that the photos provide faces with a range of orientations, lighting, and in various sizes. Importantly, each photo contains one face of the person.

    We will use this dataset as the basis for our classifier, trained on the ‘train‘ dataset only and classify faces in the ‘val‘ dataset. You can use this same structure to develop a classifier with your own photographs.

    Detect Faces

    The first step is to detect the face in each photograph and reduce the dataset to a series of faces only.

    Let’s test out our face detector function defined in the previous section, specifically extract_face().

    Looking in the ‘5-celebrity-faces-dataset/train/ben_afflek/‘ directory, we can see that there are 14 photographs of Ben Affleck in the training dataset. We can detect the face in each photograph, and create a plot with 14 faces, with two rows of seven images each.

    The complete example is listed below.

    # demonstrate face detection on 5 Celebrity Faces Dataset
    from os import listdir
    from PIL import Image
    from numpy import asarray
    from matplotlib import pyplot
    from mtcnn.mtcnn import MTCNN
    
    # extract a single face from a given photograph
    def extract_face(filename, required_size=(160, 160)):
    	# load image from file
    	image = Image.open(filename)
    	# convert to RGB, if needed
    	image = image.convert('RGB')
    	# convert to array
    	pixels = asarray(image)
    	# create the detector, using default weights
    	detector = MTCNN()
    	# detect faces in the image
    	results = detector.detect_faces(pixels)
    	# extract the bounding box from the first face
    	x1, y1, width, height = results[0]['box']
    	# bug fix
    	x1, y1 = abs(x1), abs(y1)
    	x2, y2 = x1 + width, y1 + height
    	# extract the face
    	face = pixels[y1:y2, x1:x2]
    	# resize pixels to the model size
    	image = Image.fromarray(face)
    	image = image.resize(required_size)
    	face_array = asarray(image)
    	return face_array
    
    # specify folder to plot
    folder = '5-celebrity-faces-dataset/train/ben_afflek/'
    i = 1
    # enumerate files
    for filename in listdir(folder):
    	# path
    	path = folder + filename
    	# get face
    	face = extract_face(path)
    	print(i, face.shape)
    	# plot
    	pyplot.subplot(2, 7, i)
    	pyplot.axis('off')
    	pyplot.imshow(face)
    	i += 1
    pyplot.show()

    Running the example takes a moment and reports the progress of each loaded photograph along the way and the shape of the NumPy array containing the face pixel data.

    1 (160, 160, 3)
    2 (160, 160, 3)
    3 (160, 160, 3)
    4 (160, 160, 3)
    5 (160, 160, 3)
    6 (160, 160, 3)
    7 (160, 160, 3)
    8 (160, 160, 3)
    9 (160, 160, 3)
    10 (160, 160, 3)
    11 (160, 160, 3)
    12 (160, 160, 3)
    13 (160, 160, 3)
    14 (160, 160, 3)

    A figure is created containing the faces detected in the Ben Affleck directory.

    We can see that each face was correctly detected and that we have a range of lighting, skin tones, and orientations in the detected faces.

    Plot of 14 Faces of Ben Affleck Detected From the Training Dataset of the 5 Celebrity Faces Dataset

    So far, so good.

    Next, we can extend this example to step over each subdirectory for a given dataset (e.g. ‘train‘ or ‘val‘), extract the faces, and prepare a dataset with the name as the output label for each detected face.

    The load_faces() function below will load all of the faces into a list for a given directory, e.g. ‘5-celebrity-faces-dataset/train/ben_afflek/‘.

    # load images and extract faces for all images in a directory
    def load_faces(directory):
    	faces = list()
    	# enumerate files
    	for filename in listdir(directory):
    		# path
    		path = directory + filename
    		# get face
    		face = extract_face(path)
    		# store
    		faces.append(face)
    	return faces

    We can call the load_faces() function for each subdirectory in the ‘train‘ or ‘val‘ folders. Each face has one label, the name of the celebrity, which we can take from the directory name.

    The load_dataset() function below takes a directory name such as ‘5-celebrity-faces-dataset/train/‘ and detects faces for each subdirectory (celebrity), assigning labels to each detected face.

    It returns the X and y elements of the dataset as NumPy arrays.

    # load a dataset that contains one subdir for each class that in turn contains images
    def load_dataset(directory):
    	X, y = list(), list()
    	# enumerate folders, one per class
    	for subdir in listdir(directory):
    		# path
    		path = directory + subdir + '/'
    		# skip any files that might be in the dir
    		if not isdir(path):
    			continue
    		# load all faces in the subdirectory
    		faces = load_faces(path)
    		# create labels
    		labels = [subdir for _ in range(len(faces))]
    		# summarize progress
    		print('>loaded %d examples for class: %s' % (len(faces), subdir))
    		# store
    		X.extend(faces)
    		y.extend(labels)
    	return asarray(X), asarray(y)

    We can then call this function for the ‘train’ and ‘val’ folders to load all of the data, then save the results in a single compressed NumPy array file via the savez_compressed() function.

    # load train dataset
    trainX, trainy = load_dataset('5-celebrity-faces-dataset/train/')
    print(trainX.shape, trainy.shape)
    # load test dataset
    testX, testy = load_dataset('5-celebrity-faces-dataset/val/')
    print(testX.shape, testy.shape)
    # save arrays to one file in compressed format
    savez_compressed('5-celebrity-faces-dataset.npz', trainX, trainy, testX, testy)

    Tying all of this together, the complete example of detecting all of the faces in the 5 Celebrity Faces Dataset is listed below.

    # face detection for the 5 Celebrity Faces Dataset
    from os import listdir
    from os.path import isdir
    from PIL import Image
    from matplotlib import pyplot
    from numpy import savez_compressed
    from numpy import asarray
    from mtcnn.mtcnn import MTCNN
    
    # extract a single face from a given photograph
    def extract_face(filename, required_size=(160, 160)):
    	# load image from file
    	image = Image.open(filename)
    	# convert to RGB, if needed
    	image = image.convert('RGB')
    	# convert to array
    	pixels = asarray(image)
    	# create the detector, using default weights
    	detector = MTCNN()
    	# detect faces in the image
    	results = detector.detect_faces(pixels)
    	# extract the bounding box from the first face
    	x1, y1, width, height = results[0]['box']
    	# bug fix
    	x1, y1 = abs(x1), abs(y1)
    	x2, y2 = x1 + width, y1 + height
    	# extract the face
    	face = pixels[y1:y2, x1:x2]
    	# resize pixels to the model size
    	image = Image.fromarray(face)
    	image = image.resize(required_size)
    	face_array = asarray(image)
    	return face_array
    
    # load images and extract faces for all images in a directory
    def load_faces(directory):
    	faces = list()
    	# enumerate files
    	for filename in listdir(directory):
    		# path
    		path = directory + filename
    		# get face
    		face = extract_face(path)
    		# store
    		faces.append(face)
    	return faces
    
    # load a dataset that contains one subdir for each class that in turn contains images
    def load_dataset(directory):
    	X, y = list(), list()
    	# enumerate folders, one per class
    	for subdir in listdir(directory):
    		# path
    		path = directory + subdir + '/'
    		# skip any files that might be in the dir
    		if not isdir(path):
    			continue
    		# load all faces in the subdirectory
    		faces = load_faces(path)
    		# create labels
    		labels = [subdir for _ in range(len(faces))]
    		# summarize progress
    		print('>loaded %d examples for class: %s' % (len(faces), subdir))
    		# store
    		X.extend(faces)
    		y.extend(labels)
    	return asarray(X), asarray(y)
    
    # load train dataset
    trainX, trainy = load_dataset('5-celebrity-faces-dataset/train/')
    print(trainX.shape, trainy.shape)
    # load test dataset
    testX, testy = load_dataset('5-celebrity-faces-dataset/val/')
    print(testX.shape, testy.shape)
    # save arrays to one file in compressed format
    savez_compressed('5-celebrity-faces-dataset.npz', trainX, trainy, testX, testy)

    Running the example may take a moment.

    First, all of the photos in the ‘train‘ dataset are loaded, then faces are extracted, resulting in 93 samples with square face input and a class label string as output. Then the ‘val‘ dataset is loaded, providing 25 samples that can be used as a test dataset.

    Both datasets are then saved to a compressed NumPy array file called ‘5-celebrity-faces-dataset.npz‘ that is about three megabytes and is stored in the current working directory.

    >loaded 14 examples for class: ben_afflek
    >loaded 19 examples for class: madonna
    >loaded 17 examples for class: elton_john
    >loaded 22 examples for class: mindy_kaling
    >loaded 21 examples for class: jerry_seinfeld
    (93, 160, 160, 3) (93,)
    >loaded 5 examples for class: ben_afflek
    >loaded 5 examples for class: madonna
    >loaded 5 examples for class: elton_john
    >loaded 5 examples for class: mindy_kaling
    >loaded 5 examples for class: jerry_seinfeld
    (25, 160, 160, 3) (25,)

    This dataset is ready to be provided to the FaceNet model.

    Create Face Embeddings

    The next step is to create a face embedding.

    A face embedding is a vector that represents the features extracted from the face. This can then be compared with the vectors generated for other faces. For example, another vector that is close (by some measure) may be the same person, whereas another vector that is far (by some measure) may be a different person.
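    For example, two common choices for "close" and "far" are the Euclidean distance and the cosine similarity between embedding vectors; a small sketch (any threshold for deciding "same person" would still need to be tuned on data):

    from numpy import dot
    from numpy.linalg import norm

    # Euclidean distance between two embeddings: smaller means more similar
    def euclidean_distance(a, b):
    	return norm(a - b)

    # cosine similarity between two embeddings: closer to 1.0 means more similar
    def cosine_similarity(a, b):
    	return dot(a, b) / (norm(a) * norm(b))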

    The classifier model that we want to develop will take a face embedding as input and predict the identity of the face. The FaceNet model will generate this embedding for a given image of a face.

    The FaceNet model can be used as part of the classifier itself, or we can use the FaceNet model to pre-process a face to create a face embedding that can be stored and used as input to our classifier model. This latter approach is preferred as the FaceNet model is both large and slow to create a face embedding.

    We can, therefore, pre-compute the face embeddings for all faces in the train and test (formally ‘val‘) sets in our 5 Celebrity Faces Dataset.

    First, we can load our detected faces dataset using the load() NumPy function.

    # load the face dataset
    data = load('5-celebrity-faces-dataset.npz')
    trainX, trainy, testX, testy = data['arr_0'], data['arr_1'], data['arr_2'], data['arr_3']
    print('Loaded: ', trainX.shape, trainy.shape, testX.shape, testy.shape)

    Next, we can load our FaceNet model ready for converting faces into face embeddings.

    # load the facenet model
    model = load_model('facenet_keras.h5')
    print('Loaded Model')

    We can then enumerate each face in the train and test datasets to predict an embedding.

    To predict an embedding, first the pixel values of the image need to be suitably prepared to meet the expectations of the FaceNet model. This specific implementation of the FaceNet model expects that the pixel values are standardized.

    # scale pixel values
    face_pixels = face_pixels.astype('float32')
    # standardize pixel values across channels (global)
    mean, std = face_pixels.mean(), face_pixels.std()
    face_pixels = (face_pixels - mean) / std

    In order to make a prediction for one example in Keras, we must expand the dimensions so that the face array is one sample.

    # transform face into one sample
    samples = expand_dims(face_pixels, axis=0)

    We can then use the model to make a prediction and extract the embedding vector.

    # make prediction to get embedding
    yhat = model.predict(samples)
    # get embedding
    embedding = yhat[0]

    The get_embedding() function defined below implements these behaviors and will return a face embedding given a single image of a face and the loaded FaceNet model.

    # get the face embedding for one face
    def get_embedding(model, face_pixels):
    	# scale pixel values
    	face_pixels = face_pixels.astype('float32')
    	# standardize pixel values across channels (global)
    	mean, std = face_pixels.mean(), face_pixels.std()
    	face_pixels = (face_pixels - mean) / std
    	# transform face into one sample
    	samples = expand_dims(face_pixels, axis=0)
    	# make prediction to get embedding
    	yhat = model.predict(samples)
    	return yhat[0]

    Tying all of this together, the complete example of converting each face into a face embedding in the train and test datasets is listed below.

    # calculate a face embedding for each face in the dataset using facenet
    from numpy import load
    from numpy import expand_dims
    from numpy import asarray
    from numpy import savez_compressed
    from keras.models import load_model
    
    # get the face embedding for one face
    def get_embedding(model, face_pixels):
    	# scale pixel values
    	face_pixels = face_pixels.astype('float32')
    	# standardize pixel values across channels (global)
    	mean, std = face_pixels.mean(), face_pixels.std()
    	face_pixels = (face_pixels - mean) / std
    	# transform face into one sample
    	samples = expand_dims(face_pixels, axis=0)
    	# make prediction to get embedding
    	yhat = model.predict(samples)
    	return yhat[0]
    
    # load the face dataset
    data = load('5-celebrity-faces-dataset.npz')
    trainX, trainy, testX, testy = data['arr_0'], data['arr_1'], data['arr_2'], data['arr_3']
    print('Loaded: ', trainX.shape, trainy.shape, testX.shape, testy.shape)
    # load the facenet model
    model = load_model('facenet_keras.h5')
    print('Loaded Model')
    # convert each face in the train set to an embedding
    newTrainX = list()
    for face_pixels in trainX:
    	embedding = get_embedding(model, face_pixels)
    	newTrainX.append(embedding)
    newTrainX = asarray(newTrainX)
    print(newTrainX.shape)
    # convert each face in the test set to an embedding
    newTestX = list()
    for face_pixels in testX:
    	embedding = get_embedding(model, face_pixels)
    	newTestX.append(embedding)
    newTestX = asarray(newTestX)
    print(newTestX.shape)
    # save arrays to one file in compressed format
    savez_compressed('5-celebrity-faces-embeddings.npz', newTrainX, trainy, newTestX, testy)

    Running the example reports progress along the way.

    We can see that the face dataset was loaded correctly and so was the model. The train dataset was then transformed into 93 face embeddings, each comprised of a 128 element vector. The 25 examples in the test dataset were also suitably converted to face embeddings.

    The resulting datasets were then saved to a compressed NumPy array that is about 50 kilobytes with the name ‘5-celebrity-faces-embeddings.npz‘ in the current working directory.

    Loaded:  (93, 160, 160, 3) (93,) (25, 160, 160, 3) (25,)
    Loaded Model
    (93, 128)
    (25, 128)

    We are now ready to develop our face classifier system.

    Perform Face Classification

    In this section, we will develop a model to classify face embeddings as one of the known celebrities in the 5 Celebrity Faces Dataset.

    First, we must load the face embeddings dataset.

    # load dataset
    data = load('5-celebrity-faces-embeddings.npz')
    trainX, trainy, testX, testy = data['arr_0'], data['arr_1'], data['arr_2'], data['arr_3']
    print('Dataset: train=%d, test=%d' % (trainX.shape[0], testX.shape[0]))

    Next, the data requires some minor preparation prior to modeling.

    First, it is good practice to normalize the face embedding vectors, because the vectors are often compared to each other using a distance metric.

    In this context, vector normalization means scaling the values until the length or magnitude of the vectors is 1 or unit length. This can be achieved using the Normalizer class in scikit-learn. It might even be more convenient to perform this step when the face embeddings are created in the previous step.

    # normalize input vectors
    in_encoder = Normalizer(norm='l2')
    trainX = in_encoder.transform(trainX)
    testX = in_encoder.transform(testX)
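    Alternatively, as noted above, the normalization can be folded into the embedding step itself; a sketch of that variant, using a hypothetical wrapper around the get_embedding() function from the previous section:

    from numpy.linalg import norm

    # alternative: L2-normalize each embedding as it is created
    def get_normalized_embedding(model, face_pixels):
    	embedding = get_embedding(model, face_pixels)
    	return embedding / norm(embedding)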

    Next, the string target variables for each celebrity name need to be converted to integers.

    This can be achieved via the LabelEncoder class in scikit-learn.

    # label encode targets
    out_encoder = LabelEncoder()
    out_encoder.fit(trainy)
    trainy = out_encoder.transform(trainy)
    testy = out_encoder.transform(testy)

    Next, we can fit a model.

    It is common to use a Linear Support Vector Machine (SVM) when working with normalized face embedding inputs. This is because the method is very effective at separating the face embedding vectors. We can fit a linear SVM to the training data using the SVC class in scikit-learn and setting the ‘kernel‘ attribute to ‘linear‘. We may also want probabilities later when making predictions, which can be configured by setting ‘probability‘ to ‘True‘.

    # fit model
    model = SVC(kernel='linear', probability=True)
    model.fit(trainX, trainy)

    Next, we can evaluate the model.

    This can be achieved by using the fit model to make a prediction for each example in the train and test datasets and then calculating the classification accuracy.

    # predict
    yhat_train = model.predict(trainX)
    yhat_test = model.predict(testX)
    # score
    score_train = accuracy_score(trainy, yhat_train)
    score_test = accuracy_score(testy, yhat_test)
    # summarize
    print('Accuracy: train=%.3f, test=%.3f' % (score_train*100, score_test*100))

    Tying all of this together, the complete example of fitting a Linear SVM on the face embeddings for the 5 Celebrity Faces Dataset is listed below.

    # develop a classifier for the 5 Celebrity Faces Dataset
    from numpy import load
    from sklearn.metrics import accuracy_score
    from sklearn.preprocessing import LabelEncoder
    from sklearn.preprocessing import Normalizer
    from sklearn.svm import SVC
    # load dataset
    data = load('5-celebrity-faces-embeddings.npz')
    trainX, trainy, testX, testy = data['arr_0'], data['arr_1'], data['arr_2'], data['arr_3']
    print('Dataset: train=%d, test=%d' % (trainX.shape[0], testX.shape[0]))
    # normalize input vectors
    in_encoder = Normalizer(norm='l2')
    trainX = in_encoder.transform(trainX)
    testX = in_encoder.transform(testX)
    # label encode targets
    out_encoder = LabelEncoder()
    out_encoder.fit(trainy)
    trainy = out_encoder.transform(trainy)
    testy = out_encoder.transform(testy)
    # fit model
    model = SVC(kernel='linear', probability=True)
    model.fit(trainX, trainy)
    # predict
    yhat_train = model.predict(trainX)
    yhat_test = model.predict(testX)
    # score
    score_train = accuracy_score(trainy, yhat_train)
    score_test = accuracy_score(testy, yhat_test)
    # summarize
    print('Accuracy: train=%.3f, test=%.3f' % (score_train*100, score_test*100))

    Running the example first confirms that the number of samples in the train and test datasets is as we expect.

    Next, the model is evaluated on the train and test dataset, showing perfect classification accuracy. This is not surprising given the size of the dataset and the power of the face detection and face recognition models used.

    Dataset: train=93, test=25
    Accuracy: train=100.000, test=100.000

    We can make it more interesting by plotting the original face and the prediction.

    First, we need to load the face dataset, specifically the faces in the test dataset. We could also load the original photos to make it even more interesting.

    # load faces
    data = load('5-celebrity-faces-dataset.npz')
    testX_faces = data['arr_2']

    The rest of the example is the same up until we fit the model.

    First, we need to select a random example from the test set, then get the embedding, face pixels, expected class prediction, and the corresponding name for the class.

    # test model on a random example from the test dataset
    selection = choice([i for i in range(testX.shape[0])])
    random_face_pixels = testX_faces[selection]
    random_face_emb = testX[selection]
    random_face_class = testy[selection]
    random_face_name = out_encoder.inverse_transform([random_face_class])

    Next, we can use the face embedding as an input to make a single prediction with the fit model.

    We can predict both the class integer and the probability of the prediction.

    # prediction for the face
    samples = expand_dims(random_face_emb, axis=0)
    yhat_class = model.predict(samples)
    yhat_prob = model.predict_proba(samples)

    We can then get the name for the predicted class integer, and the probability for this prediction.

    # get name
    class_index = yhat_class[0]
    class_probability = yhat_prob[0,class_index] * 100
    predict_names = out_encoder.inverse_transform(yhat_class)

    We can then print this information.

    print('Predicted: %s (%.3f)' % (predict_names[0], class_probability))
    print('Expected: %s' % random_face_name[0])

    We can also plot the face pixels along with the predicted name and probability.

    # plot for fun
    pyplot.imshow(random_face_pixels)
    title = '%s (%.3f)' % (predict_names[0], class_probability)
    pyplot.title(title)
    pyplot.show()

    Tying all of this together, the complete example for predicting the identity for a given unseen photo in the test dataset is listed below.

    # develop a classifier for the 5 Celebrity Faces Dataset
    from random import choice
    from numpy import load
    from numpy import expand_dims
    from sklearn.preprocessing import LabelEncoder
    from sklearn.preprocessing import Normalizer
    from sklearn.svm import SVC
    from matplotlib import pyplot
    # load faces
    data = load('5-celebrity-faces-dataset.npz')
    testX_faces = data['arr_2']
    # load face embeddings
    data = load('5-celebrity-faces-embeddings.npz')
    trainX, trainy, testX, testy = data['arr_0'], data['arr_1'], data['arr_2'], data['arr_3']
    # normalize input vectors
    in_encoder = Normalizer(norm='l2')
    trainX = in_encoder.transform(trainX)
    testX = in_encoder.transform(testX)
    # label encode targets
    out_encoder = LabelEncoder()
    out_encoder.fit(trainy)
    trainy = out_encoder.transform(trainy)
    testy = out_encoder.transform(testy)
    # fit model
    model = SVC(kernel='linear', probability=True)
    model.fit(trainX, trainy)
    # test model on a random example from the test dataset
    selection = choice([i for i in range(testX.shape[0])])
    random_face_pixels = testX_faces[selection]
    random_face_emb = testX[selection]
    random_face_class = testy[selection]
    random_face_name = out_encoder.inverse_transform([random_face_class])
    # prediction for the face
    samples = expand_dims(random_face_emb, axis=0)
    yhat_class = model.predict(samples)
    yhat_prob = model.predict_proba(samples)
    # get name
    class_index = yhat_class[0]
    class_probability = yhat_prob[0,class_index] * 100
    predict_names = out_encoder.inverse_transform(yhat_class)
    print('Predicted: %s (%.3f)' % (predict_names[0], class_probability))
    print('Expected: %s' % random_face_name[0])
    # plot for fun
    pyplot.imshow(random_face_pixels)
    title = '%s (%.3f)' % (predict_names[0], class_probability)
    pyplot.title(title)
    pyplot.show()

    A different random example from the test dataset will be selected each time the code is run.

    Try running it a few times.

    In this case, a photo of Jerry Seinfeld is selected and correctly predicted.

    Predicted: jerry_seinfeld (88.476)
    Expected: jerry_seinfeld

    A plot of the chosen face is also created, showing the predicted name and probability in the image title.

    Detected Face of Jerry Seinfeld, Correctly Identified by the SVM Classifier

    Further Reading

    This section provides more resources on the topic if you are looking to go deeper.

    Papers

    • FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015.

    Books

    • Handbook of Face Recognition, 2011.

    Projects

    • OpenFace PyTorch Project.
    • OpenFace Keras Project, GitHub.
    • Keras FaceNet Project, GitHub.
    • MS-Celeb-1M Dataset.

    APIs

    • sklearn.preprocessing.Normalizer API
    • sklearn.preprocessing.LabelEncoder API
    • sklearn.svm.SVC API

    Summary

    In this tutorial, you discovered how to develop a face recognition system using FaceNet and an SVM classifier to identify people from photographs.

    Specifically, you learned:

    • About the FaceNet face recognition system developed by Google, and open source implementations and pre-trained models.
    • How to prepare a face dataset, first extracting faces via a face detection system and then extracting face features via face embeddings.
    • How to fit, evaluate, and demonstrate an SVM model to predict identities from face embeddings.

    Do you have any questions?
    Ask your questions in the comments below and I will do my best to answer.

    The post How to Develop a Face Recognition System Using FaceNet in Keras appeared first on Machine Learning Mastery.


    [P] Create deep learning models with flowpoints

    https://i.redd.it/my2ek0j7pr231.png

    Flowpoints makes it possible to create deep learning models in a flowchart kind of manner.

    Simply create some nodes, connect them however you like, and copy the automatically written code! Models can be created with either TensorFlow or PyTorch.

    With link sharing it's easy to share models with others, and with a graphical representation of your model it becomes much easier to explain your machine learning model to pretty much anyone :)

    Check out the readme or this medium post for more info.

    https://i.redd.it/rub1btzrqr231.png

    To begin with, I created this tool for my own use. Soon after, I started using it a whole lot for keeping track of model architectures and explaining to project managers and friends how a model worked, and it enabled me to create models way quicker than I had before.

    Now I hope it can be useful for others as well:)

    I've open-sourced this project, and would love some help maintaining the code or adding functionality!

    submitted by /u/mariusbrataas

    [D] Handling Lag Features for different time frames.

    Hi,

    I'm currently working on a project that involves a time series problem which I transformed into a classification problem for more detailed predictions: rather than forecasting an aggregated figure at the end of the day, I classify single instances which, when aggregated, yield the figure that the time series model would have produced.

    To summarize, the problem setting is actually a scheduling problem where an employee is assigned to a shift and the prediction is whether the employee will be absent for the respective scheduled shift.

    Anyway, I'm trying to train two different models to be used at two different points in time: a 24h model that predicts instances scheduled for the next day, and a model that predicts the very same instances one week beforehand. Below, I tried to illustrate the problem on a timeline; hope this helps.

    https://i.redd.it/pb1f5pqppc231.png

    I started with the former model, which seems a bit easier, as all information that can be available is available at the prediction point for this model. I did some feature engineering, mostly lag features computed over the last recorded instances. Since the lag features seem to contribute quite well to the model's performance, I wanted to re-use them in the one-week model. However, I face the problem that I don't know how to calculate them accurately (if that even makes sense in this case).
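    (For illustration only: a minimal sketch of the kind of per-employee lag features I mean, assuming a pandas DataFrame with hypothetical columns employee_id, shift_date and absent; the 24h model can shift by one instance, whereas the one-week model would need the shift or window moved further back so that only information available a week before the shift is used.)

    import pandas as pd

    # hypothetical schedule data: one row per scheduled shift
    df = pd.DataFrame({
    	'employee_id': [1, 1, 1, 2, 2],
    	'shift_date': pd.to_datetime(['2019-06-01', '2019-06-02', '2019-06-03',
    	                              '2019-06-01', '2019-06-02']),
    	'absent': [0, 1, 0, 0, 0],
    })
    df = df.sort_values(['employee_id', 'shift_date'])
    # lag features over previously recorded shifts (shift(1) = last known instance)
    df['absent_lag1'] = df.groupby('employee_id')['absent'].shift(1)
    df['absent_rate_last5'] = (df.groupby('employee_id')['absent']
    	.transform(lambda s: s.shift(1).rolling(5, min_periods=1).mean()))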

    As you can see in the second timeline, between the prediction point and the time of the instance I'd like to predict, there is a gap of a week in which there could potentially be more scheduled instances. I'm not sure how to deal with this. If I ignored the gap week completely and kept calculating the lag features the same way as in the 24h model, I feel this would not work out well (although I haven't tried it yet).

    Unfortunately, I couldn't find any literature on this problem or a Kaggle competition where it was also faced. Therefore, I don't have any ideas on how to handle it and would appreciate any input from you guys.

    Thanks very much!

    submitted by /u/babuunn

    [P] Simple Tensorflow implementation of GauGAN (SPADE, CVPR 2019 Oral)

    Style Manipulation of women

    Style Manipulation of men

    submitted by /u/taki0112

    [P] 1 million AI generated fake faces for download

    I generated 1 million faces with NVIDIA's StyleGAN and released them under the same CC BY-NC 4.0 license for free download on archive.org

    Direct link here

    Original tweet

    A few examples

    submitted by /u/shoeblade

    How to Train an Object Detection Model to Find Kangaroos in Photographs (R-CNN with Keras)

    Object detection is a challenging computer vision task that involves predicting both where the objects are in the image and what type of objects are present.

    The Mask Region-based Convolutional Neural Network, or Mask R-CNN, model is one of the state-of-the-art approaches for object recognition tasks. The Matterport Mask R-CNN project provides a library that allows you to develop and train Mask R-CNN Keras models for your own object detection tasks. Using the library can be tricky for beginners and requires the careful preparation of the dataset, although it allows fast training via transfer learning with top performing models trained on challenging object detection tasks, such as MS COCO.

    In this tutorial, you will discover how to develop a Mask R-CNN model for kangaroo object detection in photographs.

    After completing this tutorial, you will know:

    • How to prepare an object detection dataset ready for modeling with an R-CNN.
    • How to use transfer learning to train an object detection model on a new dataset.
    • How to evaluate a fit Mask R-CNN model on a test dataset and make predictions on new photos.

    Let’s get started.

    How to Train an Object Detection Model to Find Kangaroos in Photographs (R-CNN with Keras)

    Photo by Ronnie Robertson, some rights reserved.

    Tutorial Overview

    This tutorial is divided into five parts; they are:

    1. How to Install Mask R-CNN for Keras
    2. How to Prepare a Dataset for Object Detection
    3. How to Train a Mask R-CNN Model for Kangaroo Detection
    4. How to Evaluate a Mask R-CNN Model
    5. How to Detect Kangaroos in New Photos

    How to Install Mask R-CNN for Keras

    Object detection is a task in computer vision that involves identifying the presence, location, and type of one or more objects in a given image.

    It is a challenging problem that builds upon methods for object recognition (e.g. what objects are present), object localization (e.g. where they are and what their extent is), and object classification (e.g. what type each object is).

    The Region-Based Convolutional Neural Network, or R-CNN, is a family of convolutional neural network models designed for object detection, developed by Ross Girshick, et al. There are perhaps four main variations of the approach, resulting in the current pinnacle called Mask R-CNN. The Mask R-CNN introduced in the 2018 paper titled “Mask R-CNN” is the most recent variation of the family of models and supports both object detection and object segmentation. Object segmentation not only involves localizing objects in the image but also specifies a mask for the image, indicating exactly which pixels in the image belong to the object.

    Mask R-CNN is a sophisticated model to implement, especially as compared to a simple or even state-of-the-art deep convolutional neural network model. Instead of developing an implementation of the R-CNN or Mask R-CNN model from scratch, we can use a reliable third-party implementation built on top of the Keras deep learning framework.

    The best-of-breed third-party implementation of Mask R-CNN is the Mask R-CNN Project developed by Matterport. The project is open source, released under a permissive license (the MIT license), and the code has been widely used on a variety of projects and Kaggle competitions.

    The first step is to install the library.

    At the time of writing, there is no distributed version of the library, so we have to install it manually. The good news is that this is very easy.

    Installation involves cloning the GitHub repository and running the installation script on your workstation. If you are having trouble, see the installation instructions buried in the library’s readme file.

    Step 1. Clone the Mask R-CNN GitHub Repository

    This is as simple as running the following command from your command line:

    git clone https://github.com/matterport/Mask_RCNN.git

    This will create a new local directory with the name Mask_RCNN that looks as follows:

    Mask_RCNN
    β”œβ”€β”€ assets
    β”œβ”€β”€ build
    β”‚   β”œβ”€β”€ bdist.macosx-10.13-x86_64
    β”‚   └── lib
    β”‚       └── mrcnn
    β”œβ”€β”€ dist
    β”œβ”€β”€ images
    β”œβ”€β”€ mask_rcnn.egg-info
    β”œβ”€β”€ mrcnn
    └── samples
        β”œβ”€β”€ balloon
        β”œβ”€β”€ coco
        β”œβ”€β”€ nucleus
        └── shapes

    Step 2. Install the Mask R-CNN Library

    The library can be installed directly via pip.

    Change directory into the Mask_RCNN directory and run the installation script.

    From the command line, type the following:

    cd Mask_RCNN
    python setup.py install

    On Linux or MacOS, you may need to install the software with sudo permissions; for example, you may see an error such as:

    error: can't create or remove files in install directory

    In that case, install the software with sudo:

    sudo python setup.py install

    If you are using a Python virtual environment (virtualenv), such as on an EC2 Deep Learning AMI instance (recommended for this tutorial), you can install Mask_RCNN into your environment as follows:

    sudo ~/anaconda3/envs/tensorflow_p36/bin/python setup.py install

    The library will then install directly and you will see a lot of successful installation messages ending with the following:

    ...
    Finished processing dependencies for mask-rcnn==2.1

    This confirms that you installed the library successfully and that you have the latest version, which at the time of writing is version 2.1.

    Step 3: Confirm the Library Was Installed

    It is always a good idea to confirm that the library was installed correctly.

    You can confirm that the library was installed correctly by querying it via the pip command; for example:

    pip show mask-rcnn

    You should see output informing you of the version and installation location; for example:

    Name: mask-rcnn
    Version: 2.1
    Summary: Mask R-CNN for object detection and instance segmentation
    Home-page: https://github.com/matterport/Mask_RCNN
    Author: Matterport
    Author-email: waleed.abdulla@gmail.com
    License: MIT
    Location: ...
    Requires:
    Required-by:

    We are now ready to use the library.

    How to Prepare a Dataset for Object Detection

    Next, we need a dataset to model.

    In this tutorial, we will use the kangaroo dataset, made available by Huynh Ngoc Anh (experiencor). The dataset is comprised of 183 photographs that contain kangaroos, and XML annotation files that provide bounding boxes for the kangaroos in each photograph.

    Mask R-CNN is designed to learn to predict both bounding boxes for objects and masks for those detected objects, but the kangaroo dataset does not provide masks. As such, we will use the dataset to learn a kangaroo object detection task, ignoring the masks and the image segmentation capabilities of the model.

    There are a few steps required in order to prepare this dataset for modeling and we will work through each in turn in this section, including downloading the dataset, parsing the annotations file, developing a KangarooDataset object that can be used by the Mask_RCNN library, then testing the dataset object to confirm that we are loading images and annotations correctly.

    Install Dataset

    The first step is to download the dataset into your current working directory.

    This can be achieved by cloning the GitHub repository directly, as follows:

    git clone https://github.com/experiencor/kangaroo.git

    This will create a new directory called “kangaroo” with a subdirectory called ‘images/‘ that contains all of the JPEG photos of kangaroos and a subdirectory called ‘annots/‘ that contains all of the XML files that describe the locations of kangaroos in each photo.

    kangaroo
    β”œβ”€β”€ annots
    └── images

    Looking in each subdirectory, you can see that the photos and annotation files use a consistent naming convention, with filenames using a 5-digit zero-padded numbering system; for example:

    images/00001.jpg
    images/00002.jpg
    images/00003.jpg
    ...
    annots/00001.xml
    annots/00002.xml
    annots/00003.xml
    ...

    This makes matching photographs and annotation files together very easy.

    We can also see that the numbering system is not contiguous and that some photos are missing, e.g. there is no ‘00007‘ JPG or XML.

    This means that we should focus on loading the list of actual files in the directory rather than using a numbering system.
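    For example, one simple way to build the list of matching image and annotation files from what is actually on disk (rather than assuming contiguous numbers) might look like the following sketch:

    # pair image files with their annotation files based on the actual directory contents
    from os import listdir
    from os.path import join, exists, splitext

    images_dir = 'kangaroo/images/'
    annots_dir = 'kangaroo/annots/'
    pairs = list()
    for filename in sorted(listdir(images_dir)):
    	# derive the annotation filename from the image filename, e.g. 00001.jpg -> 00001.xml
    	image_id = splitext(filename)[0]
    	annot_path = join(annots_dir, image_id + '.xml')
    	if exists(annot_path):
    		pairs.append((join(images_dir, filename), annot_path))
    print('found %d image/annotation pairs' % len(pairs))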

    Parse Annotation File

    The next step is to figure out how to load the annotation files.

    First, open the first annotation file (annots/00001.xml) and take a look; you should see:

    <annotation>
    	<folder>Kangaroo</folder>
    	<filename>00001.jpg</filename>
    	<path>...</path>
    	<source>
    		<database>Unknown</database>
    	</source>
    	<size>
    		<width>450</width>
    		<height>319</height>
    		<depth>3</depth>
    	</size>
    	<segmented>0</segmented>
    	<object>
    		<name>kangaroo</name>
    		<pose>Unspecified</pose>
    		<truncated>0</truncated>
    		<difficult>0</difficult>
    		<bndbox>
    			<xmin>233</xmin>
    			<ymin>89</ymin>
    			<xmax>386</xmax>
    			<ymax>262</ymax>
    		</bndbox>
    	</object>
    	<object>
    		<name>kangaroo</name>
    		<pose>Unspecified</pose>
    		<truncated>0</truncated>
    		<difficult>0</difficult>
    		<bndbox>
    			<xmin>134</xmin>
    			<ymin>105</ymin>
    			<xmax>341</xmax>
    			<ymax>253</ymax>
    		</bndbox>
    	</object>
    </annotation>

    We can see that the annotation file contains a “size” element that describes the shape of the photograph, and one or more “object” elements that describe the bounding boxes for the kangaroo objects in the photograph.

    The size and the bounding boxes are the minimum information that we require from each annotation file. We could write some careful XML parsing code to process these annotation files, and that would be a good idea for a production system. Instead, we will short-cut development and use XPath queries to directly extract the data that we need from each file, e.g. a //size query to extract the size element and a //object or a //bndbox query to extract the bounding box elements.

    Python provides the ElementTree API that can be used to load and parse an XML file and we can use the find() and findall() functions to perform the XPath queries on a loaded document.

    First, the annotation file must be loaded and parsed as an ElementTree object.

    # load and parse the file
    tree = ElementTree.parse(filename)

    Once loaded, we can retrieve the root element of the document from which we can perform our XPath queries.

    # get the root of the document
    root = tree.getroot()

    We can use the findall() function with a query for ‘.//bndbox‘ to find all ‘bndbox‘ elements, then enumerate each to extract the x and y, min and max values that define each bounding box.

    The element text can also be parsed to integer values.

    # extract each bounding box
    for box in root.findall('.//bndbox'):
    	xmin = int(box.find('xmin').text)
    	ymin = int(box.find('ymin').text)
    	xmax = int(box.find('xmax').text)
    	ymax = int(box.find('ymax').text)
    	coors = [xmin, ymin, xmax, ymax]

    We can then collect the definition of each bounding box into a list.

    The dimensions of the image may also be helpful, which can be queried directly.

    # extract image dimensions
    width = int(root.find('.//size/width').text)
    height = int(root.find('.//size/height').text)

    We can tie all of this together into a function that will take the annotation filename as an argument, extract the bounding box and image dimension details, and return them for use.

    The extract_boxes() function below implements this behavior.

    # function to extract bounding boxes from an annotation file
    def extract_boxes(filename):
    	# load and parse the file
    	tree = ElementTree.parse(filename)
    	# get the root of the document
    	root = tree.getroot()
    	# extract each bounding box
    	boxes = list()
    	for box in root.findall('.//bndbox'):
    		xmin = int(box.find('xmin').text)
    		ymin = int(box.find('ymin').text)
    		xmax = int(box.find('xmax').text)
    		ymax = int(box.find('ymax').text)
    		coors = [xmin, ymin, xmax, ymax]
    		boxes.append(coors)
    	# extract image dimensions
    	width = int(root.find('.//size/width').text)
    	height = int(root.find('.//size/height').text)
    	return boxes, width, height

    We can test out this function on our annotation files, for example, on the first annotation file in the directory.

    The complete example is listed below.

    # example of extracting bounding boxes from an annotation file
    from xml.etree import ElementTree
    
    # function to extract bounding boxes from an annotation file
    def extract_boxes(filename):
    	# load and parse the file
    	tree = ElementTree.parse(filename)
    	# get the root of the document
    	root = tree.getroot()
    	# extract each bounding box
    	boxes = list()
    	for box in root.findall('.//bndbox'):
    		xmin = int(box.find('xmin').text)
    		ymin = int(box.find('ymin').text)
    		xmax = int(box.find('xmax').text)
    		ymax = int(box.find('ymax').text)
    		coors = [xmin, ymin, xmax, ymax]
    		boxes.append(coors)
    	# extract image dimensions
    	width = int(root.find('.//size/width').text)
    	height = int(root.find('.//size/height').text)
    	return boxes, width, height
    
    # extract details from annotation file
    boxes, w, h = extract_boxes('kangaroo/annots/00001.xml')
    # summarize extracted details
    print(boxes, w, h)

    Running the example returns a list that contains the details of each bounding box in the annotation file, as well as two integers for the width and height of the photograph.

    [[233, 89, 386, 262], [134, 105, 341, 253]] 450 319

    Now that we know how to load the annotation file, we can look at using this functionality to develop a Dataset object.

    Develop KangarooDataset Object

    The mask-rcnn library requires that train, validation, and test datasets be managed by a mrcnn.utils.Dataset object.

    This means that a new class must be defined that extends the mrcnn.utils.Dataset class, defines a function to load the dataset (with any name you like, such as load_dataset()), and overrides two functions: one for loading a mask, called load_mask(), and one for loading an image reference (path or URL), called image_reference().

    # class that defines and loads the kangaroo dataset
    class KangarooDataset(Dataset):
    	# load the dataset definitions
    	def load_dataset(self, dataset_dir, is_train=True):
    		# ...
    
    	# load the masks for an image
    	def load_mask(self, image_id):
    		# ...
    
    	# load an image reference
    	def image_reference(self, image_id):
    		# ...

    To use a Dataset object, it is instantiated, then your custom load function must be called, then finally the built-in prepare() function is called.

    For example, we will create a new class called KangarooDataset that will be used as follows:

    # prepare the dataset
    train_set = KangarooDataset()
    train_set.load_dataset(...)
    train_set.prepare()

    The custom load function, e.g. load_dataset() is responsible for both defining the classes and for defining the images in the dataset.

    Classes are defined by calling the built-in add_class() function and specifying the ‘source‘ (the name of the dataset), the ‘class_id‘ or integer for the class (e.g. 1 for the first class, as 0 is reserved for the background class), and the ‘class_name‘ (e.g. ‘kangaroo‘).

    # define one class
    self.add_class("dataset", 1, "kangaroo")

    Objects are defined by a call to the built-in add_image() function and specifying the ‘source‘ (the name of the dataset), a unique ‘image_id‘ (e.g. the filename without the file extension like ‘00001‘), and the path for where the image can be loaded (e.g. ‘kangaroo/images/00001.jpg‘).

    This will define an “image info” dictionary for the image that can be retrieved later via the index or order in which the image was added to the dataset. You can also specify other arguments that will be added to the image info dictionary, such as an ‘annotation‘ to define the annotation path.

    # add to dataset
    self.add_image('dataset', image_id='00001', path='kangaroo/images/00001.jpg', annotation='kangaroo/annots/00001.xml')

    For example, we can implement a load_dataset() function that takes the path to the dataset directory and loads all images in the dataset.

    Note, testing revealed that there is an issue with image number ‘00090‘, so we will exclude it from the dataset.

    # load the dataset definitions
    def load_dataset(self, dataset_dir):
    	# define one class
    	self.add_class("dataset", 1, "kangaroo")
    	# define data locations
    	images_dir = dataset_dir + '/images/'
    	annotations_dir = dataset_dir + '/annots/'
    	# find all images
    	for filename in listdir(images_dir):
    		# extract image id
    		image_id = filename[:-4]
    		# skip bad images
    		if image_id in ['00090']:
    			continue
    		img_path = images_dir + filename
    		ann_path = annotations_dir + image_id + '.xml'
    		# add to dataset
    		self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

    We can go one step further and add one more argument to the function to define whether the Dataset instance is for training or test/validation. We have about 160 photos, so we can use about 20%, or the last 32 photos, as a test or validation dataset and the first 131, or 80%, as the training dataset.

    This division can be made using the integer in the filename: all photos with a number below 150 will be used for training, and all photos with a number of 150 or greater will be used for test. The updated load_dataset() with support for train and test datasets is provided below.

    # load the dataset definitions
    def load_dataset(self, dataset_dir, is_train=True):
    	# define one class
    	self.add_class("dataset", 1, "kangaroo")
    	# define data locations
    	images_dir = dataset_dir + '/images/'
    	annotations_dir = dataset_dir + '/annots/'
    	# find all images
    	for filename in listdir(images_dir):
    		# extract image id
    		image_id = filename[:-4]
    		# skip bad images
    		if image_id in ['00090']:
    			continue
    		# skip all images after 150 if we are building the train set
    		if is_train and int(image_id) >= 150:
    			continue
    		# skip all images before 150 if we are building the test/val set
    		if not is_train and int(image_id) < 150:
    			continue
    		img_path = images_dir + filename
    		ann_path = annotations_dir + image_id + '.xml'
    		# add to dataset
    		self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

    Next, we need to define the load_mask() function for loading the mask for a given ‘image_id‘.

    In this case, the ‘image_id‘ is the integer index for an image in the dataset, assigned based on the order that the image was added via a call to add_image() when loading the dataset. The function must return an array of one or more masks for the photo associated with the image_id, and the classes for each mask.

    We don’t have masks, but we do have bounding boxes. We can load the bounding boxes for a given photo and return them as masks. The library will then infer bounding boxes from our “masks” which will be the same size.

    First, we must load the annotation file for the image_id. This involves first retrieving the ‘image info‘ dict for the image_id, then retrieving the annotations path that we stored for the image via our prior call to add_image(). We can then use the path in our call to extract_boxes() developed in the previous section to get the list of bounding boxes and the dimensions of the image.

    # get details of image
    info = self.image_info[image_id]
    # define box file location
    path = info['annotation']
    # load XML
    boxes, w, h = self.extract_boxes(path)

    We can now define a mask for each bounding box, and an associated class.

    A mask is a two-dimensional array with the same dimensions as the photograph with all zero values where the object isn’t and all one values where the object is in the photograph.

    We can achieve this by creating a NumPy array with all zero values for the known size of the image and one channel for each bounding box.

    # create one array for all masks, each on a different channel
    masks = zeros([h, w, len(boxes)], dtype='uint8')

    Each bounding box is defined as min and max, x and y coordinates of the box.

    These can be used directly to define row and column ranges in the array that can then be marked as 1.

    # create masks
    for i in range(len(boxes)):
    	box = boxes[i]
    	row_s, row_e = box[1], box[3]
    	col_s, col_e = box[0], box[2]
    	masks[row_s:row_e, col_s:col_e, i] = 1

    All objects have the same class in this dataset. We can retrieve the class index via the ‘class_names‘ list, then add it to a list of class ids to be returned alongside the masks.

    self.class_names.index('kangaroo')

    Tying this together, the complete load_mask() function is listed below.

    # load the masks for an image
    def load_mask(self, image_id):
    	# get details of image
    	info = self.image_info[image_id]
    	# define box file location
    	path = info['annotation']
    	# load XML
    	boxes, w, h = self.extract_boxes(path)
    	# create one array for all masks, each on a different channel
    	masks = zeros([h, w, len(boxes)], dtype='uint8')
    	# create masks
    	class_ids = list()
    	for i in range(len(boxes)):
    		box = boxes[i]
    		row_s, row_e = box[1], box[3]
    		col_s, col_e = box[0], box[2]
    		masks[row_s:row_e, col_s:col_e, i] = 1
    		class_ids.append(self.class_names.index('kangaroo'))
    	return masks, asarray(class_ids, dtype='int32')

    Finally, we must implement the image_reference() function.

    This function is responsible for returning the path or URL for a given ‘image_id‘, which we know is just the ‘path‘ property on the ‘image info‘ dict.

    # load an image reference
    def image_reference(self, image_id):
    	info = self.image_info[image_id]
    	return info['path']

    And that’s it. We have successfully defined a Dataset object for the mask-rcnn library for our Kangaroo dataset.

    The complete listing of the class and creating a train and test dataset is provided below.

    # split into train and test set
    from os import listdir
    from xml.etree import ElementTree
    from numpy import zeros
    from numpy import asarray
    from mrcnn.utils import Dataset
    
    # class that defines and loads the kangaroo dataset
    class KangarooDataset(Dataset):
    	# load the dataset definitions
    	def load_dataset(self, dataset_dir, is_train=True):
    		# define one class
    		self.add_class("dataset", 1, "kangaroo")
    		# define data locations
    		images_dir = dataset_dir + '/images/'
    		annotations_dir = dataset_dir + '/annots/'
    		# find all images
    		for filename in listdir(images_dir):
    			# extract image id
    			image_id = filename[:-4]
    			# skip bad images
    			if image_id in ['00090']:
    				continue
    			# skip all images after 150 if we are building the train set
    			if is_train and int(image_id) >= 150:
    				continue
    			# skip all images before 150 if we are building the test/val set
    			if not is_train and int(image_id) < 150:
    				continue
    			img_path = images_dir + filename
    			ann_path = annotations_dir + image_id + '.xml'
    			# add to dataset
    			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)
    
    	# extract bounding boxes from an annotation file
    	def extract_boxes(self, filename):
    		# load and parse the file
    		tree = ElementTree.parse(filename)
    		# get the root of the document
    		root = tree.getroot()
    		# extract each bounding box
    		boxes = list()
    		for box in root.findall('.//bndbox'):
    			xmin = int(box.find('xmin').text)
    			ymin = int(box.find('ymin').text)
    			xmax = int(box.find('xmax').text)
    			ymax = int(box.find('ymax').text)
    			coors = [xmin, ymin, xmax, ymax]
    			boxes.append(coors)
    		# extract image dimensions
    		width = int(root.find('.//size/width').text)
    		height = int(root.find('.//size/height').text)
    		return boxes, width, height
    
    	# load the masks for an image
    	def load_mask(self, image_id):
    		# get details of image
    		info = self.image_info[image_id]
    		# define box file location
    		path = info['annotation']
    		# load XML
    		boxes, w, h = self.extract_boxes(path)
    		# create one array for all masks, each on a different channel
    		masks = zeros([h, w, len(boxes)], dtype='uint8')
    		# create masks
    		class_ids = list()
    		for i in range(len(boxes)):
    			box = boxes[i]
    			row_s, row_e = box[1], box[3]
    			col_s, col_e = box[0], box[2]
    			masks[row_s:row_e, col_s:col_e, i] = 1
    			class_ids.append(self.class_names.index('kangaroo'))
    		return masks, asarray(class_ids, dtype='int32')
    
    	# load an image reference
    	def image_reference(self, image_id):
    		info = self.image_info[image_id]
    		return info['path']
    
    # train set
    train_set = KangarooDataset()
    train_set.load_dataset('kangaroo', is_train=True)
    train_set.prepare()
    print('Train: %d' % len(train_set.image_ids))
    
    # test/val set
    test_set = KangarooDataset()
    test_set.load_dataset('kangaroo', is_train=False)
    test_set.prepare()
    print('Test: %d' % len(test_set.image_ids))

    Running the example successfully loads and prepares the train and test dataset and prints the number of images in each.

    Train: 131
    Test: 32

    Now that we have defined the dataset, let’s confirm that the images, masks, and bounding boxes are handled correctly.

    Test KangarooDataset Object

    The first useful test is to confirm that the images and masks can be loaded correctly.

    We can test this by creating a dataset and loading an image via a call to the load_image() function with an image_id, then loading the mask for the image via a call to the load_mask() function with the same image_id.

    # load an image
    image_id = 0
    image = train_set.load_image(image_id)
    print(image.shape)
    # load image mask
    mask, class_ids = train_set.load_mask(image_id)
    print(mask.shape)

    Next, we can plot the photograph using the Matplotlib API, then plot the first mask over the top with an alpha value so that the photograph underneath can still be seen.

    # plot image
    pyplot.imshow(image)
    # plot mask
    pyplot.imshow(mask[:, :, 0], cmap='gray', alpha=0.5)
    pyplot.show()

    The complete example is listed below.

    # plot one photograph and mask
    from os import listdir
    from xml.etree import ElementTree
    from numpy import zeros
    from numpy import asarray
    from mrcnn.utils import Dataset
    from matplotlib import pyplot
    
    # class that defines and loads the kangaroo dataset
    class KangarooDataset(Dataset):
    	# load the dataset definitions
    	def load_dataset(self, dataset_dir, is_train=True):
    		# define one class
    		self.add_class("dataset", 1, "kangaroo")
    		# define data locations
    		images_dir = dataset_dir + '/images/'
    		annotations_dir = dataset_dir + '/annots/'
    		# find all images
    		for filename in listdir(images_dir):
    			# extract image id
    			image_id = filename[:-4]
    			# skip bad images
    			if image_id in ['00090']:
    				continue
    			# skip all images after 150 if we are building the train set
    			if is_train and int(image_id) >= 150:
    				continue
    			# skip all images before 150 if we are building the test/val set
    			if not is_train and int(image_id) < 150:
    				continue
    			img_path = images_dir + filename
    			ann_path = annotations_dir + image_id + '.xml'
    			# add to dataset
    			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)
    
    	# extract bounding boxes from an annotation file
    	def extract_boxes(self, filename):
    		# load and parse the file
    		tree = ElementTree.parse(filename)
    		# get the root of the document
    		root = tree.getroot()
    		# extract each bounding box
    		boxes = list()
    		for box in root.findall('.//bndbox'):
    			xmin = int(box.find('xmin').text)
    			ymin = int(box.find('ymin').text)
    			xmax = int(box.find('xmax').text)
    			ymax = int(box.find('ymax').text)
    			coors = [xmin, ymin, xmax, ymax]
    			boxes.append(coors)
    		# extract image dimensions
    		width = int(root.find('.//size/width').text)
    		height = int(root.find('.//size/height').text)
    		return boxes, width, height
    
    	# load the masks for an image
    	def load_mask(self, image_id):
    		# get details of image
    		info = self.image_info[image_id]
    		# define box file location
    		path = info['annotation']
    		# load XML
    		boxes, w, h = self.extract_boxes(path)
    		# create one array for all masks, each on a different channel
    		masks = zeros([h, w, len(boxes)], dtype='uint8')
    		# create masks
    		class_ids = list()
    		for i in range(len(boxes)):
    			box = boxes[i]
    			row_s, row_e = box[1], box[3]
    			col_s, col_e = box[0], box[2]
    			masks[row_s:row_e, col_s:col_e, i] = 1
    			class_ids.append(self.class_names.index('kangaroo'))
    		return masks, asarray(class_ids, dtype='int32')
    
    	# load an image reference
    	def image_reference(self, image_id):
    		info = self.image_info[image_id]
    		return info['path']
    
    # train set
    train_set = KangarooDataset()
    train_set.load_dataset('kangaroo', is_train=True)
    train_set.prepare()
    # load an image
    image_id = 0
    image = train_set.load_image(image_id)
    print(image.shape)
    # load image mask
    mask, class_ids = train_set.load_mask(image_id)
    print(mask.shape)
    # plot image
    pyplot.imshow(image)
    # plot mask
    pyplot.imshow(mask[:, :, 0], cmap='gray', alpha=0.5)
    pyplot.show()

    Running the example first prints the shape of the photograph and mask NumPy arrays.

    We can confirm that both arrays have the same width and height and only differ in terms of the number of channels. We can also see that the first photograph (e.g. image_id=0) in this case only has one mask.

    (626, 899, 3)
    (626, 899, 1)

    A plot of the photograph is also created with the first mask overlaid.

    In this case, we can see that one kangaroo is present in the photo and that the mask correctly bounds the kangaroo.

    Photograph of Kangaroo With Object Detection Mask Overlaid

    We could repeat this for the first nine photos in the dataset, plotting each photo in one figure as a subplot and plotting all masks for each photo.

    # plot first few images
    for i in range(9):
    	# define subplot
    	pyplot.subplot(330 + 1 + i)
    	# plot raw pixel data
    	image = train_set.load_image(i)
    	pyplot.imshow(image)
    	# plot all masks
    	mask, _ = train_set.load_mask(i)
    	for j in range(mask.shape[2]):
    		pyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)
    # show the figure
    pyplot.show()

    Running the example shows that photos are loaded correctly and that those photos with multiple objects correctly have separate masks defined.

    Plot of First Nine Photos of Kangaroos in the Training Dataset With Object Detection Masks

    Another useful debugging step might be to load all of the ‘image info‘ objects in the dataset and print them to the console.

    This can help to confirm that all of the calls to the add_image() function in the load_dataset() function worked as expected.

    # enumerate all images in the dataset
    for image_id in train_set.image_ids:
    	# load image info
    	info = train_set.image_info[image_id]
    	# display on the console
    	print(info)

    Running this code on the loaded training dataset will then show all of the ‘image info‘ dictionaries, showing the paths and ids for each image in the dataset.

    {'id': '00132', 'source': 'dataset', 'path': 'kangaroo/images/00132.jpg', 'annotation': 'kangaroo/annots/00132.xml'}
    {'id': '00046', 'source': 'dataset', 'path': 'kangaroo/images/00046.jpg', 'annotation': 'kangaroo/annots/00046.xml'}
    {'id': '00052', 'source': 'dataset', 'path': 'kangaroo/images/00052.jpg', 'annotation': 'kangaroo/annots/00052.xml'}
    ...

    Finally, the mask-rcnn library provides utilities for displaying images and masks. We can use some of these built-in functions to confirm that the Dataset is operating correctly.

    For example, the mask-rcnn library provides the mrcnn.visualize.display_instances() function that will show a photograph with bounding boxes, masks, and class labels. This requires that the bounding boxes are extracted from the masks via the extract_bboxes() function.

    # define image id
    image_id = 1
    # load the image
    image = train_set.load_image(image_id)
    # load the masks and the class ids
    mask, class_ids = train_set.load_mask(image_id)
    # extract bounding boxes from the masks
    bbox = extract_bboxes(mask)
    # display image with masks and bounding boxes
    display_instances(image, bbox, mask, class_ids, train_set.class_names)

    For completeness, the full code listing is provided below.

    # display image with masks and bounding boxes
    from os import listdir
    from xml.etree import ElementTree
    from numpy import zeros
    from numpy import asarray
    from mrcnn.utils import Dataset
    from mrcnn.visualize import display_instances
    from mrcnn.utils import extract_bboxes
    
    # class that defines and loads the kangaroo dataset
    class KangarooDataset(Dataset):
    	# load the dataset definitions
    	def load_dataset(self, dataset_dir, is_train=True):
    		# define one class
    		self.add_class("dataset", 1, "kangaroo")
    		# define data locations
    		images_dir = dataset_dir + '/images/'
    		annotations_dir = dataset_dir + '/annots/'
    		# find all images
    		for filename in listdir(images_dir):
    			# extract image id
    			image_id = filename[:-4]
    			# skip bad images
    			if image_id in ['00090']:
    				continue
    			# skip all images after 150 if we are building the train set
    			if is_train and int(image_id) >= 150:
    				continue
    			# skip all images before 150 if we are building the test/val set
    			if not is_train and int(image_id) < 150:
    				continue
    			img_path = images_dir + filename
    			ann_path = annotations_dir + image_id + '.xml'
    			# add to dataset
    			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)
    
    	# extract bounding boxes from an annotation file
    	def extract_boxes(self, filename):
    		# load and parse the file
    		tree = ElementTree.parse(filename)
    		# get the root of the document
    		root = tree.getroot()
    		# extract each bounding box
    		boxes = list()
    		for box in root.findall('.//bndbox'):
    			xmin = int(box.find('xmin').text)
    			ymin = int(box.find('ymin').text)
    			xmax = int(box.find('xmax').text)
    			ymax = int(box.find('ymax').text)
    			coors = [xmin, ymin, xmax, ymax]
    			boxes.append(coors)
    		# extract image dimensions
    		width = int(root.find('.//size/width').text)
    		height = int(root.find('.//size/height').text)
    		return boxes, width, height
    
    	# load the masks for an image
    	def load_mask(self, image_id):
    		# get details of image
    		info = self.image_info[image_id]
    		# define box file location
    		path = info['annotation']
    		# load XML
    		boxes, w, h = self.extract_boxes(path)
    		# create one array for all masks, each on a different channel
    		masks = zeros([h, w, len(boxes)], dtype='uint8')
    		# create masks
    		class_ids = list()
    		for i in range(len(boxes)):
    			box = boxes[i]
    			row_s, row_e = box[1], box[3]
    			col_s, col_e = box[0], box[2]
    			masks[row_s:row_e, col_s:col_e, i] = 1
    			class_ids.append(self.class_names.index('kangaroo'))
    		return masks, asarray(class_ids, dtype='int32')
    
    	# load an image reference
    	def image_reference(self, image_id):
    		info = self.image_info[image_id]
    		return info['path']
    
    # train set
    train_set = KangarooDataset()
    train_set.load_dataset('kangaroo', is_train=True)
    train_set.prepare()
    # define image id
    image_id = 1
    # load the image
    image = train_set.load_image(image_id)
    # load the masks and the class ids
    mask, class_ids = train_set.load_mask(image_id)
    # extract bounding boxes from the masks
    bbox = extract_bboxes(mask)
    # display image with masks and bounding boxes
    display_instances(image, bbox, mask, class_ids, train_set.class_names)

    Running the example creates a plot showing the photograph with the mask for each object in a separate color.

    The bounding boxes match the masks exactly, by design, and are shown with dotted outlines. Finally, each object is marked with the class label, which in this case is ‘kangaroo‘.

    Photograph Showing Object Detection Masks, Bounding Boxes, and Class Labels

    Now that we are confident that our dataset is being loaded correctly, we can use it to fit a Mask R-CNN model.

    How to Train Mask R-CNN Model for Kangaroo Detection

    A Mask R-CNN model can be fit from scratch, although like other computer vision applications, time can be saved and performance can be improved by using transfer learning.

    The Mask R-CNN model pre-fit on the MS COCO object detection dataset can be used as a starting point and then tailored to the specific dataset, in this case, the kangaroo dataset.

    The first step is to download the model file (architecture and weights) for the pre-fit Mask R-CNN model. The weights are available from the GitHub project and the file is about 250 megabytes.

    Download the model weights to a file with the name ‘mask_rcnn_coco.h5‘ in your current working directory.

    • Download Weights (mask_rcnn_coco.h5) 246M
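
    For example, the weights can be downloaded on the command line with wget; the release URL below is an assumption, so check the project’s GitHub Releases page if it has moved:

    wget https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5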

    Next, a configuration object for the model must be defined.

    This is a new class that extends the mrcnn.config.Config class and defines properties of both the prediction problem (such as name and the number of classes) and the algorithm for training the model (such as the learning rate).

    The configuration must define the name of the configuration via the ‘NAME‘ attribute, e.g. ‘kangaroo_cfg‘, that will be used to save details and models to file during the run. The configuration must also define the number of classes in the prediction problem via the ‘NUM_CLASSES‘ attribute. In this case, we only have one object type (kangaroo), although there is always an additional class for the background.

    Finally, we must define the number of samples (photos) used in each training epoch. This will be the number of photos in the training dataset, in this case, 131.

    Tying this together, our custom KangarooConfig class is defined below.

    # define a configuration for the model
    class KangarooConfig(Config):
    	# Give the configuration a recognizable name
    	NAME = "kangaroo_cfg"
    	# Number of classes (background + kangaroo)
    	NUM_CLASSES = 1 + 1
    	# Number of training steps per epoch
    	STEPS_PER_EPOCH = 131
    
    # prepare config
    config = KangarooConfig()

    Next, we can define our model.

    This is achieved by creating an instance of the mrcnn.model.MaskRCNN class and specifying the model will be used for training via setting the ‘mode‘ argument to ‘training‘.

    The ‘config‘ argument must also be specified with an instance of our KangarooConfig class.

    Finally, a directory is needed where configuration files can be saved and where checkpoint models can be saved at the end of each epoch. We will use the current working directory.

    # define the model
    model = MaskRCNN(mode='training', model_dir='./', config=config)

    Next, the pre-defined model architecture and weights can be loaded. This can be achieved by calling the load_weights() function on the model and specifying the path to the downloaded ‘mask_rcnn_coco.h5‘ file.

    The model will be used as-is, although the class-specific output layers will be removed so that new output layers can be defined and trained. This can be done by specifying the ‘exclude‘ argument and listing all of the output layers to exclude or remove from the model after it is loaded. This includes the output layers for the classification label, bounding boxes, and masks.

    # load weights (mscoco)
    model.load_weights('mask_rcnn_coco.h5', by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",  "mrcnn_bbox", "mrcnn_mask"])

    Next, the model can be fit on the training dataset by calling the train() function and passing in both the training dataset and the validation dataset. We can also specify the learning rate as the default learning rate in the configuration (0.001).

    We can also specify what layers to train. In this case, we will only train the heads, that is the output layers of the model.

    # train weights (output layers or 'heads')
    model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads')

    We could follow this training with further epochs that fine-tune all of the weights in the model. This could be achieved by using a smaller learning rate and changing the ‘layers‘ argument from ‘heads‘ to ‘all‘.
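
    A minimal sketch of such a follow-on stage is shown below; the reduced learning rate and target epoch count are illustrative assumptions, not values used in this tutorial:

    # optional follow-on stage: fine-tune all layers at a reduced learning rate (illustrative values)
    model.train(train_set, test_set, learning_rate=config.LEARNING_RATE / 10, epochs=10, layers='all')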

    The complete example of training a Mask R-CNN on the kangaroo dataset is listed below.

    This may take some time to execute on the CPU, even with modern hardware. I recommend running the code with a GPU, such as on Amazon EC2, where it will finish in about five minutes on P3-type hardware.

    # fit a mask rcnn on the kangaroo dataset
    from os import listdir
    from xml.etree import ElementTree
    from numpy import zeros
    from numpy import asarray
    from mrcnn.utils import Dataset
    from mrcnn.config import Config
    from mrcnn.model import MaskRCNN
    
    # class that defines and loads the kangaroo dataset
    class KangarooDataset(Dataset):
    	# load the dataset definitions
    	def load_dataset(self, dataset_dir, is_train=True):
    		# define one class
    		self.add_class("dataset", 1, "kangaroo")
    		# define data locations
    		images_dir = dataset_dir + '/images/'
    		annotations_dir = dataset_dir + '/annots/'
    		# find all images
    		for filename in listdir(images_dir):
    			# extract image id
    			image_id = filename[:-4]
    			# skip bad images
    			if image_id in ['00090']:
    				continue
    			# skip all images after 150 if we are building the train set
    			if is_train and int(image_id) >= 150:
    				continue
    			# skip all images before 150 if we are building the test/val set
    			if not is_train and int(image_id) < 150:
    				continue
    			img_path = images_dir + filename
    			ann_path = annotations_dir + image_id + '.xml'
    			# add to dataset
    			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)
    
    	# extract bounding boxes from an annotation file
    	def extract_boxes(self, filename):
    		# load and parse the file
    		tree = ElementTree.parse(filename)
    		# get the root of the document
    		root = tree.getroot()
    		# extract each bounding box
    		boxes = list()
    		for box in root.findall('.//bndbox'):
    			xmin = int(box.find('xmin').text)
    			ymin = int(box.find('ymin').text)
    			xmax = int(box.find('xmax').text)
    			ymax = int(box.find('ymax').text)
    			coors = [xmin, ymin, xmax, ymax]
    			boxes.append(coors)
    		# extract image dimensions
    		width = int(root.find('.//size/width').text)
    		height = int(root.find('.//size/height').text)
    		return boxes, width, height
    
    	# load the masks for an image
    	def load_mask(self, image_id):
    		# get details of image
    		info = self.image_info[image_id]
    		# define box file location
    		path = info['annotation']
    		# load XML
    		boxes, w, h = self.extract_boxes(path)
    		# create one array for all masks, each on a different channel
    		masks = zeros([h, w, len(boxes)], dtype='uint8')
    		# create masks
    		class_ids = list()
    		for i in range(len(boxes)):
    			box = boxes[i]
    			row_s, row_e = box[1], box[3]
    			col_s, col_e = box[0], box[2]
    			masks[row_s:row_e, col_s:col_e, i] = 1
    			class_ids.append(self.class_names.index('kangaroo'))
    		return masks, asarray(class_ids, dtype='int32')
    
    	# load an image reference
    	def image_reference(self, image_id):
    		info = self.image_info[image_id]
    		return info['path']
    
    # define a configuration for the model
    class KangarooConfig(Config):
    	# define the name of the configuration
    	NAME = "kangaroo_cfg"
    	# number of classes (background + kangaroo)
    	NUM_CLASSES = 1 + 1
    	# number of training steps per epoch
    	STEPS_PER_EPOCH = 131
    
    # prepare train set
    train_set = KangarooDataset()
    train_set.load_dataset('kangaroo', is_train=True)
    train_set.prepare()
    print('Train: %d' % len(train_set.image_ids))
    # prepare test/val set
    test_set = KangarooDataset()
    test_set.load_dataset('kangaroo', is_train=False)
    test_set.prepare()
    print('Test: %d' % len(test_set.image_ids))
    # prepare config
    config = KangarooConfig()
    config.display()
    # define the model
    model = MaskRCNN(mode='training', model_dir='./', config=config)
    # load weights (mscoco) and exclude the output layers
    model.load_weights('mask_rcnn_coco.h5', by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",  "mrcnn_bbox", "mrcnn_mask"])
    # train weights (output layers or 'heads')
    model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads')

    Running the example will report progress using the standard Keras progress bars.

    We can see that there are many different train and test loss scores reported for each of the output heads of the network. It can be quite confusing as to which loss to pay attention to.

    In this example where we are interested in object detection instead of object segmentation, I recommend paying attention to the loss for the classification output on the train and validation datasets (e.g. mrcnn_class_loss and val_mrcnn_class_loss), as well as the loss for the bounding box output for the train and validation datasets (mrcnn_bbox_loss and val_mrcnn_bbox_loss).

    Epoch 1/5
    131/131 [==============================] - 106s 811ms/step - loss: 0.8491 - rpn_class_loss: 0.0044 - rpn_bbox_loss: 0.1452 - mrcnn_class_loss: 0.0420 - mrcnn_bbox_loss: 0.2874 - mrcnn_mask_loss: 0.3701 - val_loss: 1.3402 - val_rpn_class_loss: 0.0160 - val_rpn_bbox_loss: 0.7913 - val_mrcnn_class_loss: 0.0092 - val_mrcnn_bbox_loss: 0.2263 - val_mrcnn_mask_loss: 0.2975
    Epoch 2/5
    131/131 [==============================] - 69s 526ms/step - loss: 0.4774 - rpn_class_loss: 0.0025 - rpn_bbox_loss: 0.1159 - mrcnn_class_loss: 0.0170 - mrcnn_bbox_loss: 0.1134 - mrcnn_mask_loss: 0.2285 - val_loss: 0.6261 - val_rpn_class_loss: 8.9502e-04 - val_rpn_bbox_loss: 0.1624 - val_mrcnn_class_loss: 0.0197 - val_mrcnn_bbox_loss: 0.2148 - val_mrcnn_mask_loss: 0.2282
    Epoch 3/5
    131/131 [==============================] - 67s 515ms/step - loss: 0.4471 - rpn_class_loss: 0.0029 - rpn_bbox_loss: 0.1153 - mrcnn_class_loss: 0.0234 - mrcnn_bbox_loss: 0.0958 - mrcnn_mask_loss: 0.2097 - val_loss: 1.2998 - val_rpn_class_loss: 0.0144 - val_rpn_bbox_loss: 0.6712 - val_mrcnn_class_loss: 0.0372 - val_mrcnn_bbox_loss: 0.2645 - val_mrcnn_mask_loss: 0.3125
    Epoch 4/5
    131/131 [==============================] - 66s 502ms/step - loss: 0.3934 - rpn_class_loss: 0.0026 - rpn_bbox_loss: 0.1003 - mrcnn_class_loss: 0.0171 - mrcnn_bbox_loss: 0.0806 - mrcnn_mask_loss: 0.1928 - val_loss: 0.6709 - val_rpn_class_loss: 0.0016 - val_rpn_bbox_loss: 0.2012 - val_mrcnn_class_loss: 0.0244 - val_mrcnn_bbox_loss: 0.1942 - val_mrcnn_mask_loss: 0.2495
    Epoch 5/5
    131/131 [==============================] - 65s 493ms/step - loss: 0.3357 - rpn_class_loss: 0.0024 - rpn_bbox_loss: 0.0804 - mrcnn_class_loss: 0.0193 - mrcnn_bbox_loss: 0.0616 - mrcnn_mask_loss: 0.1721 - val_loss: 0.8878 - val_rpn_class_loss: 0.0030 - val_rpn_bbox_loss: 0.4409 - val_mrcnn_class_loss: 0.0174 - val_mrcnn_bbox_loss: 0.1752 - val_mrcnn_mask_loss: 0.2513

    A model file is created and saved at the end of each epoch in a subdirectory that starts with ‘kangaroo_cfg‘ followed by random characters.

    A model must be selected for use; in this case, the loss continues to decrease for the bounding boxes on each epoch, so we will use the final model at the end of the run (‘mask_rcnn_kangaroo_cfg_0005.h5‘).

    Copy the model file from the config directory into your current working directory. We will use it in the following sections to evaluate the model and make predictions.
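
    For example, something like the following shell command will work, although the exact directory name will differ on your run:

    cp kangaroo_cfg*/mask_rcnn_kangaroo_cfg_0005.h5 .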

    The results suggest that more training epochs could be useful, perhaps combined with fine-tuning all of the layers in the model; this might make an interesting extension to the tutorial.

    Next, let’s look at evaluating the performance of this model.

    How to Evaluate a Mask R-CNN Model

    The performance of a model for an object recognition task is often evaluated using the mean average precision, or mAP.

    We are predicting bounding boxes, so we can determine whether a bounding box prediction is good or not based on how well the predicted and actual bounding boxes overlap. This can be calculated by dividing the area of the overlap by the area covered by both bounding boxes together, or the intersection divided by the union, referred to as “intersection over union,” or IoU. A perfect bounding box prediction will have an IoU of 1.

    It is standard to assume a positive prediction of a bounding box if the IoU is greater than 0.5, e.g. they overlap by 50% or more.
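
    To make the IoU calculation concrete, below is a small illustrative sketch (not part of the mask-rcnn library) that computes the IoU of the two ground truth boxes from the first annotation file:

    # illustrative IoU calculation for two boxes in [xmin, ymin, xmax, ymax] format
    def iou(box_a, box_b):
    	# coordinates of the intersection rectangle
    	x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    	x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    	intersection = max(0, x2 - x1) * max(0, y2 - y1)
    	# areas of each individual box
    	area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    	area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    	# intersection divided by union
    	return intersection / float(area_a + area_b - intersection)

    print(iou([233, 89, 386, 262], [134, 105, 341, 253]))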

    Precision refers to the percentage of the correctly predicted bounding boxes (IoU > 0.5) out of all bounding boxes predicted. Recall is the percentage of the correctly predicted bounding boxes (IoU > 0.5) out of all objects in the photo.

    As we make more predictions, the recall will increase, but precision will drop or become erratic as we start making false positive predictions. The recall (x) can be plotted against the precision (y) for each number of predictions to create a curve or line. For each recall level we take the maximum precision achievable at that recall or higher, then average these precision values across the recall levels to give the average precision, or AP.

    Note: there are variations on how AP is calculated, e.g. the way it is calculated for the widely used PASCAL VOC dataset and the MS COCO dataset differ.

    The average or mean of the average precision (AP) across all of the images in a dataset is called the mean average precision, or mAP.
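
    The small sketch below illustrates this “take the best precision at each recall level, then average” idea; the precision and recall values are made up for illustration and this is not the library’s implementation:

    # illustrative average precision (AP) calculation from made-up precision/recall values
    from numpy import array
    from numpy import maximum
    precision = array([1.0, 0.5, 0.66, 0.75, 0.6])
    recall = array([0.25, 0.25, 0.5, 0.75, 0.75])
    # at each recall level, take the best precision achievable at that recall or higher
    interpolated = maximum.accumulate(precision[::-1])[::-1]
    # sum the interpolated precision over the recall increments
    ap = recall[0] * interpolated[0] + ((recall[1:] - recall[:-1]) * interpolated[1:]).sum()
    print(ap)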

    The mask-rcnn library provides an mrcnn.utils.compute_ap() function to calculate the AP and other metrics for a given image. These AP scores can be collected across a dataset and the mean calculated to give an idea of how good the model is at detecting objects in a dataset.

    First, we must define a new Config object to use for making predictions instead of training. We could extend our previously defined KangarooConfig to reuse the parameters; instead, we will define a new object with the same values to keep the code compact. The config must also change some of the GPU-related defaults used for inference, which are different from how they are set for training a model (regardless of whether you are running on a GPU or CPU).

    # define the prediction configuration
    class PredictionConfig(Config):
    	# define the name of the configuration
    	NAME = "kangaroo_cfg"
    	# number of classes (background + kangaroo)
    	NUM_CLASSES = 1 + 1
    	# simplify GPU config
    	GPU_COUNT = 1
    	IMAGES_PER_GPU = 1

    Next, we can define the model with the config and set the ‘mode‘ argument to ‘inference‘ instead of ‘training‘.

    # create config
    cfg = PredictionConfig()
    # define the model
    model = MaskRCNN(mode='inference', model_dir='./', config=cfg)

    Next, we can load the weights from our saved model.

    We can do that by specifying the path to the model file. In this case, the model file is ‘mask_rcnn_kangaroo_cfg_0005.h5‘ in the current working directory.

    # load model weights
    model.load_weights('mask_rcnn_kangaroo_cfg_0005.h5', by_name=True)

    Next, we can evaluate the model. This involves enumerating the images in a dataset, making a prediction, and calculating the AP for the prediction before predicting a mean AP across all images.

    First, the image and ground truth mask can be loaded from the dataset for a given image_id. This can be achieved using the load_image_gt() convenience function.

    # load image, bounding boxes and masks for the image id
    image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)

    Next, the pixel values of the loaded image must be scaled in the same way as was performed on the training data, e.g. centered. This can be achieved using the mold_image() convenience function.

    # convert pixel values (e.g. center)
    scaled_image = mold_image(image, cfg)

    The dimensions of the image then need to be expanded so that it becomes a single sample in a batch, which can then be used as input to make a prediction with the model.

    sample = expand_dims(scaled_image, 0)
    # make prediction
    yhat = model.detect(sample, verbose=0)
    # extract results for first sample
    r = yhat[0]

    Next, the prediction can be compared to the ground truth and metrics calculated using the compute_ap() function.

    # calculate statistics, including AP
    AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r['masks'])

    The AP values can be added to a list, then the mean value calculated.

    Tying this together, the evaluate_model() function below implements this and calculates the mAP given a dataset, model and configuration.

    # calculate the mAP for a model on a given dataset
    def evaluate_model(dataset, model, cfg):
    	APs = list()
    	for image_id in dataset.image_ids:
    		# load image, bounding boxes and masks for the image id
    		image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
    		# convert pixel values (e.g. center)
    		scaled_image = mold_image(image, cfg)
    		# convert image into one sample
    		sample = expand_dims(scaled_image, 0)
    		# make prediction
    		yhat = model.detect(sample, verbose=0)
    		# extract results for first sample
    		r = yhat[0]
    		# calculate statistics, including AP
    		AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r['masks'])
    		# store
    		APs.append(AP)
    	# calculate the mean AP across all images
    	mAP = mean(APs)
    	return mAP

    We can now calculate the mAP for the model on the train and test datasets.

    # evaluate model on training dataset
    train_mAP = evaluate_model(train_set, model, cfg)
    print("Train mAP: %.3f" % train_mAP)
    # evaluate model on test dataset
    test_mAP = evaluate_model(test_set, model, cfg)
    print("Test mAP: %.3f" % test_mAP)

    The full code listing is provided below for completeness.

    # evaluate the mask rcnn model on the kangaroo dataset
    from os import listdir
    from xml.etree import ElementTree
    from numpy import zeros
    from numpy import asarray
    from numpy import expand_dims
    from numpy import mean
    from mrcnn.config import Config
    from mrcnn.model import MaskRCNN
    from mrcnn.utils import Dataset
    from mrcnn.utils import compute_ap
    from mrcnn.model import load_image_gt
    from mrcnn.model import mold_image
    
    # class that defines and loads the kangaroo dataset
    class KangarooDataset(Dataset):
    	# load the dataset definitions
    	def load_dataset(self, dataset_dir, is_train=True):
    		# define one class
    		self.add_class("dataset", 1, "kangaroo")
    		# define data locations
    		images_dir = dataset_dir + '/images/'
    		annotations_dir = dataset_dir + '/annots/'
    		# find all images
    		for filename in listdir(images_dir):
    			# extract image id
    			image_id = filename[:-4]
    			# skip bad images
    			if image_id in ['00090']:
    				continue
    			# skip all images after 150 if we are building the train set
    			if is_train and int(image_id) >= 150:
    				continue
    			# skip all images before 150 if we are building the test/val set
    			if not is_train and int(image_id) < 150:
    				continue
    			img_path = images_dir + filename
    			ann_path = annotations_dir + image_id + '.xml'
    			# add to dataset
    			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)
    
    	# extract bounding boxes from an annotation file
    	def extract_boxes(self, filename):
    		# load and parse the file
    		tree = ElementTree.parse(filename)
    		# get the root of the document
    		root = tree.getroot()
    		# extract each bounding box
    		boxes = list()
    		for box in root.findall('.//bndbox'):
    			xmin = int(box.find('xmin').text)
    			ymin = int(box.find('ymin').text)
    			xmax = int(box.find('xmax').text)
    			ymax = int(box.find('ymax').text)
    			coors = [xmin, ymin, xmax, ymax]
    			boxes.append(coors)
    		# extract image dimensions
    		width = int(root.find('.//size/width').text)
    		height = int(root.find('.//size/height').text)
    		return boxes, width, height
    
    	# load the masks for an image
    	def load_mask(self, image_id):
    		# get details of image
    		info = self.image_info[image_id]
    		# define box file location
    		path = info['annotation']
    		# load XML
    		boxes, w, h = self.extract_boxes(path)
    		# create one array for all masks, each on a different channel
    		masks = zeros([h, w, len(boxes)], dtype='uint8')
    		# create masks
    		class_ids = list()
    		for i in range(len(boxes)):
    			box = boxes[i]
    			row_s, row_e = box[1], box[3]
    			col_s, col_e = box[0], box[2]
    			masks[row_s:row_e, col_s:col_e, i] = 1
    			class_ids.append(self.class_names.index('kangaroo'))
    		return masks, asarray(class_ids, dtype='int32')
    
    	# load an image reference
    	def image_reference(self, image_id):
    		info = self.image_info[image_id]
    		return info['path']
    
    # define the prediction configuration
    class PredictionConfig(Config):
    	# define the name of the configuration
    	NAME = "kangaroo_cfg"
    	# number of classes (background + kangaroo)
    	NUM_CLASSES = 1 + 1
    	# simplify GPU config
    	GPU_COUNT = 1
    	IMAGES_PER_GPU = 1
    
    # calculate the mAP for a model on a given dataset
    def evaluate_model(dataset, model, cfg):
    	APs = list()
    	for image_id in dataset.image_ids:
    		# load image, bounding boxes and masks for the image id
    		image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
    		# convert pixel values (e.g. center)
    		scaled_image = mold_image(image, cfg)
    		# convert image into one sample
    		sample = expand_dims(scaled_image, 0)
    		# make prediction
    		yhat = model.detect(sample, verbose=0)
    		# extract results for first sample
    		r = yhat[0]
    		# calculate statistics, including AP
    		AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r['masks'])
    		# store
    		APs.append(AP)
    	# calculate the mean AP across all images
    	mAP = mean(APs)
    	return mAP
    
    # load the train dataset
    train_set = KangarooDataset()
    train_set.load_dataset('kangaroo', is_train=True)
    train_set.prepare()
    print('Train: %d' % len(train_set.image_ids))
    # load the test dataset
    test_set = KangarooDataset()
    test_set.load_dataset('kangaroo', is_train=False)
    test_set.prepare()
    print('Test: %d' % len(test_set.image_ids))
    # create config
    cfg = PredictionConfig()
    # define the model
    model = MaskRCNN(mode='inference', model_dir='./', config=cfg)
    # load model weights
    model.load_weights('mask_rcnn_kangaroo_cfg_0005.h5', by_name=True)
    # evaluate model on training dataset
    train_mAP = evaluate_model(train_set, model, cfg)
    print("Train mAP: %.3f" % train_mAP)
    # evaluate model on test dataset
    test_mAP = evaluate_model(test_set, model, cfg)
    print("Test mAP: %.3f" % test_mAP)

    Running the example will make a prediction for each image in the train and test datasets and calculate the mAP for each.

    A mAP above 90% or 95% is a good score. We can see that the mAP score is good on both datasets, and perhaps slightly better on the test dataset than on the train dataset.

    This may be because the dataset is very small, and/or because the model could benefit from further training.

    Train mAP: 0.929
    Test mAP: 0.958

    Now that we have some confidence that the model is sensible, we can use it to make some predictions.

    How to Detect Kangaroos in New Photos

    We can use the trained model to detect kangaroos in new photographs, specifically, in photos that we expect to have kangaroos.

    First, we need a new photo of a kangaroo.

    We could go to Flickr and find a random photo of a kangaroo. Alternatively, we can use any of the photos in the test dataset that were not used to train the model.

    We have already seen in the previous section how to make a prediction with an image: specifically, we scale the pixel values and call model.detect(). For example:

    # example of making a prediction
    ...
    # load image
    image = ...
    # convert pixel values (e.g. center)
    scaled_image = mold_image(image, cfg)
    # convert image into one sample
    sample = expand_dims(scaled_image, 0)
    # make prediction
    yhat = model.detect(sample, verbose=0)
    ...
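
    For loading a new photograph from file into a NumPy array, one simple option is Matplotlib’s imread() function; the filename below is a placeholder for whatever photo you choose, and imread() is just one of several ways to load an image:

    # load a new photograph from file as a NumPy array (filename is a placeholder)
    from matplotlib import pyplot
    image = pyplot.imread('kangaroo.jpg')
    print(image.shape)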

    Let’s take it one step further and make predictions for a number of images in a dataset, then plot the photo with the ground truth annotations side-by-side with the photo and the predicted bounding boxes. This will provide a visual guide to how good the model is at making predictions.

    The first step is to load the image and mask from the dataset.

    # load the image and mask
    image = dataset.load_image(image_id)
    mask, _ = dataset.load_mask(image_id)

    Next, we can make a prediction for the image.

    # convert pixel values (e.g. center)
    scaled_image = mold_image(image, cfg)
    # convert image into one sample
    sample = expand_dims(scaled_image, 0)
    # make prediction
    yhat = model.detect(sample, verbose=0)[0]

    Next, we can create a subplot for the ground truth and plot the image with the known masks (derived from the bounding boxes) overlaid.

    # define subplot
    pyplot.subplot(n_images, 2, i*2+1)
    # plot raw pixel data
    pyplot.imshow(image)
    pyplot.title('Actual')
    # plot masks
    for j in range(mask.shape[2]):
    	pyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)

    We can then create a second subplot beside the first, plot the photo again, and this time draw the predicted bounding boxes in red.

    # get the context for drawing boxes
    pyplot.subplot(n_images, 2, i*2+2)
    # plot raw pixel data
    pyplot.imshow(image)
    pyplot.title('Predicted')
    ax = pyplot.gca()
    # plot each box
    for box in yhat['rois']:
    	# get coordinates
    	y1, x1, y2, x2 = box
    	# calculate width and height of the box
    	width, height = x2 - x1, y2 - y1
    	# create the shape
    	rect = Rectangle((x1, y1), width, height, fill=False, color='red')
    	# draw the box
    	ax.add_patch(rect)

    We can tie all of this together into a function that takes a dataset, model, and config and creates a plot of the first five photos in the dataset with ground truth and predicted bounding boxes.

    # plot a number of photos with ground truth and predictions
    def plot_actual_vs_predicted(dataset, model, cfg, n_images=5):
    	# load image and mask
    	for i in range(n_images):
    		# load the image and mask
    		image = dataset.load_image(i)
    		mask, _ = dataset.load_mask(i)
    		# convert pixel values (e.g. center)
    		scaled_image = mold_image(image, cfg)
    		# convert image into one sample
    		sample = expand_dims(scaled_image, 0)
    		# make prediction
    		yhat = model.detect(sample, verbose=0)[0]
    		# define subplot
    		pyplot.subplot(n_images, 2, i*2+1)
    		# plot raw pixel data
    		pyplot.imshow(image)
    		pyplot.title('Actual')
    		# plot masks
    		for j in range(mask.shape[2]):
    			pyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)
    		# get the context for drawing boxes
    		pyplot.subplot(n_images, 2, i*2+2)
    		# plot raw pixel data
    		pyplot.imshow(image)
    		pyplot.title('Predicted')
    		ax = pyplot.gca()
    		# plot each box
    		for box in yhat['rois']:
    			# get coordinates
    			y1, x1, y2, x2 = box
    			# calculate width and height of the box
    			width, height = x2 - x1, y2 - y1
    			# create the shape
    			rect = Rectangle((x1, y1), width, height, fill=False, color='red')
    			# draw the box
    			ax.add_patch(rect)
    	# show the figure
    	pyplot.show()

    The complete example of loading the trained model and making a prediction for the first few images in the train and test datasets is listed below.

    # detect kangaroos in photos with mask rcnn model
    from os import listdir
    from xml.etree import ElementTree
    from numpy import zeros
    from numpy import asarray
    from numpy import expand_dims
    from matplotlib import pyplot
    from matplotlib.patches import Rectangle
    from mrcnn.config import Config
    from mrcnn.model import MaskRCNN
    from mrcnn.model import mold_image
    from mrcnn.utils import Dataset
    
    # class that defines and loads the kangaroo dataset
    class KangarooDataset(Dataset):
    	# load the dataset definitions
    	def load_dataset(self, dataset_dir, is_train=True):
    		# define one class
    		self.add_class("dataset", 1, "kangaroo")
    		# define data locations
    		images_dir = dataset_dir + '/images/'
    		annotations_dir = dataset_dir + '/annots/'
    		# find all images
    		for filename in listdir(images_dir):
    			# extract image id
    			image_id = filename[:-4]
    			# skip bad images
    			if image_id in ['00090']:
    				continue
    			# skip all images after 150 if we are building the train set
    			if is_train and int(image_id) >= 150:
    				continue
    			# skip all images before 150 if we are building the test/val set
    			if not is_train and int(image_id) < 150:
    				continue
    			img_path = images_dir + filename
    			ann_path = annotations_dir + image_id + '.xml'
    			# add to dataset
    			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)
    
    	# load all bounding boxes for an image
    	def extract_boxes(self, filename):
    		# load and parse the file
    		root = ElementTree.parse(filename)
    		boxes = list()
    		# extract each bounding box
    		for box in root.findall('.//bndbox'):
    			xmin = int(box.find('xmin').text)
    			ymin = int(box.find('ymin').text)
    			xmax = int(box.find('xmax').text)
    			ymax = int(box.find('ymax').text)
    			coors = [xmin, ymin, xmax, ymax]
    			boxes.append(coors)
    		# extract image dimensions
    		width = int(root.find('.//size/width').text)
    		height = int(root.find('.//size/height').text)
    		return boxes, width, height
    
    	# load the masks for an image
    	def load_mask(self, image_id):
    		# get details of image
    		info = self.image_info[image_id]
    		# define box file location
    		path = info['annotation']
    		# load XML
    		boxes, w, h = self.extract_boxes(path)
    		# create one array for all masks, each on a different channel
    		masks = zeros([h, w, len(boxes)], dtype='uint8')
    		# create masks
    		class_ids = list()
    		for i in range(len(boxes)):
    			box = boxes[i]
    			row_s, row_e = box[1], box[3]
    			col_s, col_e = box[0], box[2]
    			masks[row_s:row_e, col_s:col_e, i] = 1
    			class_ids.append(self.class_names.index('kangaroo'))
    		return masks, asarray(class_ids, dtype='int32')
    
    	# load an image reference
    	def image_reference(self, image_id):
    		info = self.image_info[image_id]
    		return info['path']
    
    # define the prediction configuration
    class PredictionConfig(Config):
    	# define the name of the configuration
    	NAME = "kangaroo_cfg"
    	# number of classes (background + kangaroo)
    	NUM_CLASSES = 1 + 1
    	# simplify GPU config
    	GPU_COUNT = 1
    	IMAGES_PER_GPU = 1
    
    # plot a number of photos with ground truth and predictions
    def plot_actual_vs_predicted(dataset, model, cfg, n_images=5):
    	# load image and mask
    	for i in range(n_images):
    		# load the image and mask
    		image = dataset.load_image(i)
    		mask, _ = dataset.load_mask(i)
    		# convert pixel values (e.g. center)
    		scaled_image = mold_image(image, cfg)
    		# convert image into one sample
    		sample = expand_dims(scaled_image, 0)
    		# make prediction
    		yhat = model.detect(sample, verbose=0)[0]
    		# define subplot
    		pyplot.subplot(n_images, 2, i*2+1)
    		# plot raw pixel data
    		pyplot.imshow(image)
    		pyplot.title('Actual')
    		# plot masks
    		for j in range(mask.shape[2]):
    			pyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)
    		# get the context for drawing boxes
    		pyplot.subplot(n_images, 2, i*2+2)
    		# plot raw pixel data
    		pyplot.imshow(image)
    		pyplot.title('Predicted')
    		ax = pyplot.gca()
    		# plot each box
    		for box in yhat['rois']:
    			# get coordinates
    			y1, x1, y2, x2 = box
    			# calculate width and height of the box
    			width, height = x2 - x1, y2 - y1
    			# create the shape
    			rect = Rectangle((x1, y1), width, height, fill=False, color='red')
    			# draw the box
    			ax.add_patch(rect)
    	# show the figure
    	pyplot.show()
    
    # load the train dataset
    train_set = KangarooDataset()
    train_set.load_dataset('kangaroo', is_train=True)
    train_set.prepare()
    print('Train: %d' % len(train_set.image_ids))
    # load the test dataset
    test_set = KangarooDataset()
    test_set.load_dataset('kangaroo', is_train=False)
    test_set.prepare()
    print('Test: %d' % len(test_set.image_ids))
    # create config
    cfg = PredictionConfig()
    # define the model
    model = MaskRCNN(mode='inference', model_dir='./', config=cfg)
    # load model weights
    model_path = 'mask_rcnn_kangaroo_cfg_0005.h5'
    model.load_weights(model_path, by_name=True)
    # plot predictions for train dataset
    plot_actual_vs_predicted(train_set, model, cfg)
    # plot predictions for test dataset
    plot_actual_vs_predicted(test_set, model, cfg)

Running the example first creates a figure showing five photos from the training dataset, each with its ground truth bounding boxes alongside the same photo with the predicted bounding boxes.

We can see that the model has done well on these examples, finding all of the kangaroos, even where two or three appear in a single photo. The second photo down (in the right column) does show a slip-up where the model has predicted two bounding boxes around the same kangaroo.
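If duplicate detections like this are a concern at prediction time, the predicted boxes can be post-processed. Below is a minimal sketch, assuming the detect() output also carries per-box confidence scores under the 'scores' key (as in the Mask RCNN project used in this tutorial); the iou() and suppress_duplicates() helpers are named here for illustration only. The idea is to keep only the highest-scoring box whenever two predicted boxes overlap heavily.

    # post-process predictions: keep only the best-scoring box when boxes overlap heavily
    from numpy import argsort

    # intersection over union of two boxes given as (y1, x1, y2, x2)
    def iou(box_a, box_b):
    	y1, x1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    	y2, x2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    	inter = max(0, y2 - y1) * max(0, x2 - x1)
    	area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    	area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    	return inter / float(area_a + area_b - inter)

    # greedy suppression: keep the best box, drop remaining boxes that overlap it too much
    def suppress_duplicates(yhat, iou_threshold=0.5):
    	# sort box indexes by confidence, best first
    	order = list(argsort(yhat['scores'])[::-1])
    	keep = list()
    	while order:
    		best = order.pop(0)
    		keep.append(best)
    		order = [i for i in order if iou(yhat['rois'][best], yhat['rois'][i]) <= iou_threshold]
    	return [yhat['rois'][i] for i in keep]

    # example usage: iterate over suppress_duplicates(yhat) instead of yhat['rois'] when plotting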

Plot of Photos of Kangaroos From the Training Dataset With Ground Truth and Predicted Bounding Boxes

    A second figure is created showing five photos from the test dataset with ground truth bounding boxes and predicted bounding boxes.

These are images not seen during training, and again, the model has detected the kangaroos in each photo. We can see that, in the case of the second-to-last photo, a minor mistake was made: the same kangaroo was detected multiple times.

No doubt these errors could be ironed out with more training, perhaps with a larger dataset and/or data augmentation, to encourage the model to treat any people in the photos as background and to detect a given kangaroo only once.
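The sketch below illustrates the data augmentation idea only; it assumes the imgaug library is installed and relies on the optional augmentation argument of the Mask RCNN project's train() function, with the augmenters and values chosen arbitrarily rather than tuned.

    # illustrative sketch: simple training-time augmentation with imgaug (assumed installed)
    from imgaug import augmenters as iaa

    # apply up to two of: horizontal flip, small brightness change
    augmentation = iaa.SomeOf((0, 2), [
    	iaa.Fliplr(0.5),
    	iaa.Multiply((0.8, 1.2))
    ])
    # this would be passed to the training call from the earlier training step, e.g.:
    # model.train(train_set, test_set, learning_rate=config.LEARNING_RATE,
    #	epochs=5, layers='heads', augmentation=augmentation)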

Plot of Photos of Kangaroos From the Test Dataset With Ground Truth and Predicted Bounding Boxes

    Further Reading

    This section provides more resources on the topic if you are looking to go deeper.

    Papers

    • Mask R-CNN, 2017.

    Projects

    • Kangaroo Dataset, GitHub.
    • Mask RCNN Project, GitHub.

    APIs

    • xml.etree.ElementTree API
    • matplotlib.patches.Rectangle API
    • matplotlib.pyplot.subplot API
    • matplotlib.pyplot.imshow API

    Articles

    • Splash of Color: Instance Segmentation with Mask R-CNN and TensorFlow, 2018.
• Mask R-CNN – Inspect Balloon Trained Model, Notebook.
    • Mask R-CNN – Train on Shapes Dataset, Notebook.
    • mAP (mean Average Precision) for Object Detection, 2018.

    Summary

    In this tutorial, you discovered how to develop a Mask R-CNN model for kangaroo object detection in photographs.

    Specifically, you learned:

    • How to prepare an object detection dataset ready for modeling with an R-CNN.
    • How to use transfer learning to train an object detection model on a new dataset.
    • How to evaluate a fit Mask R-CNN model on a test dataset and make predictions on new photos.

    Do you have any questions?
    Ask your questions in the comments below and I will do my best to answer.

    The post How to Train an Object Detection Model to Find Kangaroos in Photographs (R-CNN with Keras) appeared first on Machine Learning Mastery.

    Source link

    Click here to read more

[D] Which Machine Learning algorithm should I use?

[D] Which Machine Learning algorithm should I use?

    A simplified cheat sheet.

    https://i.redd.it/3x4aiworft031.jpg

    More AI / ML Slides

    submitted by /u/seemingly_omniscient
    [link] [comments]

    Source link

    Click here to read more