The objective of this project is to create an animal image recognition model using supervised learning. To do so, we have at our disposal annotated images from the ImageNet database, made available for a Kaggle challenge.
The chosen approach to accomplish this objective is to build an ontology of all the animals and their morphological features by extracting data from WikiData, and then train the model on the ontology.
This project is a UTBM student project carried out as part of the UV DS51.
The project is fully written in Python. Here are the modules to install:
- SPARQLWrapper
- nltk (with WordNet version 3.1 or higher)
- tqdm
- pandas
- xmltodict
- rdflib
- scikit-learn (imported as sklearn)
- opencv-python (imported as cv2)
We organized the project as a pipeline generating the image recognition model step by step. The input of the pipeline is a file containing a list of synsets mapped to their ImageNet ID. By default, there are 2 different pipeline inputs:
- All of the synsets from the Kaggle challenge
- A selection of 6 synsets, used as a proof of concept (POC) of the project
To run the pipeline on the POC synsets, execute the main.py file. As executing the pipeline generates a lot of calls to the WikiData API per synset, running it on all of the Kaggle challenge synsets at once isn't possible because of API rate limits. Therefore, on the full dataset, the pipeline needs to be run step by step, with some steps run multiple times, and digging into the code is unavoidable. As an improvement to this project, a command line interface would remove that need.
The input of the pipeline is a file named LOC_synset_mapping.txt. When running the pipeline, the following files are generated:
- synset_mapping.json
- graph_arcs.csv
- animal_features.json
- animal_ontology_structure.ttl
- animal_ontology.ttl
- features_prediction.csv
The ontology file isn't in the repository, but it can easily be generated from the first three files. If you want to rerun a specific step of the pipeline, delete the file it generates and run the function corresponding to that step.
This description of the pipeline focuses on how the pipeline works, not on how it was implemented or which function corresponds to which step. If you are interested in that, check the full_commented_pipeline function in the main.py file.
In order to build an ontology based on data from WikiData, we first have to find the entities in WikiData that match each one of our synsets. To do so automatically, there are 2 WikiData properties which can be used:
The main issue of this step is that the synset IDs we have are outdated in WikiData since 2011 and the release of WordNet 3.1, in which all of the synset IDs were modified. For clarity, from now on, we will call the older (3.0 and lower) WordNet ID the ImageNet ID (or inid) and the WordNet 3.1 ID the WordNet ID (or wnid). The WikiData ID will also be referred to as wdid.
The exact match property still contains some ImageNet IDs, but they don't even cover half of the Kaggle challenge synsets. Therefore, the first step is to find the WordNet ID of all of our synsets. As WordNet can be installed as a Python module from the Natural Language ToolKit (nltk), and as the synsets themselves weren't modified from version 3.0 to 3.1 (only their IDs), they can be looked up there directly. By default, nltk comes with an older version of WordNet, so the 3.1 WordNet database has to be downloaded separately and swapped into the nltk WordNet files.
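As an illustration, here is a minimal sketch of how a synset could be looked up through nltk once the 3.1 database is in place, using the synset lemmas and formatting the offset like the wnid values shown below (the helper name and the exact formatting are assumptions, not the project's actual code):

from nltk.corpus import wordnet as wn

def find_wnid(lemmas):
    """Look up a synset by its lemmas and return an ID formatted like '02117019-n'."""
    for lemma in lemmas:
        for synset in wn.synsets(lemma.replace(" ", "_")):
            # A real implementation should pick the candidate whose lemmas best match
            # the ImageNet synset; for brevity we simply take the first one.
            return f"{synset.offset():08d}-{synset.pos()}"
    return None

print(find_wnid(["timber wolf", "grey wolf", "gray wolf", "Canis lupus"]))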
Once this mapping is done, there still are over 200 synsets without a WikiData match. Therefore, for these ones, a user-controlled mapping takes place. The lemmas of the synsets are searched in the WikiData labels and aliases, and the description of each match is displayed to the user, who then selects the best match.
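To give an idea of what such a label/alias search can look like, here is a hedged sketch using SPARQLWrapper against the public WikiData endpoint (the query shape and the function name are illustrative, not the project's exact code):

from SPARQLWrapper import SPARQLWrapper, JSON

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

def search_candidates(lemma: str):
    """Return WikiData entities whose English label or alias matches the lemma."""
    sparql = SPARQLWrapper(WIKIDATA_ENDPOINT, agent="DS51-animal-ontology-example")
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"""
        SELECT ?item ?itemLabel ?itemDescription WHERE {{
            ?item rdfs:label|skos:altLabel "{lemma}"@en .
            SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
        }}
        LIMIT 10
    """)
    return sparql.query().convert()["results"]["bindings"]

for match in search_candidates("timber wolf"):
    print(match["item"]["value"], match.get("itemDescription", {}).get("value", ""))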
The last part of this step is to remap the entities that are common names of other entities (p:P31/pq:P642 property path) and aren't linked to any parent entity. In those cases, the common name of property is the only link between the entity and the rest of WikiData, and in SPARQL, going through that property before doing the required query times out every time.
The result of the mapping is stored as a list of dictionaries in a JSON file named synset_mapping.json. Each synset dictionary looks like this:
{
    "inid": "n02114367",
    "wnid": "02117019-n",
    "synset": [
        "timber wolf",
        "grey wolf",
        "gray wolf",
        "Canis lupus"
    ],
    "wdid": "Q18498"
}
The objective of this step is to extract from WikiData a tree structure with the Animal entity as its root and the WikiData entities of our synsets as its leaves. Intermediate nodes of the graph are obtained by going through the WikiData properties subClassOf (P279), instanceOf (P31) and parentTaxon (P171).
We started by identifying, for every WikiData entity in our dataset, the pattern of properties (if any) that leads from the entity to the animal entity. The previous step allowed us to connect 383 animals of the Kaggle challenge dataset to their WikiData entity. These animals have 4 different patterns to the animal entity (a sketch of the corresponding SPARQL checks follows the list):
- subclass (wdt:P279* wd:Q729): recursive animal sub-class (75 animals, mostly dog breeds)
- taxon (wdt:P171* wd:Q729): recursive animal sub-taxon (245 animals)
- subclass_instance (wdt:P31 [wdt:P279* wd:Q729]): instance of a recursive animal sub-class (56 dog breeds, 1 rabbit breed)
- subclass_taxon_subclass (wdt:P279 [wdt:P171* [wdt:P279* wd:Q729]]): sub-class of a class taxon recursively linked to a recursive animal sub-class (6 animals)
- None: entities with no animal pattern, which can then be considered as not animals
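The pattern detection itself can be done with simple ASK queries. Below is a hedged sketch of how such checks might look with SPARQLWrapper (the property paths are copied from the list above; the function and dictionary names are illustrative):

from SPARQLWrapper import SPARQLWrapper, JSON

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# The property paths from the list above, tried in order.
ANIMAL_PATTERNS = {
    "subclass": "wd:{wdid} wdt:P279* wd:Q729 .",
    "taxon": "wd:{wdid} wdt:P171* wd:Q729 .",
    "subclass_instance": "wd:{wdid} wdt:P31 ?c . ?c wdt:P279* wd:Q729 .",
    "subclass_taxon_subclass": "wd:{wdid} wdt:P279 ?c . ?c wdt:P171* ?t . ?t wdt:P279* wd:Q729 .",
}

def detect_animal_pattern(wdid: str):
    """Return the name of the first matching animal pattern, or None."""
    sparql = SPARQLWrapper(WIKIDATA_ENDPOINT, agent="DS51-animal-ontology-example")
    sparql.setReturnFormat(JSON)
    for name, triple_pattern in ANIMAL_PATTERNS.items():
        sparql.setQuery("ASK { " + triple_pattern.format(wdid=wdid) + " }")
        if sparql.query().convert()["boolean"]:
            return name
    return None

print(detect_animal_pattern("Q18498"))  # expected: "taxon"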
These patterns are saved as new values in the dictionaries of the file synset_mapping.json. To build the graph, the label of each entity also has to be added. Indeed, in order to make the graph clearer, the labels of the entities are used to identify them instead of their WikiData ID. Each synset dictionary then takes its final form, which is the following:
{
    "inid": "n02114367",
    "wnid": "02117019-n",
    "synset": [
        "timber wolf",
        "grey wolf",
        "gray wolf",
        "Canis lupus"
    ],
    "wdid": "Q18498",
    "label": "Wolf",
    "animal_pattern": "taxon"
}
From these animal patterns, we can now deduce the path to the animal entity for each animal. While building the graph is fairly straightforward for the subclass and subclass_instance patterns (every intermediate parent class becomes a node in the graph), it becomes trickier for the taxon and subclass_taxon_subclass patterns. Indeed, the taxon path is much more complex, and putting all of the parent taxons as nodes of the graph produces a very messy graph.
Some of the taxon parents actually are also subclasses of the animal entity. Therefore, the first solution to this problem was to identify the nodes that appear in both graphs and add them to our graph. The problem with this solution is that it misses a lot of important classes, and many animals end up being tied only to the animal entity in our graph. For example, the Birds entity was never detected, because it is a subclass of Vertebrata, which isn't directly a subclass of Animal; it is its common name entity, Vertebrate, which is a subclass of Animal.
The final solution was to keep every entity in the parent taxon path that has a parent class, provided that this parent class is itself connected to the animal entity via a taxon path.
Here is the generated graph on the POC dataset, in which all 6 synsets have a taxon animal pattern:
As you can see, all of the entities find a logical path to the animal class, except the goldfish entity, which only finds its way to the vertebrate entity. This is still a more than acceptable result.
The graph is saved in a CSV file named graph_arcs.csv as a list of parent/child node pairs, including the label of each node. This list represents the arcs of the graph. The ontology will be built on this graph structure, using the labels as node IDs.
The structure of the ontology is composed of 2 main parts: the subclass graph structure and the definition of the morphological features. From the graph_arcs.csv file, the first part is fairly straightforward. For each node of the graph, a new class is created with the label of the entity as identifier. Then, for each arc, a triple (child rdfs:subClassOf parent) is created. After the creation of all the arcs, the integrity of each subclass triple is checked: if the parent of the triple is also a parent of another class of the child, the triple is removed. Each class also gets its WikiData ID and, for those which have one, its ImageNet ID.
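As a rough illustration of this first part, here is a minimal rdflib sketch that turns the arcs of graph_arcs.csv into rdfs:subClassOf triples (the namespace URI and the CSV column names are assumptions, and the redundancy check is omitted):

import pandas as pd
from rdflib import Graph, Namespace, RDF, RDFS

# Hypothetical namespace for the animal classification ontology.
AC = Namespace("http://example.org/animal-classification#")

g = Graph()
g.bind("ac", AC)

# Assumed column names; the actual graph_arcs.csv layout may differ.
arcs = pd.read_csv("graph_arcs.csv")
for _, arc in arcs.iterrows():
    child = AC[arc["child_label"].replace(" ", "")]
    parent = AC[arc["parent_label"].replace(" ", "")]
    g.add((child, RDF.type, RDFS.Class))
    g.add((parent, RDF.type, RDFS.Class))
    g.add((child, RDFS.subClassOf, parent))

g.serialize("animal_ontology_structure.ttl", format="turtle")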
WikiData contains no morphological feature information. Therefore, the morphological features have to be initialized in another way. All of them are listed in a JSON file named animal_features.json, in which the animal labels are the keys and the features of the animals are the values, as lists of strings. Here is an extract of this file for the wolf class and its parent classes:
{
    "Vertebrata": ["Backbone", "Spinal cord"],
    "Quadruped": ["Four legs"],
    "Tetrapoda": ["Four limbs", "Lungs"],
    "Carnivora": ["Sharp teeth", "Claws"],
    "Caniformia": ["Dog-like shape"],
    "Canidae": ["Muzzle", "Pointed ears"],
    "Canis": ["Pointed snout", "Sharp teeth"],
    "Wolf": ["Canine teeth", "Bushy tail"]
}
While this file can be filled in manually, we chose to use ChatGPT to generate it. The prompt we used was the following:
For all the following animal categories, give a list of morphological features that represent it. Each feature has to be one word :
Vertebrata :
Quadruped :
Tetrapoda :
Carnivora :
Caniformia :
Canidae :
Canis :
Wolf :
Some of these aren't morphological. Focus only on morphological features
It is increasingly common to see supervised learning models trained with AI-generated content, and in our case it seems like a good way to proceed.
All of the features are then defined as instances of a MorphFeature class, and added as features of the matching class and all of its subclasses. For example, here is the generated definition of the Wolf class (a sketch of this propagation follows the example):
ac:Wolf a rdfs:Class ;
ac:hasMorphFeature
ac:backbone, ac:bushyTail, ac:canineTeeth, ac:claws, ac:dogLikeShape,
ac:fourLegs, ac:fourLimbs, ac:lungs, ac:muzzle, ac:pointedEars,
ac:pointedSnout, ac:sharpTeeth, ac:spinalCord ;
ac:inid "n02114367" ;
ac:wdid wd:Q18498 ;
rdfs:subClassOf ac:Canis .
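Here is a hedged sketch of how such a propagation could be written with rdflib, reusing the hypothetical ac namespace and the structure file from the earlier sketch (the camel-casing helper and the traversal are illustrative, not the project's exact implementation):

import json
from rdflib import Graph, Namespace, RDF, RDFS

# Same hypothetical namespace as in the earlier sketch.
AC = Namespace("http://example.org/animal-classification#")

g = Graph()
g.parse("animal_ontology_structure.ttl", format="turtle")

def to_feature_id(label: str) -> str:
    """Turn a feature label such as 'Bushy tail' into an identifier such as 'bushyTail'."""
    words = label.split()
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])

with open("animal_features.json") as f:
    animal_features = json.load(f)

for label, features in animal_features.items():
    for feature in features:
        feature_ref = AC[to_feature_id(feature)]
        # Each feature is an instance of the MorphFeature class...
        g.add((feature_ref, RDF.type, AC.MorphFeature))
        # ...attached to the matching class and, transitively, to all of its subclasses.
        for subclass in g.transitive_subjects(RDFS.subClassOf, AC[label]):
            g.add((subclass, AC.hasMorphFeature, feature_ref))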
This step could be greatly improved, especially by creating sub-classes of the MorphFeature class corresponding to different parts of the body (fur, mouth, legs, ears, tail, etc.) and instances of these classes being a specific feature of this part of the body (size, color, etc.). This would most likely be a huge help for the supervised learning model, but with a large number of classes, it would be much harder to match the features to the animals.
Once the ontology structure is generated, it is saved in the animal_ontology_structure.ttl file. It is then possible to visualize it using ontology editing tools such as Protégé.
Now that the ontology structure is created, it needs to be populated to be complete. The population is made of individual animals which appear in images from the ImageNet database. As explained earlier, a part of this database has been made public for a Kaggle challenge.
Populating the ontology starts by downloading the images dataset. To do so:
- Log in to Kaggle
- Go to the challenge rules page and accept the rules
- Go to your account page and generate an API token
- Place the generated kaggle.json file in the directory of the project
- Execute this command:
kaggle competitions download -c imagenet-object-localization-challenge
Once the zip file is downloaded, the program unzips the resources of the animals in the ontology and splits them into a testing and a training dataset. The ontology is populated with the training dataset, while the testing one is used to evaluate the performance of the model. Considering the amount of resources available, using 1% of them for testing is considered sufficient.
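A minimal sketch of that split, assuming the resources are handled as a list of image identifiers (the sample list below is illustrative) and using scikit-learn's train_test_split:

from sklearn.model_selection import train_test_split

# image_ids would be the list of resource names unzipped for the ontology's animals.
image_ids = ["n02114367_10043", "n02114367_10051", "n01443537_16"]  # illustrative sample
train_ids, test_ids = train_test_split(image_ids, test_size=0.01, random_state=42)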
The downloaded dataset contains two kinds of resources: images and annotations. Annotations are XML files that detail the content and properties of an image. Not all images are annotated. The relevant features of the annotation files are:
- The size (width, height) of the image
- A list of objects in the image.
Each object is an animal of the class present on the image and contains a bounding box, which is a list of 4 integers (xmin, ymin, xmax, ymax) representing the rectangle (in pixels) in which the animal appears on the image, as shown in the sketch below.
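For reference, here is a small sketch of how such an annotation file can be read with xmltodict (the element names follow the ImageNet/PASCAL VOC annotation layout; the file path is illustrative):

import xmltodict

with open("Data/Annotations/Train/n02114367/n02114367_10043.xml") as f:
    annotation = xmltodict.parse(f.read())["annotation"]

width = int(annotation["size"]["width"])
height = int(annotation["size"]["height"])

# A single <object> element is returned as a dict, several as a list; normalize to a list.
objects = annotation["object"]
if not isinstance(objects, list):
    objects = [objects]

for obj in objects:
    box = obj["bndbox"]
    print(obj["name"], box["xmin"], box["ymin"], box["xmax"], box["ymax"])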
Here is how the images and animal instances are represented in the ontology:
ac:n02114367_10043 a ac:Wolf ;
ac:boundingBox [ ac:xMax 350 ; ac:xMin 4 ; ac:yMax 276 ; ac:yMin 82 ] ;
foaf:img ac:IMG_n02114367_10043 .
ac:IMG_n02114367_10043 a schema:ImageObject ;
schema1:image [ a schema:URL ;
schema:value <file:///{project_path}/Data/Images/Train/n02114367/n02114367_10043.JPEG> ] ;
ac:size [ ac:height 347 ; ac:width 500 ] .
An instance of the animal class is created for each object declared in the annotation file. It has as properties the image on which it appears and (if an annotation file exists for the image) the bounding box in which it appears. The image is defined as an instance of the class schema:ImageObject, having as properties the absolute path of the file and (if the image is annotated) its size in pixels.
As the images aren't available as online resources, the populated ontology isn't easily sharable, which is why it isn't available in this repository. Once created, the ontology is saved into a file named animal_ontology.ttl.
There are multiple ways to train a model using the ontology we created. We chose one that only uses the ontology structure and not its population. Another model, built using the entire ontology, would probably produce a better result.
The model we created only relies on the morphological features of the animals. The features are listed for each leaf animal class in a boolean matrix. Here is what that matrix looks like with the main features of the POC dataset.
All of the images are processed in a way that only keeps 512 numerical values per image. These processed images are placed with their labels (ImageNet ID, name of the image directory) in 2 DataFrames: one for training and one for testing. Then, a prediction of the morphological features of each image is made by going through the following process:
- Extract one column of the morphological features matrix
- Map that column onto the target of the image DataFrame, using the target of the morphological matrix, to create a new image target. For example, if the value of the Beak column for the class n01614925 (Bald Eagle) is True, then all of the images of bald eagles in the image dataset get True as their new target
- Train a classifying model with the images as features and the new target column as target
- Using the trained model, predict the boolean value of this morphological feature for each image of the training dataset
- Repeat these steps for every feature of the morphological features matrix (see the sketch below)
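Here is a hedged sketch of that loop using scikit-learn, with LogisticRegression as a stand-in classifier (the DataFrame and column names are assumptions, not the project's actual code):

import pandas as pd
from sklearn.linear_model import LogisticRegression

def predict_features(features_matrix, train_df, predict_df):
    """features_matrix: boolean DataFrame indexed by ImageNet ID, one column per feature.
    train_df / predict_df: 512 numerical columns per image plus a 'target' column (ImageNet ID)."""
    image_cols = [c for c in train_df.columns if c != "target"]
    predictions = pd.DataFrame(index=predict_df.index)
    for feature in features_matrix.columns:
        # New boolean target: does the class of each training image have this feature?
        y_train = train_df["target"].map(features_matrix[feature])
        clf = LogisticRegression(max_iter=1000)
        clf.fit(train_df[image_cols], y_train)
        predictions[feature] = clf.predict(predict_df[image_cols])
    return predictions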
This generates a prediction matrix, which is saved in the features_prediction.csv file. From this matrix, all it takes is to train another model on the class morphological features DataFrame, and to predict each training image's label from the prediction matrix. By default, this model is a Multi-Layer Perceptron (MLP) classifier.
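A possible sketch of that final step, assuming the features_matrix and predictions DataFrames from the previous sketch (the hyperparameters here are scikit-learn defaults, not necessarily the project's):

from sklearn.neural_network import MLPClassifier

# Learn to map a vector of morphological features to an animal class (ImageNet ID).
label_model = MLPClassifier(max_iter=1000)
label_model.fit(features_matrix.values, features_matrix.index)

# Predict a label for each image from its predicted morphological features.
predicted_labels = label_model.predict(predictions.values)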
On the POC dataset, this method isn't very efficient: it reaches 0.80 accuracy, which is less than what could be achieved with a model using only the images and their labels. This can be explained by 2 reasons:
- The intermediate predicted dataset generates some noise
- As there are very few animal classes in the POC dataset, most of the features apply to only one animal. Therefore, training a model on such a feature is pretty similar to training a model directly on the labels. With so few features, there is no way the model could understand what each feature actually represents.
There are many ways to improve this model.
- As mentioned earlier, a model based on the entire ontology would most likely have better performances.
- Having more detailed, more precise, more linked features would help the model understand the similarities and differences between animals.
- The extraction of numerical features from the images could be improved.
- Other models than the MLP might prove to be more efficient.
This project provides a template pipeline in which steps can easily be modified and improved. The tools and data it contains would make it easier to look for the best image recognition model.