Overview

In this project, we aim to build a machine learning model that can identify the gender of a person from their voice recording. In the process, we use two intermediary data representation format of the audio clips- Mel Spectrogram (Mel) and Mel-Frequency Cepstral Coefficients (MFCC).

Datasets

[MCV] Common Voice by Mozilla.org (https://www.kaggle.com/datasets/mozillaorg/common-voice)

[DLS] Bengali Common Voice Speech Dataset (https://www.kaggle.com/competitions/dlsprint)

Proposed Solution

Mel-Frequency Cepstral Coefficients (MFCC)

Mel Spectrogram

Notebook Details

Training

The training folder contains four notebooks. Each of the notebooks are named as: [Data-Type]_[Dataset]_[Model]. These notebooks are used to train individual models on the train datasets.

└── training
    ├── mel_dls_resnet50_train.ipynb 
    ├── mel_mcv_resnet50.ipynb       
    ├── mfcc_dls_train_resnet50.ipynb
    └── mfcc_mcv_resnet50.ipynb

Evaluation

The evaluation folder contains four notebooks. Each of the notebooks are named as: [Data-Type]_[Datase#1]_on_[Dataset#2]. The models trained on Dataset#1 are used to evaluate Dataset#2.

└── evaluation
    ├── mel_dls_on_mcv.ipynb
    ├── mel_mcv_on_dls.ipynb
    ├── mfcc_dls_on_mcv.ipynb        
    └── mfcc_mcv_on_dls.ipynb

In the report mentioned in the presentation, the comparison between models are shown.

Model Details

Architecture: ResNet50
Learning Rate: 0.0001
Adam Optimizer

Presentation Report

https://docs.google.com/presentation/d/14BWOq6YSmO3GqZHEvCou43Z5A4dlOmKq4pjqUqgZALU/

References

[1] Speaker Gender Recognition Based on Deep Neural Networks and ResNet50 (https://doi.org/10.1155/2022/4444388)

[2] A Machine Learning Approach to Automating Bengali Voice Based Gender Classification (https://ieeexplore.ieee.org/document/9117385)

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
docs		docs
evaluation		evaluation
scripts		scripts
training		training
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

Datasets

Proposed Solution

Mel-Frequency Cepstral Coefficients (MFCC)

Mel Spectrogram

Notebook Details

Training

Evaluation

Model Details

Presentation Report

References

About

Languages

fazledyn/gender-classification-from-audio-clips

Folders and files

Latest commit

History

Repository files navigation

Overview

Datasets

Proposed Solution

Mel-Frequency Cepstral Coefficients (MFCC)

Mel Spectrogram

Notebook Details

Training

Evaluation

Model Details

Presentation Report

References

About

Topics

Resources

Stars

Watchers

Forks

Languages