Name based Gender Identification using Machine Learning and Deep Learning Models
  • Ritwick Ghosh
Indian Institute of Engineering Science and Technology

Corresponding Author: [email protected]

Abstract

A person's name reflects a lot about their origin, gender and more. Naming is a fragment of language and can serve as an identifier, since it contains patterns that can be learned in order to classify an individual's gender from their name (first name and middle name). In this paper, gender classification (or guessing) from multi-national, multi-ethnicity and multi-language first names is performed with popular Machine Learning and Deep Learning models. Recent advances in Artificial Intelligence inspired algorithms offer various options for text classification. This paper investigates first name based gender identification on a dataset of 84,899 first names and middle name initials with gender labels. The algorithms tested on this task are Support Vector Classification (linear and non-linear), Nearest Neighbors, Decision Tree, Random Forest, Multi-layer Perceptron, Long Short-Term Memory (LSTM), Bidirectional LSTM, Convolutional Neural Network and a simple embedding based deep learning model (SimpleText). The significant results of this study are that the linear model (94.62%) performs on par with the more complex machine learning models, and that the simple embedding based deep learning network SimpleText (94.67%) outperforms all the other models, including the recurrent and convolutional networks.
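
To make the idea of a linear model over names concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not say how names are turned into features, so character n-grams are an assumption here, and a plain perceptron stands in for the linear Support Vector Classification. The function names (`char_ngrams`, `train_perceptron`, `predict`) and the toy names are hypothetical and are not drawn from the 84,899-name dataset.

```python
from collections import defaultdict


def char_ngrams(name, n_min=1, n_max=3):
    """Return character n-grams of a name, padded with boundary markers.

    Assumption: n-gram features are illustrative; the paper's features
    are not specified in the abstract.
    """
    padded = f"^{name.lower()}$"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def train_perceptron(samples, epochs=10):
    """Train a binary linear classifier on (name, label) pairs.

    Labels are +1 or -1. Returns a dict mapping n-gram -> weight.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for name, label in samples:
            feats = char_ngrams(name)
            score = sum(weights[f] for f in feats)
            if label * score <= 0:  # wrong side or on the boundary: update
                for f in feats:
                    weights[f] += label
    return weights


def predict(weights, name):
    """Return +1 or -1 from the sign of the linear score."""
    score = sum(weights.get(f, 0.0) for f in char_ngrams(name))
    return 1 if score > 0 else -1


# Hand-made toy data for illustration only (hypothetical labels).
TOY = [("maria", 1), ("anna", 1), ("sofia", 1), ("julia", 1),
       ("john", -1), ("mark", -1), ("peter", -1), ("david", -1)]

weights = train_perceptron(TOY)
train_accuracy = sum(predict(weights, n) == y for n, y in TOY) / len(TOY)
print(f"training accuracy on toy data: {train_accuracy:.2f}")
```

The boundary markers `^` and `$` let the model weight prefixes and suffixes separately from mid-name patterns, which is one plausible reason a simple linear model can be competitive on this kind of task.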