Abstract
Deep-learning models learn their parameter values through
backpropagation. The activation function within the hidden layers is a
critical component in minimizing the loss of deep neural networks. The
Rectified Linear Unit (ReLU) has been the dominant activation function
for the past decade. Swish and Mish are newer activation functions that
have been shown to yield better results than ReLU in specific
circumstances. Phish is a novel non-monotonic activation function
proposed here. It is a composite function defined as
f(x) = x · TanH(GELU(x)), and no discontinuities are apparent in the
graph of its derivative over the domain examined; a minimal
implementation sketch follows.
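For concreteness, the following Python sketch implements Phish directly
from this definition, using the exact (erf-based) form of GELU; the
central-difference derivative check at the end is an illustrative
addition, not part of the reported experiments.

    import math

    def gelu(x: float) -> float:
        # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
        return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    def phish(x: float) -> float:
        # Phish(x) = x * TanH(GELU(x)), per the definition above.
        return x * math.tanh(gelu(x))

    # Sample the function and a central-difference estimate of its
    # derivative; the estimates vary smoothly, with no visible jumps.
    h = 1e-5
    for x in (-4.0, -2.0, -0.5, 0.0, 0.5, 2.0, 4.0):
        d = (phish(x + h) - phish(x - h)) / (2.0 * h)
        print(f"x={x:+.1f}  Phish(x)={phish(x):+.6f}  Phish'(x)~{d:+.6f}")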
Four generalized networks were constructed, each using a different
activation function, with SoftMax as the output function. Using images
from the MNIST and CIFAR-10 datasets, these networks were trained to
minimize sparse categorical cross-entropy; a minimal training sketch is
given below.
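The abstract fixes only the SoftMax output, the sparse categorical
cross-entropy loss, and the datasets; the choice of Keras, the layer
widths, the optimizer, and the epoch count in the following sketch are
illustrative assumptions.

    import tensorflow as tf

    def phish(x):
        # Phish(x) = x * TanH(GELU(x)); tf.nn.gelu defaults to the exact form.
        return x * tf.math.tanh(tf.nn.gelu(x))

    # Hypothetical architecture; only the SoftMax output layer and the
    # loss below are specified in the abstract.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),   # MNIST images
        tf.keras.layers.Dense(128, activation=phish),
        tf.keras.layers.Dense(128, activation=phish),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    model.fit(x_train / 255.0, y_train, epochs=5,
              validation_data=(x_test / 255.0, y_test))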
A large-scale cross-validation was simulated using stochastic Markov
chains so that, in accordance with the law of large numbers, the
estimated probability values stabilize across runs. Statistical tests
support the research hypothesis that Phish can outperform the other
activation functions in classification; an illustrative test is
sketched below.
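The abstract does not name the tests applied; as one plausible example,
the sketch below runs a Welch's t-test on per-run accuracies, with
randomly generated placeholder samples standing in for the actual
cross-validation results.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Placeholder per-run accuracies; in practice these would be the
    # accuracy values collected from the simulated cross-validation.
    acc_phish = rng.normal(loc=0.985, scale=0.002, size=30)
    acc_relu = rng.normal(loc=0.982, scale=0.002, size=30)

    # Welch's t-test (unequal variances) on the two accuracy samples.
    t, p = stats.ttest_ind(acc_phish, acc_relu, equal_var=False)
    print(f"Welch's t-test: t={t:.3f}, p={p:.4f}")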
Future experiments will involve testing Phish in unsupervised learning
algorithms and comparing it against additional activation functions.