Abstract
Deep-learning models learn to estimate values through backpropagation. The
activation function within hidden layers is a critical component for
minimizing loss in deep neural networks. The Rectified Linear Unit (ReLU) has
been the dominant activation function for the past decade. Swish and
Mish are newer activation functions that have been shown to yield better
results than ReLU given specific circumstances. Phish is a novel
activation function proposed here. It is a composite function defined as
f(x) = x · TanH(GELU(x)), whose derivative shows no apparent
discontinuities over the domain observed.
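As an informal illustration (not drawn from the paper's own code), the
following minimal sketch expresses Phish in Python, assuming a
TensorFlow/Keras environment, and numerically samples its derivative over a
small range; the function name and the sampled interval are placeholder
choices:

    import tensorflow as tf

    def phish(x):
        # Phish activation: f(x) = x * tanh(GELU(x))
        return x * tf.math.tanh(tf.nn.gelu(x))

    # Sample the derivative over [-5, 5] to see that it varies smoothly.
    xs = tf.linspace(-5.0, 5.0, 11)
    with tf.GradientTape() as tape:
        tape.watch(xs)
        ys = phish(xs)
    print(tape.gradient(ys, xs).numpy())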
Four generalized networks
were constructed using Phish, Swish, Sigmoid, and TanH, with SoftMax as the
output function. Using images from the MNIST and CIFAR-10 datasets, these
networks were trained to minimize sparse categorical cross-entropy.
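To make the training setup concrete, the sketch below builds one such
generalized classifier with a configurable hidden-layer activation and a
SoftMax output, and trains it on MNIST with sparse categorical cross-entropy;
the layer sizes, optimizer, and epoch count are assumptions for illustration,
not the configuration used in the experiments:

    import tensorflow as tf

    def phish(x):
        # Phish activation: f(x) = x * tanh(GELU(x))
        return x * tf.math.tanh(tf.nn.gelu(x))

    def build_network(activation):
        # One generalized classifier; hidden sizes are illustrative only.
        return tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(128, activation=activation),
            tf.keras.layers.Dense(128, activation=activation),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Train one network per candidate activation function.
    for name, act in {"phish": phish, "swish": tf.nn.swish,
                      "sigmoid": "sigmoid", "tanh": "tanh"}.items():
        model = build_network(act)
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(x_train, y_train, epochs=1, verbose=0)
        _, acc = model.evaluate(x_test, y_test, verbose=0)
        print(name, acc)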
A large-scale cross-validation was simulated using stochastic Markov
chains to account for the law of large numbers in the probability
values. Statistical tests support the research hypothesis that Phish
could outperform other activation functions in classification. Future
experiments would involve testing Phish in unsupervised learning
algorithms and comparing it against a broader set of activation functions.