Abstract
Deep-learning models learn their parameter values through
backpropagation. The activation function within the hidden layers is a
critical component in minimizing the loss of deep neural networks. The
Rectified Linear Unit (ReLU) has been the dominant activation function
for the past decade. Swish and Mish are newer activation functions that
have been shown to yield better results than ReLU in specific
circumstances. Phish is a novel non-monotonic activation function
proposed here. It is a composite function defined as
f(x) = x · TanH(GELU(x)), and no discontinuities are apparent in the
graph of its derivative over the domain examined; a minimal
implementation sketch follows.
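For concreteness, the following Python sketch implements Phish directly
from this definition, using the exact (erf-based) form of GELU; the
central-difference derivative check at the end is an illustrative
addition, not part of the reported experiments.

    import math

    def gelu(x: float) -> float:
        # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
        return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    def phish(x: float) -> float:
        # Phish(x) = x * TanH(GELU(x)), per the definition above.
        return x * math.tanh(gelu(x))

    # Sample the function and a central-difference estimate of its
    # derivative; the estimates vary smoothly, with no visible jumps.
    h = 1e-5
    for x in (-4.0, -2.0, -0.5, 0.0, 0.5, 2.0, 4.0):
        d = (phish(x + h) - phish(x - h)) / (2.0 * h)
        print(f"x={x:+.1f}  Phish(x)={phish(x):+.6f}  Phish'(x)~{d:+.6f}")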
Four generalized networks were constructed, each using a different
activation function, with SoftMax as the output function. Using images
from the MNIST and CIFAR-10 datasets, these networks were trained to
minimize sparse categorical cross-entropy; a minimal training sketch is
given below.
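The abstract fixes only the SoftMax output, the sparse categorical
cross-entropy loss, and the datasets; the choice of Keras, the layer
widths, the optimizer, and the epoch count in the following sketch are
illustrative assumptions.

    import tensorflow as tf

    def phish(x):
        # Phish(x) = x * TanH(GELU(x)); tf.nn.gelu defaults to the exact form.
        return x * tf.math.tanh(tf.nn.gelu(x))

    # Hypothetical architecture; only the SoftMax output layer and the
    # loss below are specified in the abstract.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),   # MNIST images
        tf.keras.layers.Dense(128, activation=phish),
        tf.keras.layers.Dense(128, activation=phish),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    model.fit(x_train / 255.0, y_train, epochs=5,
              validation_data=(x_test / 255.0, y_test))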
A large-scale cross-validation was simulated using stochastic Markov
chains so that, in accordance with the law of large numbers, the
estimated probability values stabilize across runs. Statistical tests
support the research hypothesis that Phish can outperform the other
activation functions in classification; an illustrative test is
sketched below.
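The abstract does not name the tests applied; as one plausible example,
the sketch below runs a Welch's t-test on per-run accuracies, with
randomly generated placeholder samples standing in for the actual
cross-validation results.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Placeholder per-run accuracies; in practice these would be the
    # accuracy values collected from the simulated cross-validation.
    acc_phish = rng.normal(loc=0.985, scale=0.002, size=30)
    acc_relu = rng.normal(loc=0.982, scale=0.002, size=30)

    # Welch's t-test (unequal variances) on the two accuracy samples.
    t, p = stats.ttest_ind(acc_phish, acc_relu, equal_var=False)
    print(f"Welch's t-test: t={t:.3f}, p={p:.4f}")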
Future experiments will involve testing Phish in unsupervised learning
algorithms and comparing it against additional activation functions.