TechRxiv

Phish: A Novel Hyper-Optimizable Activation Function

Preprint posted on 21.12.2021, 15:58 by Philip Naveen
Deep-learning models are trained by minimizing a loss function via backpropagation, and the activation function used in the hidden layers is a critical component of that process. Rectified Linear Unit (ReLU) has been the dominant activation function for the past decade, while Swish and Mish are newer activation functions that have been shown to outperform ReLU under specific circumstances. Phish is a novel activation function proposed here. It is the composite function f(x) = x·TanH(GELU(x)), whose derivative exhibits no apparent discontinuities over the domain observed. Four generalized networks were constructed using Phish, Swish, Sigmoid, and TanH, with SoftMax as the output function. Using images from the MNIST and CIFAR-10 databanks, these networks were trained to minimize sparse categorical cross-entropy. A large-scale cross-validation was simulated using stochastic Markov chains to account for the law of large numbers in the probability values. Statistical tests support the research hypothesis that Phish can outperform the other activation functions in classification. Future experiments would involve testing Phish in unsupervised learning algorithms and comparing it to more activation functions.
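The definition f(x) = xTanH(GELU(x)) can be sketched directly from the abstract. The following is a minimal scalar implementation, assuming the exact (erf-based) form of GELU; the paper may instead use a tanh-based GELU approximation, in which case the values would differ slightly.

```python
import math

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    # (Assumption: the paper may use the common tanh approximation instead.)
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phish(x: float) -> float:
    # Phish as defined in the abstract: f(x) = x * tanh(GELU(x)).
    return x * math.tanh(gelu(x))
```

For large positive inputs GELU(x) ≈ x and tanh saturates at 1, so phish(x) ≈ x, mirroring the near-linear positive regime of ReLU-like functions; for large negative inputs GELU(x) → 0, so the output decays smoothly toward 0 rather than cutting off abruptly.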

History

Email Address of Submitting Author

phjljp00@gmail.com

Submitting Author's Institution

Godwin High School

Submitting Author's Country

United States of America
