Lightweight Neural Networks for Context-Aware Embedded Systems

— An embedded system is a microcontroller- or microprocessor-based system designed to perform a specific task by collecting, processing and communicating information. Beyond performing its specific task, such a system is also expected to produce better and more efficient results. One of the challenges in doing so is contextualizing the collected information so that the system can predict outputs and make smart decisions. A learning system that can contextualize its surrounding environment should be capable of inferring information automatically, as humans do. This calls for neural networks that provide embedded intelligence for smart systems to make decisions at machine speed. The main challenge in developing such a system is the constraints in memory size, computational power and other characteristics of embedded systems, which can significantly restrict developers from implementing learning algorithms. This paper presents lightweight neural networks as a method for implementing context-aware embedded systems in resource-limited environments. A testbed is set up for collecting data, training and evaluation. The algorithms are implemented in C and simulated on Arduino. Good results were obtained after deploying the algorithm and knowledgebase on an Arduino board for sensor reading.


INTRODUCTION
An embedded system is a system, often implemented on a microcontroller, that performs a specific task. It usually contains input receptors, a processor that processes the input data and/or data stored in memory, and an actuator that produces an output based on the processed inputs as specified by the code embedded in the device. Embedded systems are used in many domains, ranging from industrial machines to home appliances. They can also play a significant role in building smart computing environments, such as smart cities, which rely mainly on context awareness [4].
Context awareness enables a system to understand and react to certain conditions based on contextual information. Such systems often use embedded devices to capture surrounding information. A context-aware embedded system goes beyond self-aware computing: self-aware computing deals with making a system knowledgeable about its internal operations, states and processes, whereas a context-aware system incorporates global-level awareness about the system and its environment [5]. Various algorithms need to be applied in embedded systems to make them context aware. Existing embedded system algorithms are mostly rule based, such as expert systems. These algorithms lack the flexibility and adaptability that are desirable in dynamic context-aware environments [33]. This stems from the fact that they decide the output or actuation based on hardcoded rules, which cannot be changed dynamically. In this regard, neural networks have been applied to learn from input data and perform decision-making in dynamic settings that require actuations.
Neural networks are inspired by human computational activity [35]. They have the ability to work in imprecise, uncertain and noisy environments. A neural network can learn, adapt and generalize; it can also classify patterns it has not seen before [30]. However, neural networks require extensive processing, storage and communication resources to be directly applied in resource-constrained embedded systems because of the high space complexity of the model. This implies a need to design a novel lightweight neural network model that can fit the resource requirements of embedded systems in dynamic context-aware environments.

RELATED WORK
Since the early 1990s, ML has been applied to support different facets of context-aware systems [12]. This work, however, is dedicated to exploring distinct applications of neural networks for context-aware embedded systems. Bayesian networks for modelling and reasoning about uncertain contexts have received much attention from the context-aware research community [17]. Gu et al. [30] represented a particular Bayesian network in an Ontology Web Language (OWL) based ontology and then translated it into a Bayesian network for reasoning. They propose a probabilistic extension to an ontology-based model for representing uncertain contexts, and use Bayesian networks to reason about uncertainty. In addition, they incorporated support for probabilistic markups and Bayesian networks into a context-aware middleware system to enable the building of context-aware applications. They represented contexts as first-order predicate calculus and described the structures and properties of context predicates in an ontology. They show that Bayesian networks are powerful enough for reasoning about causal relationships between various uncertain contexts. However, such approaches handle the uncertain results of the context awareness problem imperfectly, and none of them consider reusing the uncertain knowledge captured by Bayesian networks even though an ontology was used. The knowledge captured by a particular Bayesian network is fixed and specific to a particular application, so it cannot be shared and reused between applications. According to the authors, relatively little adaptation effort is needed to apply a BN in an embedded system with a small number of inputs, but for an embedded system to react properly according to the context, a number of inputs (usually sensors) are required.
Yavari [1] proposed a means of transforming context obtained from the environment to develop embedded-system-enabled awareness services by storing contextualized data in a separate server. The author proposed an approach and techniques for performing internet-scale data contextualisation; in particular, IoT-based contextualisation techniques that effectively consider the entire range of data collected in smart cities and use such data to provide information that best suits the context of each user in the smart city. The author exemplifies the proposed contextualisation solution in a smart parking space recommender service, and evaluated whether contextualisation helps process driver queries faster and can handle internet-scale IoT data sets. They achieve better contextualisation performance as the amount of data increases towards the internet scale. However, the system must always communicate with the server before acting on a situation. This adds another task to the system and makes it difficult to provide real-time output. Moreover, the functionality of the system depends on the server, and it cannot provide contextualised decisions when the server fails. Ahmad Faridi [28] applied an HMM to contextualize activities in a context-aware system. The author proposed using an HMM to predict and infer the context of the user based on the location of the target, the activity the user is doing, the identity of the target and the time at which the activity occurs, and showed that with an HMM it is possible to predict a user's next state given the current and previous states.
Although HMMs are recognized as an effective technique for activity classification, since they offer dynamic time warping, clear Bayesian semantics and well-understood training algorithms, their computational expense (in both memory and time) and floating-point parameter representation make them very prone to numerical underflow. This makes them less feasible for context-aware embedded systems for two main reasons. First, an algorithm with high computational expense is not fit for a resource-limited embedded device. Second, many activities are performed in a dynamic environment and some of them are short lived; data may be lost while switching between activities, and handling floating-point numbers in such a dynamic environment affects smart prediction of the output. The authors substituted integer operations for floating-point operations in the parameter representation to implement the forward algorithm for motion classification on an embedded system; however, intermediate results can still overflow. The running time of the forward algorithm is O(K²N), where K is the number of states and N is the number of observations. This time complexity is too high for an embedded system. Daniel [31] presented an approach for modelling the contextual information of a context-aware system using the example of a 'context-aware in-car infotainment system'. In particular, the author showed how developers of such systems can model reliability calculations of contextual information and handle multiple sources of contextual information using an ontology-based modelling technique. It allows complex reasoning and presentation with more meaningful results. Its main weakness is that data must be in OWL file format, and the method also has low performance. Ning [32] proposed a middleware architecture for context processing in IoT.
The architecture is based on a fuzzy logic control (FLC) system for context reasoning. The author proposed a formal context representation model in which a user's context is described by a set of roles and relations corresponding to a context space. The algorithm is simple, easy to extend, requires few resources and is represented in natural language. However, it requires the knowledge of a human expert whose thinking it mimics, and as a result it is prone to manual entry mistakes by that expert. Context-aware embedded systems can nonetheless take advantage of fuzzy logic if it is combined with a neural network to process linguistic information, along with other benefits. Hongyuan W.
[33] applied rule induction for context-aware adaptation, taking an emergency situation as a scenario. According to the author, a rule has a respective meaning for the system, the software and the context. Rules usually take the form: Condition -> Action. Adaptation rules take the form: Contextual condition -> Software system action. Rules for context state change take the form: Condition and action -> Context state after change. This algorithm is the simplest and most straightforward, with lower resource requirements than HMM, BN and NN. However, mistakes can be made when writing the rules, and a wrong rule results in bad output prediction. Another model which can be applied in context-aware embedded system development is the neural network, which has rapidly gained popularity for its success in image recognition, natural language processing and other application areas [4,35,36]. Bashyal et al. [13] created an embedded neural network for fire classification. It has seven inputs and three outputs. Each of the seven inputs is a homogeneous sensor and the three outputs represent the three types of fires to be classified: No Fire, Class A and Class B, where Class A is a high and Class B a low detected fire. This network is very large relative to the simplicity of the problem; it could have been solved using simple logic. It is not clear whether the entire network is working or a small number of neurons are carrying the load. It is also worth noting that the number of layers and nodes affects the time complexity of using a neural network. Another problem with additional hidden layers is that they would require a very extensive training set to compute the network weights. Unlike specific applications, context-aware systems require many different sensors, so an optimized technique is needed so that each sensor is used only once. For example, Bashyal et al.
[13] used a homogeneous gas sensor seven times; this is not possible for context-aware systems, which require various heterogeneous sensors to collect information from an environment. In addition, when multiple sensors are used, some sensors can increase the activation and others may decrease it. This cannot be accounted for when similar sensors are used, and the case has also not been considered in other neural network applications that use sensors. Based on the analysis of the related works presented above, we notice that embedded systems can take advantage of machine learning models' abilities to learn by example, to self-learn and self-organize, and to process logical information, in order to become context aware. We also notice that computational expense is an issue in adapting machine learning models to embedded systems, and that there is a gap in exploiting the power of neural networks in complex embedded system applications that require smart decisions. This paper focuses on filling the gap of using heterogeneous input sensors and deploying neural networks on resource-limited embedded devices.

Architecture
In the domain of context awareness, a technology cannot be described as context aware without at least exhibiting the ability to detect a situation, such as an activity, a condition and the effect of the situation on the system. Once the required elements are available, the next step is to apply an inference technique to correctly associate the combination of perceived situations with the correct context. In this work, an NN is utilized at the heart of the embedded system (as shown in Figure 4.1) as our inference technique of choice. A stove controller, one of the most widely used home appliances, will be used as the scenario.
In the design of a neural network for a context-aware embedded system, some input elements (sensors) have a direct relationship with the output (actuation) and some have a reverse, or indirect, relationship. In addition to the weight of each input element in the input layer, the state of each input element is passed from the input layer to the hidden layers. In this work, the states of input elements are classified into three: an input element that has a direct relationship with the output has state 1, one that has an indirect relationship with the output has state -1, and one that is not active or has no effect on the output has state 0. To calculate the input value, the state, input value and neuron weight of each sensor are multiplied, and the sum of the results is passed to the activation function. In this design, the weights of neurons in the input and hidden layers are assigned using a Gaussian distribution [14]. The difference between the target output and the actual output (the output calculated by the activation function) is computed to find the error during training. The computations are repeated, updating the weights of the network, until the total error between target and actual output becomes less than or equal to a threshold error; the outputs for the corresponding inputs are then stored in a knowledgebase. Based on the values in the knowledgebase, the firing algorithm reacts to the environment using information from the sensors.
The NN has been designed so that actuation is made without training the network on the device, which reduces computational expense (both memory and time). The firing algorithm, which classifies all inputs (including untrained ones) into an actuation class, makes the embedded system flexible in dynamic environments. These design choices enable a learning system that fits the resource constraints of embedded devices in context-aware environments.
Considering a smart kitchen as a scenario, an increase in the weight of the input from the temperature sensor of a stove has a direct relationship with the actuation, helping the decision to switch off the appliance or alert the user, whereas the input from the motion sensor (which senses the presence of a user) is indirectly proportional to the actuation. A change in the load of the stove is also directly proportional to the actuation and is calculated as ΔWt = wti - wtc, where ΔWt is the change in load, wti is the initial load and wtc is the current load. If ΔWt > 0, the state will be 1; otherwise the state will be 0. The neural network design for the scenario is shown in Fig. 4.2, where LM35, PIR and HX711 are the temperature, motion and load sensors respectively, while S1, S2 and S3 are the sensor states (-1, 0 or 1). H1 and H2 are hidden nodes with respective weights w11, w12, w21, w22, w31 and w32, whereas h11w, h12w, h21w and h22w are the weights from the hidden layer to the output nodes O1 and O2. The scenario has B as a bias input, f(x) as the activation function, A1, A2 and A3 as actuation classes (A1 switching off, A2 alert, A3 normal), and KB as a knowledgebase used to store the outputs for trained inputs. F is the firing rule algorithm, which classifies the contextual information into an actuation class based on the knowledgebase. The broken lines represent the novel design concepts employed in this paper; the others are taken from standard neural network design [37].

Firing Rule for the proposed design
The adopted firing rule algorithm differs from other techniques, such as the Hamming distance technique, in its firing options. In the applied technique, in addition to firing and not firing, how far the input is from the firing and non-firing patterns is known, and based on this distance the output can be classified into one of the intermediate classes. The rule takes a collection of patterns for a node, some of which cause it to fire (the 1-taught set of patterns) and others which prevent it from doing so (the 0-taught set). Input patterns in neither collection cause the node to fire in an intermediate actuation class. In Table 4.1, a 3-input neuron is taught to output A1 when the input (X1, X2, X3) is 111 and A3 when the input is 000. Then, by applying the firing algorithm to every column, the truth table for the three actuation classes A1, A2 and A3 is obtained. In Table 4.1 the output is known only for two input patterns, 111 and 000; the outputs for the remaining patterns are derived from these two patterns in the knowledgebase. The algorithm is thus designed to enable the NN to react to dynamic environments on which it has not been trained. As an example of the way the firing rule is applied, take the pattern 101. It differs from 111 in 1 element and from 000 in 2 elements; therefore, it is nearest to the 111 pattern. To determine the firing class next to either A1 or A3, the formula N - m is used, where N is the number of inputs and m is the number of elements matching the nearest taught pattern. For pattern 101, N = 3 and m = 2, so the intermediate firing class is the first (3 - 2 = 1) class next to A1, which means class A2 is fired. The same holds for the other patterns.

Training
Training a neural network basically means adjusting all of the weights by repeating two key steps: forward propagation and back propagation. Figure 4.3 [18] shows how forward and back propagation work. In forward propagation, a set of weights is applied to the input data to calculate an output. For the initial forward propagation, the weights are assigned using a Gaussian distribution. The sensor values are used as inputs for the hidden layers, and the outputs from the hidden layers are inputs for the output node(s). In addition to the input values, the effect of each sensor must also be known. To do this, the procedures described in Algorithm 4 are carried out. Finally, the sum is passed to each hidden node through the activation function as y1 = f(H1) and y2 = f(H2), where y1 and y2 are the outputs of the hidden nodes H1 and H2 respectively, which in turn become inputs for the output node(s).
The sigmoid function only outputs numbers in the range (0,1), as shown in Figure 4.4 [18]. If z is very large, e^-z will be close to 0, and therefore the output of the sigmoid will approach 1. Similarly, if z is a large negative number, e^-z grows very large and the output of the sigmoid approaches 0.
Algorithm 4.2 describes the forward propagation algorithm.

Algorithm 4.2: Forward propagation for the proposed solution
The forward propagation algorithm has been uniquely designed, while the backward propagation has been adapted from [18]. In back propagation, the margin of error of the output is measured and the weights are adjusted accordingly to decrease the error. The error is initially calculated as the difference between the actual output (from forward propagation) and the target output (true output), as described in Algorithm 4.2. The error in each output node is calculated by the mean square error (MSE) [11] as EOi = 1/2 (target output - actual output)². The total error is Etotal = Σ 1/2 (target output - actual output)². The main goal of training is to reduce the total error, that is, the difference between actual and target output. Since the target output is constant, the only way to reduce the error is to change the variable element affecting the actual value, which is the weight. Gradient descent is used in back propagation to update the weights; it is an iterative optimization algorithm for finding the minimum of the error function. To decrease the total error, the partial derivative of the error with respect to each weight is computed [2] as ∂E/∂w_ij^k, where E is the total error and w_ij^k is a weight in layer k. The derivative of the error function is evaluated by applying the chain rule as ∂E/∂w_ij^k = ∂E/∂yo_i^k * ∂yo_i^k/∂w_ij^k, where yo_i^k is an actual output in layer k. This can be further decomposed as ∂E/∂w_ij^k = ∂(Σ 1/2 (target output - actual output)²)/∂yo_i^k * ∂yo_i^k/∂w_ij^k. Computation of the error terms proceeds backwards from the output layer down to the input layer.
In the output layer, how much a change in wij affects the total error, ∂E/∂w_ij^k, is found using the chain rule: ∂E/∂w_ij^k = ∂E/∂yo_i^k * ∂yo_i^k/∂neto_i * ∂neto_i/∂w_ij^k. Next, how much the output of o_i changes with respect to its total net input must be known. Since yo_i = 1/(1 + e^(-neto_i)), where neto_i is the net input (the weighted sum), the partial derivative of the sigmoid function is the output multiplied by 1 minus the output [2]: ∂yo_i/∂neto_i = yo_i (1 - yo_i). Finally, how much the total net input of o_i changes with respect to wij is computed as ∂neto_i/∂wij, which is the output of the hidden neuron feeding that weight.
Putting these together using equation 5, the derivative of the total error with respect to each weight can be calculated.
Alternatively, the delta rule can be used to calculate ∂E/∂neto_i as do_i = ∂E/∂yo_i * ∂yo_i/∂neto_i = ∂E/∂neto_i, where d is the delta. Hence, do_i = -(target_oi - actual_oi) * actual_oi (1 - actual_oi). Therefore, ∂E/∂w_ij^k = do_i * actual_hi^(k-1), where k is the network layer. To decrease the error, this value is subtracted from the current weight: new weight = old weight - ∂E/∂w_ij^k, that is, new wij = wij - ∂E/∂w_ij^k. Optionally, the gradient is multiplied by a learning rate Θ, which we set to 0.5: wij = wij - Θ * ∂E/∂w_ij^k. The learning rate mainly controls how much the weights of the network are adjusted with respect to the loss gradient. The actual updates in the neural network are performed only after the new weights for all weights in all layers are obtained (i.e., the original weights, not the updated weights, are used when continuing back propagation into the hidden layer).
In the hidden layer, the backwards pass continues by calculating new values for the weights leading into the hidden layer neurons. This is done as ∂E/∂w_ij^k = ∂E/∂aH_i * ∂aH_i/∂netH_i * ∂netH_i/∂w_ij^k, where aH_i is the actual output of hidden layer neuron i and netH_i is the net input (the weighted sum) leading into that neuron. A similar process to the output layer is used, differing only in that the output of each hidden layer neuron contributes to the output of multiple output neurons. For instance, aH_i affects every yo_i. Therefore, ∂E/∂aH_i needs to take into consideration its effect on all the output neurons: ∂E/∂aH_i = Σ ∂Eo_i/∂aH_i, where Eo_i is the error in each output neuron and ∂Eo_i/∂aH_i = ∂Eo_i/∂neto_i * ∂neto_i/∂aH_i.
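The two backward-pass cases can be summarized as follows. This is a reconstruction consistent with the derivation above, with h_{ik} denoting the weight from hidden neuron i to output neuron k and x_j the signed input feeding the hidden layer:

```latex
% Output layer: delta and weight gradient
\delta_{o_k} = -(\mathrm{target}_{o_k} - \mathrm{actual}_{o_k})\,
               \mathrm{actual}_{o_k}\,(1 - \mathrm{actual}_{o_k}),
\qquad
\frac{\partial E}{\partial w_{ik}} = \delta_{o_k}\, a_{H_i}

% Hidden layer: the delta sums the effect of a_{H_i} on every output neuron
\delta_{H_i} = \Bigl(\sum_{k} \delta_{o_k}\, h_{ik}\Bigr)\, a_{H_i}\,(1 - a_{H_i}),
\qquad
\frac{\partial E}{\partial w_{ji}} = \delta_{H_i}\, x_j
```

Each weight is then moved against its gradient, scaled by the learning rate, once all gradients have been computed from the original weights.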

Evaluation
During the training phase, the network inputs are first fed forward. In addition to the input values, the states of the inputs are fed in before weights are assigned to the inputs. The weights are assigned using a Gaussian distribution rather than random assignment, as shown in Figure 4.5. This can reduce the number of iterations needed to update the weights (and therefore ease the resource constraints) compared with random assignment. For instance, if 2, 5, 9 and 1 are assigned to w11, w12, w21 and w22 respectively, then using the Gaussian distribution the value of w11 becomes (2 - mean)/stdv, where stdv is the standard deviation. With stdv = 3.11 and mean = 4.25, w11 = -2.25/3.11 = -0.72. Similarly, w12 = 0.24, w21 = 1.52 and w22 = -1.045. Let us assign state -1 to S1 and state 1 to S2. The bias value for all neurons is 1. If the weights 4, 1, 3 and 2 are assigned to h11w, h12w, h21w and h22w respectively, then applying the same method in the hidden layer gives h11w = 1.08, h12w = -1.08, h21w = 0.36 and h22w = -0.36. Now, input is fed to the two neurons in the input layer (in our case LM35 and PIR; we skip HX711 for simplicity), and target outputs are given for all input patterns in Table 4.2. The state of an input is multiplied by the input value. For instance, if the pattern 01 is taken as input, the target output is 11. To get the actual output, five procedures are carried out. First, 0 is multiplied by -1 to give 0, and 1 is multiplied by 1 to give 1. The input thus remains 01; however, this is not always the case for all input patterns, as the sign can change depending on the state. Next, the signed inputs from the neurons are multiplied by their respective weights to obtain the net input to each hidden neuron: netH1 = IA*w11 + IB*w21 + B = 0*(-0.72) + 1*1.52 + 1 = 2.52, and netH2 = IA*w12 + IB*w22 + B = 0*0.24 + 1*(-1.045) + 1 = -0.045.
Thirdly, the actual output of each hidden neuron is calculated by the sigmoid activation function: aH1 = f(netH1) = 1/(1 + e^-2.52) = 0.926 and aH2 = f(netH2) = 1/(1 + e^0.045) = 0.489. Fourthly, the net inputs to the output neurons are computed as netO1 = aH1*h11w + aH2*h21w + B = 2.176 and netO2 = aH1*h12w + aH2*h22w + B = -0.176. Finally, applying the sigmoid again gives the actual outputs 0.898 and 0.457, where the targets are 1 and 1 respectively. As can be observed, there is an error in both outputs. This error should be zero or close to zero, and this is achieved by back propagation.
In calculating the total error, the error for each output neuron is computed and summed: Etotal = Σ 1/2 (target output - actual output)². In back propagation, the goal is to adjust each of the weights in the network so that the actual output moves closer to the target output. The weights are updated backwards: first the weights in the output layer, then the weights leading into the hidden layer.
In calculating a weight in the output layer, such as h11w, the extent to which a change in h11w affects the total error, ∂E/∂h11w, must be known. By the delta rule, ∂E/∂h11w = -(target_o1 - actual_o1) * actual_o1 (1 - actual_o1) * aH1.

EXPERIMENTATION
We used the C programming language to program our logic, which was then embedded in the Arduino microcontroller. The C program in the Arduino IDE and the procedures simulated on Proteus are documented in Annexe A. The experiment was conducted on a Core i3 laptop with 2 GB RAM running Windows 8.
For input pattern 01 (as explained in section 4.1.2), the target output was 1 for both O1 and O2, and the actual outputs were 0.898 for O1 and 0.457 for O2. After the first training iteration, the outputs became 0.900 for O1 and 0.478 for O2, which is closer to the target output than the previous result, so the error decreased (from 0.153 to 0.141). Applying the same method, after 10 iterations the actual outputs become 0.9983 for O1 and 0.9879 for O2. As the number of training iterations increases, the error gets closer to zero.
For input pattern 00, before training the network the actual outputs were 0.1037 for O1 and -0.1006 for O2; after training the network by adjusting its weights over 20 iterations, the actual outputs became 0.0020 and -0.0009 for O1 and O2 respectively.
Similarly, the actual outputs for all patterns move closer to the target outputs as the number of training cycles increases, as shown in Figure 5. Comparing the results obtained from the system before and after training, the results after training are much closer to the target outputs for all input patterns, and the error decreases accordingly. After training the network, the results are stored in a knowledgebase, which is used to react to inputs through the firing rule technique.
The results obtained from testing the deployed knowledgebase on the Arduino board for sensor reading are presented in Table 5.2. Two sensors were chosen: one for recognizing user presence (PIR sensor) and one for appliance heat level (LM35).

CONCLUSION
In this paper, a lightweight neural network is used to solve context-aware problems in embedded systems in a way that can work for many applications. Information from the sensors was recognized and passed to the hidden layer together with its effect on the output. Training was performed on a high-capacity computer before deployment on the embedded device. The trained inputs and their respective outputs were stored in a knowledgebase, which was then deployed on the embedded system along with the firing algorithm to make smart decisions based on contextualized inputs. A stove controller was used as the scenario to evaluate the model. Three different sensors (temperature, motion and load) were used to evaluate the training. The experimental results show precise contextualization of inputs through tuning several parameters. Two heterogeneous sensors (temperature and motion) were used to validate the effectiveness of the model after the knowledgebase and firing rule were deployed on the embedded device. Based on information from these sensors, inputs were contextualized and classified into the correct actuation class in real time. In general, the NN proposed in this work can be applied to all embedded systems and other IoT applications that require context awareness. In addition, the work eases the restrictions on applying NNs in resource-limited embedded systems. However, a neural network is limited to modelling complex problems numerically; it cannot model complex problems linguistically using natural language processing. For future work we plan to combine fuzzy logic with the neural network to take advantage of both, leading to context-aware embedded systems that can mimic the human decision-making process, handle vague information, learn by example (hence not requiring the knowledge of a human expert), and process numeric, linguistic or logical information.