Data Caching at Fog Nodes Under IoT Networks: Review of Machine Learning Approaches

IoT devices (wireless sensors, actuators, and computing devices) produce a large volume and variety of data, and much of this data is transient. To overcome the limitation of the traditional IoT architecture, in which all data is sent to the cloud for processing, an emerging technology known as fog computing has recently been proposed. Fog computing brings storage, computing, and control close to the end devices; it complements the cloud and provides services to the IoT devices. Data used by the IoT devices can therefore be cached at the fog nodes in order to reduce bandwidth utilization and latency. This chapter discusses the utility of data caching at fog nodes. It also discusses various machine learning techniques that can be used to reduce latency by predicting the future demands of IoT devices and caching the corresponding data close to them.


Introduction
In recent years, small devices embedded with sensors have been producing large amounts of data by sensing real-time information from the environment. The network of these devices communicating with each other is known as the IoT (Internet of Things), sometimes called the Internet of Everything [1]. The data produced by IoT devices need to be delivered to the users of IoT applications after processing and analysis. Furthermore, the data produced by IoT devices are transient: the generated data have a certain lifetime, after which they become useless and are discarded [2]. It is therefore desirable to store the data somewhere near the IoT devices [3]. At the same time, if the data produced by IoT devices are stored at the cloud server, communication overhead is added, as the IoT users need to contact the cloud server whenever they require any data.
Fog computing is a decentralized approach that brings the advantages and intelligence of cloud computing, such as storage, applications, and computing services, close to the end devices, somewhere between the cloud and the end devices [4,5]. Fog nodes can be servers, networking devices (routers and gateways), cloudlets, or base stations. These nodes are aware of their geographical distribution as well as their logical location in the cluster. They can operate in a centralized or distributed manner and can also act as stand-alone devices. They receive inputs from the data generators (IoT devices), process them, and provide transient storage for the data.
Fog nodes are intelligent devices that also decide what data to store locally and what to send to the cloud for historical analysis. These devices can be implemented in software or hardware, are arranged in a hierarchy, and are used for filtering the data sent by the sensor devices. They should offer low latency, fast response times, optimal bandwidth, optimal storage, and decision-making capability. At the fog nodes, intelligent algorithms are embedded for storing data, computing, and forwarding data between the various layers. The main functional modules of a fog node in a fog-cloud network are depicted in figure 1.1. In this figure, the compute module is responsible for processing data and computing the desired results, and the storage module is responsible for storing data reliably so that robustness can be achieved. Various accelerator units, such as digital signal processors and graphics processing units, are used in critical tasks to provide additional processing power, whereas the network module is responsible for the guaranteed delivery of data. Fog computing complements cloud computing by providing short-term analytics, unlike cloud computing, which provides long-term analytics; it does not replace cloud computing [6]. There are six main characteristics that differentiate fog computing from other computing paradigms [7,8]:
a) Awareness and Low Latency: Fog nodes are aware of their logical location in the context of the whole system and offer very low latency and communication cost. They are frequently placed near the edge devices, and hence they can return replies and other analyses much faster than cloud nodes.
b) Heterogeneity: Fog nodes generally collect different forms of data from different types of devices through different types of networks.
c) Adaptivity: In many situations, fog computing deals with uncertain load patterns of requests submitted by different IoT applications. The adaptive and scaling features of fog computing help it deal with such scenarios.
d) Real-Time Interaction: Unlike cloud computing, which supports batch processing, fog computing supports real-time interaction. Time-sensitive data is processed and stored at the fog nodes and sent back to the users whenever required, whereas data that is not time sensitive and has a long life cycle is sent to the cloud for processing.
e) Interoperability: Since fog computing supports real-time interaction, it requires the cooperation of various providers, which makes fog computing interoperable.
f) Geographical Distribution: Unlike the centralized cloud, the applications serviced by fog nodes are geographically distributed, such as delivering seamless, high-quality video to moving vehicles.
Further, the processing time at fog nodes is very short (milliseconds to sub-seconds). This approach avoids the need for costly bandwidth and helps the cloud by handling transient data. To facilitate fog computing, a node should exhibit autonomy (the ability to take decisions independently without the intervention of other nodes), heterogeneity, manageability, and programmability. Figure 1.2 shows the architecture of fog computing, where IoT devices are connected to fog nodes, which are in turn connected to cloud nodes [9].
The architecture of fog computing consists of three layers [10]:
a) Terminal layer: This is the lowest layer and consists of IoT devices such as mobile phones and sensors, which sense information from the environment and transmit it to the upper layer. The information is transmitted in the form of data streams. IoT data streams are sequences of values emitted by the IoT devices, or produced by one application module for another, and sent to the higher layer for processing.
b) Fog layer: This layer consists of various switches, portals, base stations, specialized servers, and so on. It lies between the IoT devices and the cloud and is used for processing data near the IoT devices. If the fog nodes are not able to fulfill a request from the terminal layer, the request is forwarded to the cloud layer. IoT devices generally have little processing power and storage, because of which they suffer from problems of performance, reliability, and security [11]. The fog nodes are capable of performing resource-intensive operations on behalf of these resource-constrained IoT devices, which makes the end devices less complex and also reduces their power consumption. Fog computing also supports real-time interaction between the IoT devices and the fog nodes, as data is available to the IoT devices quickly, unlike cloud computing, where batch processing is mostly used. Moreover, since IoT devices are resource constrained and generally lack security features, fog nodes act like proxy servers and provide additional security: they regularly update the software and security credentials of these devices and check their safety status.
c) Cloud layer: This is the topmost layer, consisting of cloud servers that provide permanent storage and long-term analysis for data that is not time sensitive.
Fog computing also supports the implementation of various service models such as Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS) [12,13]. Owing to such advantages, frameworks such as Google App Engine, Microsoft Azure, and Amazon Web Services, which have been using cloud computing, have also started supporting fog computing to provide solutions for developing geographically dispersed distributed applications that require low-latency computational resources. They also use dedicated nodes with low-latency computational power, called mist nodes (lightweight fog nodes), which are sometimes placed even closer to the IoT devices than the fog nodes [14,15]. Hence, the integration of IoT with fog computing brings many such advantages.

Importance of Caching at the Fog Nodes
With caching at the fog nodes, the IoT devices do not have to contact the remote server, i.e., the cloud, every time they require some data.
The IoT devices first check the cache of the fog nodes: if the required data is present, the fog nodes return it to the IoT devices; otherwise, the fog nodes contact the cloud for the required data. Hence, caching data at the fog nodes reduces transactional latency. Moreover, fog computing requires less bandwidth to transfer the data [16]. Since fog computing supports hierarchical processing, the amount of data that must be transferred from the IoT devices to the cloud is smaller, whereas the amount of data transferred per unit of time from the fog nodes to the IoT devices is larger, which improves overall throughput. Caching data at the fog nodes therefore decreases overall operational expenses. Data is stored in a distributed manner at fog nodes, which can be deployed anywhere according to requirements. Caching at the fog nodes also reduces the load on the cloud servers, since data that is frequently of interest to IoT devices and has a high probability of being reused is cached at the fog nodes; only selected data is transferred to the cloud for storage and processing, which reduces the latency of contacting the remote server far away from the IoT devices/sensors. Finally, storing data at the fog nodes ensures continuous service to the IoT devices even under irregular network connectivity.
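As a minimal illustration of this lookup flow (a sketch, not taken from any of the cited works), the following Python snippet models a fog node that keeps a small least-recently-used cache and falls back to a hypothetical fetch_from_cloud() call on a miss.

```python
from collections import OrderedDict

def fetch_from_cloud(key):
    # Placeholder for the expensive round trip to the remote cloud server.
    return f"data-for-{key}"

class FogCache:
    """Fixed-size cache at a fog node, evicting the least recently used item."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key in self.store:                  # cache hit: serve locally, low latency
            self.store.move_to_end(key)
            return self.store[key]
        data = fetch_from_cloud(key)           # cache miss: contact the cloud
        self.put(key, data)
        return data

    def put(self, key, data):
        self.store[key] = data
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:    # limited fog storage forces eviction
            self.store.popitem(last=False)

cache = FogCache(capacity=2)
print(cache.get("temperature/zone-1"))         # miss: fetched from cloud, then cached
print(cache.get("temperature/zone-1"))         # hit: served directly from the fog node
```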
Along with these advantages, some challenges need to be addressed in order to cache data at the fog nodes. The biggest challenge is to decide what to store at the cloud and what to cache at the fog nodes. The caching decision should be taken so that the hit rate at the fog node, and hence the overall throughput, is maximized [17,18]. Further, the storage capacity of fog nodes is limited, so they can only store selected data. It is therefore necessary to predict the future demands of the users so that data frequently required in the future can be cached at the fog node to maximize the hit rate. However, predicting the future requirements of the users is difficult.
Another challenge that needs to be addressed is maintaining synchronization between the data cached at a fog node (or at different fog nodes) and the data at the cloud nodes. The security of data at the fog nodes and the selection of the ideal fog node are also concerns [19]. Furthermore, the mobility of nodes or virtual machines, which is required for maintenance, load balancing, and power management, is a challenge that needs to be addressed. Each fog node may host one or more virtual machines depending on the requests and traffic conditions. The computation and communication required for the hand-off process, and its effect on caching, are complicated and expensive [20,21].
As discussed above, this chapter focuses on an important aspect of caching: predicting the future demands of IoT users so that data can be cached effectively at the fog nodes. To address this problem, various machine learning techniques are discussed that help in learning the behavior and demand patterns of IoT devices and add automatic processing and computing capability to the fog nodes. Before exploring the machine learning techniques, the next sections discuss various applications of caching at fog nodes and the life cycle of fog data.

Applications Of Data Caching at Fog Nodes for IoT Devices
In this section, some real scenarios are discussed where data caching at fog nodes can be very useful [22][23][24][25][26][27][28][29].
a) Dynamic Content Delivery and Video Streaming: With the increase in multimedia content, conventional networks suffer from congestion. Video traffic accounts for roughly half of all traffic, and video frames must be delivered fast enough that playback is not interrupted. Caching data at the fog nodes is therefore a suitable approach for faster delivery of multimedia content.
b) Virtual Reality and Online Gaming: Virtual reality and online gaming require real-time data. In virtual reality, both the status and the location of the users must be provided. The data must therefore be processed and delivered to the user as soon as possible, for which fog computing is a promising approach.
c) Smart Cities: In smart cities, various IoT devices are connected together to share data with each other. These devices generate large amounts of data that need to be processed close to where they are generated. For example, in the case of smart traffic lights, data can be stored and processed at fog nodes and used to send warning signals to approaching vehicles.
d) Smart Grids: The data generated by smart grids contain complex parameters that are hard to analyze. Fog nodes have the power to analyze and process such complex data and to perform heavy computations. Hence, fog nodes can be used to store and process the local data generated by smart grids and the various IoT devices used in smart cities.
e) Smart Healthcare: Real-time data processing makes smart healthcare more efficient and faster. Fog computing can therefore be used in healthcare to make it more efficient; for example, it may be used to detect falls of stroke patients.
f) Computation-Intensive Systems: Systems that require intensive computation need low processing and latency times. The data produced by these systems should be processed and stored at the fog nodes and provided to the systems whenever required.
g) Wireless Sensor Systems: The data produced by wireless sensor systems, such as those in oil and gas industries and chemical factories, is transient and needs to be stored near the users. The data produced by these systems should therefore be cached at the fog nodes in order to improve system performance [30].
In all of the aforementioned scenarios, it is suitable to store the real-time or dynamic content near the users who generate the data and may also require it in the near future. This requirement can easily be fulfilled by caching the data at fog nodes located near the users or IoT devices.

Life cycle of Fog Data
As discussed in the introduction section, fog data passes through various steps corresponding to the layers of fog computing, from acquiring data at the terminal layer to processing the data and executing tasks on it:
a) Data Acquisition: Data is sensed by the IoT devices at the terminal layer. The acquired data is either sent to a sink node or transferred directly to a fog node for processing.
b) Lightweight Processing: Lightweight processing is performed at the fog layer and includes tasks such as filtering of data, cleaning of data, eliminating unwanted data, lightweight manipulation of data, compression/decompression, and encryption/decryption. Some data is stored at this layer to support real-time processing, and the rest is transferred to the cloud layer for further processing. Feedback and data are exchanged with the fog layer as shown in figure 1.3.
c) Processing and Analysis: The data received from the fog layer is processed using different types of analysis in order to extract the important data, and is stored permanently at the cloud server. Reports are generated according to the processing performed on the data received from the fog layer. Technologies such as MapReduce are used for data processing at the cloud.
d) Sending Feedback: On the basis of the reports generated during data processing, the cloud server sends feedback, such as data required by the end devices or appropriate commands to the device layer, in order to perform the required actions.
e) Command Execution: Based on the feedback received from the cloud server, the actuators perform the respective actions on the environment.
It is evident from the above sections that caching plays a major role in fog computing. Efficient caching helps in achieving the low latency requirements and in maintaining the high QoS and QoE of 5G. Caching is classified as reactive caching, where data is cached on request, and proactive caching, where data is pre-fetched. To achieve higher spectrum efficiency, proactive caching is better, provided the prediction errors are nearly zero [37]. Therefore, it is important to design techniques that predict the future requests of users so that the corresponding data can be cached at the fog nodes and repetitive requests to the cloud can be avoided.
In the literature, various techniques have been used for data prediction and caching, such as fog-to-fog (F2F) caching [38], where multi-agent cooperation is used. The authors in [39] proposed a location-customized, regression-based caching algorithm to predict future content demands. The authors in [40] distinguished requests according to three popularity levels and then strategically cached data at the fog nodes according to these activity levels.
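As a rough illustration of regression-based demand prediction (a generic sketch, not the location-customized algorithm of [39]), the following Python snippet fits a linear trend to the per-interval request counts of each content item and extrapolates one interval ahead; the content names and counts are hypothetical.

```python
import numpy as np

def predict_next_demand(history):
    """Fit a linear trend to per-interval request counts and extrapolate one step.

    history: list of request counts observed in consecutive time intervals.
    """
    t = np.arange(len(history))
    slope, intercept = np.polyfit(t, history, deg=1)   # least-squares line fit
    return max(0.0, slope * len(history) + intercept)  # demand cannot be negative

# Hypothetical per-content request histories collected at one fog node.
histories = {
    "video/clip-42": [3, 5, 8, 12, 15],
    "sensor/zone-7": [20, 18, 17, 15, 13],
}
predictions = {c: predict_next_demand(h) for c, h in histories.items()}

# Cache the contents with the highest predicted demand first.
for content, demand in sorted(predictions.items(), key=lambda kv: -kv[1]):
    print(content, round(demand, 1))
```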
Apart from caching at the fog nodes, device-to-device (D2D) caching has also been used in the fog computing environment, where direct communication between nodes (IoT devices) takes place over short distances without any infrastructure [41,42]. Whenever data is required by a device, it first checks its local cache. If the data is not available, it broadcasts the request to the other devices. The other IoT devices in the ground tier of the hierarchy check for the data; if a device has it, it replies with the data to the requesting device, otherwise it replies with a negative acknowledgement. The requesting device then requests the data from the fog servers. As stated earlier, if the data is not available at the fog server, the request is sent to the cloud server.
The cloud server in turn sends the data to the fog server, which then sends it to the requesting nodes. As mentioned before, the content placement problem relies on the prediction accuracy of user requirements, the popularity of content, and the design of the caching strategy. To predict content demand, the large amount of available data related to similar interests, social and geographic data, and users' history data can be used [43]. This is effectively implemented using machine learning schemes. In the following section, various machine learning techniques used for data caching at the fog nodes are investigated.

Machine Learning for Data Caching and Replacement
The following table summarizes some deep learning techniques discussed in the literature.

Sr. No. | Technique | Description
1 | Residual Nets [44] | Shortcut connections are introduced into convolutional neural networks in order to reduce the difficulty of training deep models. Residual nets mainly focus on visual inputs.
2 | Long-Term Recurrent Convolutional Network [45] | Convolutional neural networks are applied to extract features, and video frame sequences are combined with long short-term memory [46]; the spatial and temporal relationships between the inputs are exploited.
3 | Restricted Boltzmann Machine [47] | The recognition of human activities is improved by deep Boltzmann machines.
— | — | In these methods, the energy consumption of deep neural networks is reduced on the basis of hardware and software.

The content demand of IoT devices changes with different contexts, locations, network topologies, and so on. Therefore, future content requests are highly unknown before any caching decision is made [52]. Machine-learning-based algorithms enable each fog node, which has limited storage, to make the right decision in selecting the right contents to cache so that the caching performance of the fog node is maximized. Machine learning is used for predicting user demands and mapping user inputs to output actions. It is a promising approach for improving network efficiency by predicting users' demands and for the early discovery of knowledge from large data streams [43]. In the machine learning approach, large amounts of data are exploited to determine content popularity and to filter data and knowledge [53][54][55][56]. Further processing of this data is helpful for analyzing the correlation between the features and the respective outputs [57]. Machine learning techniques can be categorized into two types: unsupervised learning and supervised learning.
In supervised learning, the learning system is provided with labeled examples (known quantities) that help the learning algorithm make future judgments, whereas in unsupervised learning the system is provided with unlabeled data and the algorithm is allowed to act upon it without any guidance. Machine learning can be used at any layer of fog computing, i.e., at the terminal layer, at the fog layer, or at the cloud. At the terminal layer, machine learning is used for data sensing; various methods for sensing data are described in [62].
At the fog layer, machine learning is used for data storage and resource management [64]. Using machine learning algorithms, data is sampled from the IoT devices, compressed, and aggregated at the fog nodes for further processing. Figure 1.5 shows the data analysis methods for the data produced by IoT devices and the various machine learning techniques that can be used to analyze the data and then decide what to cache at the fog nodes.
1. Clustering of Fog Servers: Clustering is an unsupervised learning technique in which the learning is not guided by labels. In this technique, the fog servers are clustered in order to fulfill the demands of the IoT devices [23].
Data is coded into segments and stored at various fog servers. When a user raises a content request, it is served by the group of fog servers that are clustered on the basis of the content stored in them. If the requested content is cached at the fog servers, the IoT device fetches the segments from the fog servers in ascending order of transmission distance until the obtained segments are sufficient for decoding. If the obtained segments are not sufficient, the nearest fog server fetches the remaining data from the cloud and delivers it to the IoT device. The cluster size influences the efficiency of the system: the benefit of cooperative caching vanishes if the cluster is very large, but at the same time IoT devices can then fetch data from more nodes, which increases cache diversity. The cluster size should therefore be chosen to balance this trade-off.
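A minimal sketch of such clustering is shown below, assuming each fog server is described by a vector indicating how strongly each content item is represented in its cache (the servers, feature values, and cluster count are illustrative assumptions, not taken from the cited work); k-means then groups servers with similar cached content.

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row is a fog server; each column indicates how strongly a content
# item is represented in that server's cache (illustrative values).
content_profiles = np.array([
    [0.9, 0.8, 0.1, 0.0],   # servers 0 and 1 mostly hold contents A and B
    [0.8, 0.9, 0.0, 0.1],
    [0.1, 0.0, 0.9, 0.8],   # servers 2 and 3 mostly hold contents C and D
    [0.0, 0.2, 0.8, 0.9],
])

# Group servers with similar cached content so that a request can be served
# cooperatively by the members of one cluster.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(content_profiles)
for server_id, cluster_id in enumerate(kmeans.labels_):
    print(f"fog server {server_id} -> cluster {cluster_id}")
```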
2. Similarity Learning Approach: In this approach, the fog nodes are given pairs of similar IoT devices as well as pairs of less similar devices. From this given set, the intelligent fog node learns a similarity function (or distance metric) between pairs of devices based on their various features [43]. Two parameters, common interest and physical relation (link quality), are considered in order to measure the similarity between IoT devices, and a one-to-one matching scheme is used for pairing the devices. With the help of the learned function, the fog node determines whether a new device is similar to known devices and thereby infers the future interests of the new device, whose interests are unknown (a toy illustration of such a similarity function is sketched below, after the next item).
3. Transfer Learning Approach: In this approach, knowledge learned in a related source domain is transferred to the target caching domain in order to predict the interests of the IoT devices. The problem with this technique is that it cannot give appropriate results if the relation between the source domain and the target domain is weak; that is, if the information demanded by the IoT devices is not related to the information already present in the system, transfer learning is not able to take accurate caching decisions.
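Below is the toy similarity function referred to in the similarity learning approach above. It combines the cosine similarity of hypothetical interest vectors with a link-quality term; the weighting and all names are assumptions for illustration, not the exact scheme of [43].

```python
import numpy as np

def device_similarity(interests_a, interests_b, link_quality, alpha=0.7):
    """Combine common-interest similarity with physical link quality.

    interests_*: vectors counting how often each content category was requested.
    link_quality: value in [0, 1] describing the radio link between the devices.
    alpha: assumed weight of interest similarity versus link quality.
    """
    a, b = np.asarray(interests_a, float), np.asarray(interests_b, float)
    cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    return alpha * cosine + (1 - alpha) * link_quality

# A new device's interests are compared against known devices; the most similar
# known device serves as a proxy for the newcomer's future requests.
known = {"device-1": [5, 0, 2], "device-2": [0, 4, 4]}
new_device = [4, 1, 1]
scores = {d: device_similarity(new_device, v, link_quality=0.8) for d, v in known.items()}
print(max(scores, key=scores.get), scores)
```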

4. Recommendation via Q-Learning: In existing local caching systems, the users do not know about the cached data. Therefore, they may not send a request even though the requested file is available in the cache, which decreases the efficiency of the system considerably. Hence, to improve efficiency, a recommender algorithm is used [67]. In this system, the fog server broadcasts an abstract to the users so that they gain knowledge about the currently cached files. An abstract contains a one-line introduction of each file and the ranking of the file in terms of the number of requests for it. The content of the abstract also influences the users' decisions to request files. Since the request rate of a file and the arrival and departure rates of the IoT devices are unknown in advance, Q-learning, a form of reinforcement learning, is used to improve the performance of the system by reducing latency and improving throughput. It shows very promising accuracy in determining the future demand of the nodes by estimating Q values. Multiple layers are used in the underlying network to process the data and predict the future demands of the users; since more data is generated and processed at the lower layers than at the higher layers, more layers should be deployed near the users in order to reduce network traffic and improve system performance.
The request rate for the i-th file depends upon the number of IoT devices present in the system and the number of IoT devices that arrived during that particular interval. As a result, the unknown number of requests for the i-th file depends upon the caching actions in the previous and present intervals. During the learning process, for each state-action pair the Q value that maximizes the reward is selected. The number of remaining IoT devices is then counted in order to select the action for the current interval. At the end of the interval, the reward for the respective action is calculated and the next state is observed. Using these values, a new Q value is computed [67].
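The following sketch shows a plain tabular Q-learning update for a simplified caching decision; the state, actions, and reward here are placeholder assumptions and do not reproduce the exact formulation of [67].

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.9, 0.2      # learning rate, discount, exploration
actions = ["cache_file_i", "skip_file_i"]   # simplified caching actions
q_table = defaultdict(float)                # Q value per (state, action) pair

def choose_action(state):
    # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])

def update(state, action, reward, next_state):
    # Standard Q-learning update toward reward plus discounted best future value.
    best_next = max(q_table[(next_state, a)] for a in actions)
    q_table[(state, action)] += alpha * (reward + gamma * best_next - q_table[(state, action)])

# One simulated interval: the state is a bucketed count of active IoT devices,
# and the reward is +1 for a useful caching decision, -1 for a wasted cache slot.
state = "few_devices"
action = choose_action(state)
reward = 1 if action == "cache_file_i" else -1   # illustrative outcome
update(state, action, reward, next_state="many_devices")
print(dict(q_table))
```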
This approach increases the long-term reward of the system and hence improves its performance.
5. Deep Learning Approach [68,69]: The emergence of deep neural networks has made it feasible to automatically learn from raw and possibly high-dimensional data. Learning-based caching techniques can be categorized into two approaches: the popularity prediction approach and the reinforcement learning approach. In the popularity prediction approach, content popularity is first predicted, and then a caching policy is devised according to the popularity predictions. This approach is summarized in figure 1.7 [52].
Various kinds of information, such as traffic patterns and context information, are used to predict content popularity. In [70], content popularity is predicted by using user-content correlations and users' social ties through D2D communication. The authors in [71][72][73] have used various online learning algorithms to predict content popularity.
After the popularity prediction procedure, caching policies and algorithms can be devised by solving optimization problems that combine the estimated popularity with network constraints, or by adapting traditional caching algorithms. However, these caching problems are usually complex and NP-hard.
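A minimal sketch of the popularity prediction approach is given below: content popularity is estimated with an exponentially weighted moving average over recent request batches, and the most popular items are then greedily placed into the limited fog cache. This is a generic illustration under assumed names and data, not one of the cited algorithms.

```python
from collections import Counter

def update_popularity(popularity, requests, decay=0.8):
    """Exponentially weighted popularity estimate from the latest request batch."""
    counts = Counter(requests)
    items = set(popularity) | set(counts)
    return {i: decay * popularity.get(i, 0.0) + (1 - decay) * counts.get(i, 0)
            for i in items}

def place_contents(popularity, cache_slots):
    """Greedy placement: keep the most popular items that fit in the fog cache."""
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    return ranked[:cache_slots]

popularity = {}
request_batches = [
    ["a", "a", "b", "c"],
    ["a", "b", "b", "b", "d"],
]
for batch in request_batches:
    popularity = update_popularity(popularity, batch)
print(place_contents(popularity, cache_slots=2))   # the two most popular contents
```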
In the second approach, the reinforcement learning (RL) approach, instead of separating popularity prediction and content placement, both are treated as a single entity, as shown in figure 1.8. After an action is taken, the system obtains a reward, which is fed back to the edge node, and the process is repeated.
Further, each data item is associated with two fields: a) a time stamp field and b) a lifetime field. The time stamp field t_gen indicates the time when the data was created or generated, while the lifetime field t_life indicates the time for which the value in the item remains valid. The age of the data, t_age, is the difference between the current time and the generation time. If t_age < t_life, the requested data in the cache is fresh and is returned to the user directly from the cache; otherwise the cached data is not fresh. When the data is not fresh, or not available at all, the node fetches fresh data from the cloud and returns it to the IoT device.
Deep reinforcement learning aims at maximizing the reward obtained when the agent takes an action in a particular state. Figure 1.9 illustrates the application of deep reinforcement learning at fog nodes to learn the future demands of the IoT devices.
Figure 1.9: Applying DRL to Fog Caching
6. Federated Learning: Conventional machine learning approaches depend upon data collection and processing in a central entity. However, this is not always possible, as private data is sometimes not accessible, and transmitting the raw data generated by a large number of IoT devices to central machine learning processors incurs great communication overhead [74][75][76][77][78]. Federated learning is therefore a decentralized machine learning approach that keeps the data at the point of generation and transmits only locally trained models to the central processor. These algorithms also reduce the overall energy consumption and network bandwidth significantly by transmitting only model parameters rather than the whole data stream, and they respond in real time, which reduces latency. Such algorithms exploit on-device processing power and use private data efficiently, as model training is performed in a distributed manner while the data stays in place, i.e., at its place of generation. In the content popularity approaches discussed above, direct access to private user data is required, whereas federated learning avoids this requirement.
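A highly simplified sketch of federated learning is shown below: each participating node trains a local linear model on data that never leaves it, and a central aggregator only averages the model weights (federated averaging). All names, data, and hyperparameters are illustrative assumptions.

```python
import numpy as np

def local_train(weights, features, labels, lr=0.05, epochs=20):
    """Train a local linear model on data that never leaves the device."""
    w = weights.copy()
    for _ in range(epochs):
        grad = features.T @ (features @ w - labels) / len(labels)
        w -= lr * grad
    return w

def federated_average(local_weights):
    """Central server averages model parameters instead of collecting raw data."""
    return np.mean(local_weights, axis=0)

rng = np.random.default_rng(0)
global_w = np.zeros(2)
for round_idx in range(5):                       # five communication rounds
    updates = []
    for node in range(3):                        # three participating fog nodes/devices
        X = rng.normal(size=(50, 2))             # private local data
        y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=50)
        updates.append(local_train(global_w, X, y))
    global_w = federated_average(updates)        # only weights are transmitted
print(global_w)                                  # approaches the underlying [2, -1]
```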

Future Research Directions
In this section, various research issues related to caching at fog nodes are discussed. These points may help readers identify future research directions in this area.
a) Lack of memory space: To implement a machine-learning-based system, it is necessary to have sufficient data at the learning system for training. However, fog nodes do not have much memory space, so it is of profound importance to investigate effective machine learning techniques that can learn from the limited available data. As discussed before, readers may explore federated learning, which has not yet been exploited much for content prediction in caching.
b) Heterogeneous IoT devices: IoT devices are usually heterogeneous in nature; e.g., in smart homes, various types of sensors (light, temperature, etc.) may be installed, which generate many different kinds of traffic. So far, the impact of the heterogeneity of IoT devices has not been well addressed. In such scenarios, the network connectivity methods, the protocols to handle these devices, and the communication methods are not discussed, which increases the latency of communicating with the fog nodes.
c) Synchronization among fog nodes: In present works, the synchronization of data between the various fog servers and the cloud servers is not discussed. Since the data produced by IoT devices is transient and becomes useless after some time, it is necessary to address the problem of synchronizing data among the fog servers and with the cloud server.
d) Game theoretic/auction models: In various business models, the fog nodes earn revenue by serving the IoT devices. In such systems, fog nodes may not cooperate with each other and may act selfishly. Therefore, game theoretic or auction-based models may be applied to address non-cooperation among fog nodes.

Conclusion
IoT devices generate a lot of data, which is stored and processed at cloud servers. To reduce latency, fog computing has been introduced. However, data needs to be cached at the fog nodes to reduce further communication with the cloud nodes. This chapter introduced the advantages of storing IoT data at the fog nodes and the challenges faced in doing so. It also described how various machine learning techniques can be used to predict the future demands of IoT devices and store the most requested data at the fog nodes. The chapter concluded with future research directions for the readers.