Energy and Latency Efficient Caching in Mobile Edge Networks: Survey, Solutions, and Challenges

Future wireless networks pose research challenges due to the many-fold increase in smart devices and the exponential growth of mobile data traffic. The advent of computation-intensive and real-time applications causes a huge expansion in traffic volume. The emerging need to bring data closer to users and to offload traffic from the macrocell base station motivates the use of caches at the edge of the network. Storing the most popular files at the edge of mobile edge networks (MENs), in the caches of user terminals (UTs) and small base stations (SBSs), is a promising approach to the challenges that face data-rich wireless networks. Caching at the mobile UT allows a user to obtain requested contents directly from the caches of nearby UTs through device-to-device (D2D) communication. In this survey article, solutions for mobile edge computing and caching challenges in terms of energy and latency are presented. Caching in MENs is reviewed and different caching techniques in MENs are compared. We also illustrate research on cache development for wireless networks that applies intelligent and learning techniques in specific domains of the design. We summarize the challenges that face the design of caching systems in MENs. Finally, some future research directions are discussed for the development of cache placement, cache access, and delivery in MENs.


Introduction
In recent years, an exponential increase in traffic load has been observed in wireless networks due to multimedia streaming applications and services, mobile video streaming, web browsing, and social network interconnections. Wireless devices are expected to generate much higher traffic than wired network devices in the future [1]. To handle this traffic explosion, mobile wireless networks must evolve continuously and improve their performance in terms of power consumption, data throughput, and utilization of network resources such as backhaul network capacity and bandwidth [2]. Mobile edge networks (MENs) are one of the candidate solutions for future wireless networks. Despite the developments in wireless network architecture, the demand for contents from connected devices and from the many different applications and services running on them puts constraints on latency, energy consumption, and quality of service (QoS).
Considering these problems, researchers investigated the possibility of caching content items locally and proactively at the edge of the mobile network (i.e., caching in small base stations (SBSs) and user terminals (UTs)) before users request them. Local caching is a promising approach to relieve the network bottleneck [3,4] by providing faster connectivity, lower latency, and lower power consumption. SBSs, which are sometimes also called femto caches, caching helpers, or simply helpers, normally have high storage capabilities and are used to build a wireless distributed caching infrastructure [5]. Storing contents closer to UTs at the edge of the mobile edge network allows users to download requested contents from neighbouring SBS caches through SBS-to-user communication or from neighbouring UT caches through device-to-device (D2D) communication, which can boost QoS and reduce latency while saving power and network backhaul resources.
Future services and applications are highly bounded by user location, data, and network. An Internet architecture that carries a huge amount of mobile traffic and serves mobile users moving at different speeds will offer poor support for such services. Thus, user mobility patterns should be considered when designing caching in MENs. Because user demands, content popularity, and user mobility change dynamically, it is difficult to decide which contents to cache, where to cache them, and from where to deliver them using traditional decision-making techniques. Moreover, the large amount of data needed to develop algorithms for cache systems makes the estimation of cache contents, access, and delivery a complex and difficult task. To meet these challenges, researchers explore learning and decision techniques for storing, accessing, and delivering the huge amount of data generated within MENs. A summary of existing survey articles on mobile computing and caching is shown in Table 1. Table 2 lists the acronyms used in this paper.
The remainder of the paper is organized as follows. Section 2 discusses mobile edge computing and caching. In Sect. 3, the literature survey of solutions for mobile edge computing and caching challenges in terms of energy and latency is presented. Section 4 explains and compares different caching techniques in mobile edge networks. In Sect. 5, different caching techniques in MENs are discussed and compared. Section 6 summarizes the challenges that face the design of caching systems in MENs. Section 7 discusses future research directions, followed by conclusions and future work in Sect. 8.
Table 1 Summary of existing survey articles on mobile computing and caching
 | | A comparison of traditional and popularity-based caching; an overview of the attributes of videos and the evaluation criteria of caching policies; a review of proactive caching, focusing on prediction strategies, challenges, and open research problems in popularity-based caching
[7] | Caching strategies | A survey on caching techniques in macro-cellular networks, heterogeneous networks, device-to-device networks, cloud-radio access, and fog-radio access networks; a tutorial on caching techniques and caching algorithms; a comparison among different algorithms in different performance metrics; a summary of the main research achievements, challenges, and research directions
[8] | Achieve low latency communications | A survey on the technologies to achieve low latency communications; an overview of 5G cellular network caching and mobile edge computing and other 5G requirements
[9] | Deployment, strategies, edge caches, and network performances | A survey on edge cache in radio access networks, the deployment location of content placement strategy, and coded caching; a summary of the impacts on high spectral efficiency, high energy efficiency, and low latency; challenges in the joint optimization of radio and cache resources
[10] | Research efforts made on the MENs | A review on convergence of computing, caching and communications; a survey on cloud technology, software defined network, and network function virtualization; a review of the open research challenges and future directions
[11] | Caching mechanism in information centric networking | An overview of the in-network caching mechanisms; an illustration of how it works, its benefits and drawbacks; a comparison of some typical in-network caching mechanisms through

Mobile Edge Networks
The growth in mobile data traffic and new mobile applications imposes limitations on end-user demands and communications at mobile devices. End users require service availability, service reliability, lower latency, and efficient energy usage. To overcome limitations such as computation capabilities, storage capacity, latency, and energy consumption, a new wireless network paradigm is needed [13]. The mobile edge network (MEN) architecture has been presented as a promising solution for future wireless networks. Proposals for MEN architectures have come from industry and academia. They evolved from mobile cloud computing, which moves computing power and data storage away from the mobile devices and into the cloud. MEN architectures are summarized in four main models depending on their services and operations: mobile cloud computing (MCC), mobile edge computing (MEC), fog computing, and caching [10]. The fundamental concept of MEN is to bring network contents, services, and resources closer to the network edge. This can be implemented through an architecture design that deploys flexible computing and utilizes storage resources at the mobile network edges. MEC is a network architecture concept that was standardized by the European Telecommunications Standards Institute (ETSI) Industry Specification Group (ISG). MEC was acknowledged as a prime emerging technology for 5G networks [14]. At the edge of the network, an IT service environment and cloud-computing capabilities are provided within the Radio Access Network (RAN) [15]. Cisco proposed fog computing as an extension of cloud computing to wireless network edges. The aim is to accommodate Internet of Things (IoT) applications closer to users. At the same time, fog computing nodes are distributed over a wide area and collaborate among multiple end users to provide processing and storage [16].
Researchers at Carnegie Mellon University proposed the cloudlet, a new element that extends the mobile device-cloud architecture. A cloudlet is defined as a resource-rich computer or cluster of computers [17]. Both Wi-Fi networks and mobile networks are deployed to provide near-real-time provisioning of applications and handoff of virtual machine images among edge nodes when a device moves. A cloudlet can reduce the end-to-end latency between the mobile device and the cloud [18]. Some of the key technologies that make MEN flexible and easy to maintain are software defined networks (SDN), network function virtualization (NFV), and information centric networks (ICN) [12]. SDN separates the control plane from the data plane, allowing logical centralization of control and enabling direct programming of wireless network controls with improved energy efficiency. ICN is used to speed up content distribution and utilize network resources [10]. ICN serves requests from closer content nodes along the path, which enables content caching in both the air and the mobile devices. NFV enables flexible design and management of network functions independent of the underlying physical network equipment [19]. Integrating the programming control principle of SDN with the information centrality of ICN leads to dynamic networking, caching, and computing resources that meet the requirements of different applications [20,21]. Also, NFV-based caching solutions offer caching of personalized and secure contents isolated from other content providers and from other participants [12]. By utilizing these technologies, functions, contents, and resources are moved closer to end users. This enables the MEN, which exploits a large number of low-cost storage devices at different places in the network edges, to proactively cache popular contents during off-peak periods. Contents can be cached at different levels in mobile networks instead of being fetched from the core network [10].
Reducing the number of network hops between the location of the contents and the user requesting them reduces the latency of retrieving the contents [22]. MENs bridge the gap between the limited storage and computation capabilities of user terminals and their increasing demands. They do so by placing storage and computation resources at the edge of the network, closer to user terminals. MEN can therefore reduce latency and energy consumption. Various techniques have been proposed in the literature to process data locally at the edge of the network and accelerate data streams, which reduces the traffic bottleneck toward the core network [23]. The caching locations, which are considered as caching levels in MEN architectures, are the macro base station (MBS), the small base station (SBS), and user devices that allow device-to-device (D2D) communications. The places that can be used to cache the most popular contents within MENs are shown in Fig. 1 and described below:
1. UT caching: Exploiting the storage resources in UTs is one of the key technologies in 5G networks [10]. Caching in user devices allows caching strategies to be improved by enabling D2D communications.
2. SBS caching: Each cell in a MEN employs a large number of SBSs. An SBS has a higher storage capacity than a UT cache. SBSs are close to the end users and usually provide higher data rates [10]. Therefore, utilizing caching in SBSs is a promising solution to improve QoS in next-generation heterogeneous networks.
3. MBS caching: The MBS covers a larger area within the heterogeneous network and can serve more users. The storage capacity of the MBS is higher than that of other caches within the cell, which leads to a higher probability of finding the requested file (hit).
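The relation between cache capacity and hit probability noted for the three caching levels can be illustrated with a small sketch. It assumes, purely for illustration, that file popularity follows a Zipf distribution and that each tier stores its most popular files; the function names, the exponent `alpha`, and the tier capacities are hypothetical values chosen for the example, not figures from the surveyed works.

```python
def zipf_popularity(num_files, alpha=0.8):
    """Request probability of each file, ranked from most to least popular.

    Assumption: popularity follows a Zipf law with exponent `alpha`.
    """
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def hit_probability(popularity, capacity):
    """Probability that a request hits a cache holding the `capacity` most popular files."""
    capacity = max(0, min(capacity, len(popularity)))
    return sum(popularity[:capacity])


# Hypothetical catalogue size and tier capacities (in number of files).
popularity = zipf_popularity(num_files=1000, alpha=0.8)
tiers = {"UT": 10, "SBS": 100, "MBS": 500}
hit_by_tier = {name: hit_probability(popularity, cap) for name, cap in tiers.items()}

for name, prob in hit_by_tier.items():
    print(f"{name}: capacity={tiers[name]:4d} files, hit probability={prob:.3f}")
```

Under these assumptions the hit probability grows with capacity from the UT to the SBS to the MBS, while the number of hops needed to reach the cache grows in the same order.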

Energy and Latency Efficiency in Mobile Edge Networks
This section presents the benefits of MEN as an emerging technology for future wireless networks. A number of research works have shown the efficiency of MEN in terms of energy consumption, latency requirements, and storage capacity for different applications and services. Tables 3 and 4 present an overview of the main research areas in the literature that proposed possible solutions for future mobile edge computing and caching networks in terms of energy and latency, respectively.
Table 3 Overview of energy-efficient techniques in mobile edge computing and caching
Area | References | Approach | Summary
Computation offloading | | | Propose a spatial and temporal computation offloading decision algorithm in edge cloud-enabled heterogeneous networks
Computation offloading | [32] | Minority games theory | Develop distributed server activation mechanism
Computation offloading | [33] | Multi-label classification and deep learning | Develop a dynamic offloading framework for mobile users
Computation offloading | [34] | An approximation collaborative computation offloading | Present centralized cloud and MEC over FiWi networks and the cloud-MEC collaborative computation offloading model
Computation offloading | [35] | Karush-Kuhn-Tucker Lagrangian multiplier method | Design an energy-efficient autonomic offloading scheme for physical layer design and application latency
Computation offloading | [36] | Dynamic sequential game theory | Propose an adaptive sequential offloading game approach
Task caching | [26] | Mixed integer nonlinear programming | Caching of complete task application and their related data and design a multi-user computation offloading algorithm in edge cloud
Content caching | [37] | Poisson point processes modelling of BS power and locations | Analyse energy consumption of cache-enabled wireless network using spatial model based on stochastic geometry
Content caching | [38] | Lagrange multiplier and duality | Exploit statistical information on individual popularity preferences in caching policies
Content caching | [39] | Social-tie factor modelling | Propose social-aware cache information processing for future ultra-dense networks
Content caching | [40] | Dual decomposition and a subgradient algorithm | Propose joint design of the transmission and caching policies and formulate a problem that minimizes a generic cost function
Resource allocation | [41] | Mixed discrete-continuous optimization | Propose a joint caching and offloading mechanism that optimally allocates the storage resource at the BS for caching and the uploading and downloading time durations
Resource allocation | [42] | Dual-decomposition method and alternating direction method of multipliers | Propose optimization problem to jointly consider bandwidth provisioning and content source selection
Multicast allocation | [43] | Graph theory | Propose multicast caching in dense small cell networks when a group of users can benefit from multicast caching at a lower energy cost
Multicast allocation | [44] | Distributed potential game model and cloud and wireless resource allocation algorithm | Propose a distributed joint computation offloading and optimization scheme in heterogeneous networks
The research areas can be categorized into the following main streams:
1. Computation offloading: Applications that require high computation power and resources to run complex programs have multiplied with recent advances in computing technology. These applications use wireless networks and run on mobile devices whose limited capabilities cannot supply the needed resources [24]. One solution to this problem is computation offloading, in which the mobile devices transfer tasks to an external edge cloud and receive the results from the edge cloud. Offloading increases the capabilities of mobile terminals by migrating the computation to more resourceful computers (servers) at the edge of the network [25].
2. Task caching: Computation offloading considers computing capabilities at the edge of the network by assuming enough hardware and software resources to execute the tasks. However, providing enough storage capacity for computation offloading is another important challenge that faces future wireless networks. Task caching, proposed in [26], means storing the task (application computation task) and its related data at the edge of the network.
3. Content caching: Contents requested by end users in massive multimedia services over mobile networks face network capacity limitations and increase the load on backhaul links. Requests for the same contents by different users also cause network congestion and waste network resources. The development of mobile edge caching techniques is another promising solution for wireless networks. Caching the most popular files can prevent duplicate transmission of the same contents and improve end-user QoS. Quality of service improves because downloading the contents from the network edge (for example, base stations or end user terminals) reduces latency compared to downloading them from Internet content providers (core network) [10].
4. Resource allocation: MEN servers have limited computational resources that must be shared among multiple user terminals, so resource allocation is one of the main issues in the design of MEN. It is the process of allocating the finite radio and computational resources to multiple mobiles under resource constraints. There are two categories of resource allocation schemes for MEN: centralized and distributed. In centralized resource allocation, the MEN server collects all mobile information, makes the resource allocation decisions, and sends the decisions to the mobile devices. In distributed resource allocation, many techniques, including game theory and decomposition techniques, are used to develop a distributed algorithm [27].
5. Multicast caching: To reduce the load of wireless links in traditional unicast connection-based transmission and avoid transmitting the same file multiple times to multiple users, multicast caching is proposed for base stations in 5G mobile networks [28]. In multicast caching, the popular content is brought close to the users. The optimization objectives are to minimize the average latency for all content requests and to minimize the average energy consumption [8].
6. Service chain management: Service chaining refers to executing multiple service functions in an ordered list to guarantee performance and security requirements [29]. In MENs, lightweight data centers can be used at the edge of the network. In these centers, operators deploy service chaining to steer traffic through a managed set of service functions. Service chaining is realized by using software-defined network and network function virtualization technologies. Service chain management can reduce network latency by offloading workload or bandwidth from the core network service [30].
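The computation offloading trade-off in item 1 can be made concrete with a minimal sketch. The cost model below is an assumption commonly used for illustration, not a model taken from the surveyed works: local execution costs CPU time and CPU energy, while offloading costs uplink transmission time and energy plus execution time on the edge server. All class names, parameters, and numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    cpu_cycles: float  # CPU cycles needed to complete the task
    data_bits: float   # input data that must be uploaded when offloading


@dataclass
class Device:
    cpu_hz: float            # local CPU speed (cycles per second)
    energy_per_cycle: float  # local energy cost (joules per cycle)
    tx_power_w: float        # radio transmit power (watts)
    uplink_bps: float        # uplink rate to the edge server (bits per second)


def local_cost(task, dev):
    """Latency (s) and energy (J) of running the task on the device."""
    latency = task.cpu_cycles / dev.cpu_hz
    energy = task.cpu_cycles * dev.energy_per_cycle
    return latency, energy


def offload_cost(task, dev, edge_cpu_hz):
    """Latency (s) and device energy (J) of running the task on the edge server.

    Assumption: result download time and edge queueing delay are ignored.
    """
    tx_time = task.data_bits / dev.uplink_bps
    latency = tx_time + task.cpu_cycles / edge_cpu_hz
    energy = dev.tx_power_w * tx_time
    return latency, energy


def should_offload(task, dev, edge_cpu_hz, latency_weight=0.5):
    """Offload when the weighted latency/energy cost is lower than running locally.

    The weight also absorbs the unit scaling between seconds and joules.
    """
    w = latency_weight
    l_lat, l_en = local_cost(task, dev)
    o_lat, o_en = offload_cost(task, dev, edge_cpu_hz)
    return w * o_lat + (1 - w) * o_en < w * l_lat + (1 - w) * l_en


phone = Device(cpu_hz=1e9, energy_per_cycle=1e-9, tx_power_w=0.5, uplink_bps=10e6)
compute_heavy = Task(cpu_cycles=1e9, data_bits=1e6)
data_heavy = Task(cpu_cycles=1e7, data_bits=1e9)

print("offload compute-heavy task:", should_offload(compute_heavy, phone, edge_cpu_hz=10e9))
print("offload data-heavy task:", should_offload(data_heavy, phone, edge_cpu_hz=10e9))
```

With these example numbers the compute-heavy task benefits from offloading, whereas the data-heavy task is dominated by the upload time and is better run locally. The optimization, game-theoretic, and learning approaches cited in this section address far richer versions of this decision, with multiple users and shared resources.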
Mobile edge computing and caching are considered a promising solution that supports many emerging applications and services with specific constraints on latency, energy, and reliability [31]. In the following section, a comparison of different caching techniques in mobile edge networks is presented.

Caching in Mobile Edge Network
Most research work in the literature caches either uncoded contents or uncoded parts of files during the placement phase. The base station broadcasts the coded files (linear combinations of multiple files) to user terminals during the content delivery phase. The users can then decode their files simultaneously [54]. During the delivery phase, the cache memory contents of the requesting user are updated to store new files. There are different algorithms and techniques to implement cache replacement. Researchers have proposed caching algorithms for wireless systems that range from simple algorithms to more advanced intelligent techniques. These algorithms are divided into two main streams. The first is cache replacement based on prior knowledge about content popularity, while the second is cache replacement without prior knowledge about content popularity [55]. Table 5 shows a comparison of different caching techniques in the literature in terms of their dependency on content popularity, online learning, training phase, context awareness, social awareness, mobility awareness, and prediction ability.
Table 3 (continued)
 | | | Propose a user attribute aware video distribution mechanism using scalable video coding
Table 4 Overview of latency-efficient techniques in mobile edge computing and caching
Area | References | Approach | Summary
Computation offloading | [46] | Heuristic search, reformulation linearization and semi-definite relaxation | Formulate an optimization problem to jointly minimize the latency and offloading failure probability
Computation offloading | [47] | Lyapunov stochastic optimization | Propose a dynamic policy for task offloading and resource allocation
Computation offloading | [48] | Lyapunov optimization | Investigate a green MEC system with energy harvesting devices and propose computation offloading strategy
Computation offloading | [36] | Dynamic sequential game theory | Propose an adaptive sequential offloading game approach and design a multi-user computation offloading algorithm
Computation offloading | [49] | Markov decision process | Develop a post-decision state based learning algorithm that learns the optimal joint offloading and auto scaling policy on-the-fly
Service chain | [30] | Hash-based group management | Propose a hash-based group table to reduce the computation time for assigning users into groups to reduce the control plane latency
Content caching | [50] | Transfer learning algorithm | Propose a proactive content caching optimization model
Content caching | [51] | Assessment tests of caching solution | Propose a prototype implementation of a mobile edge cache
Resource allocation | [31] | Submodular optimization | Propose resource cognitive intelligence based on learning of network contexts and design an optimal caching strategy
Resource allocation | [52] | Auction theory | Propose a decoupled resource allocation model that manages the allocation of computing resources distributed at the edges independent of the service provisioning management performed at the service provider end
Resource allocation | [53] | Nash bargaining game | Propose resource optimization problem based on user fairness and the global throughput
The least recently used (LRU) and least frequently used (LFU) algorithms are used in [56] and [57], respectively. These techniques are simple cache replacement algorithms that do not consider future content popularity and update the cache continuously during the delivery phase.
In the LRU algorithm, the cache keeps an ordered list that is updated to follow the most recent access of all cached contents. When the cache is full, the new content replaces the least recently accessed content. The content of the cache can also be changed based on prior knowledge of content popularity. Popularity statistics of different video files modelled using a Zipf distribution were used in [58] and [59]. Cache replacement that tracks variations in the popularity distribution and updates cache content at user terminals is combined with collaborative device-to-device communication to increase the efficiency of content delivery. There is a trade-off between an optimal content replacement that predicts future requests efficiently and the speed of computing the content popularity. Also, these methods offer no personalization to user context and preferences.
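The LRU replacement rule described above can be sketched in a few lines. This is a generic illustration of the policy, not the implementation used in [56]; the class name `LRUCache` and its `request` method are hypothetical names chosen for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used content."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def request(self, content_id):
        """Return True on a cache hit; on a miss, store the content and return False."""
        if content_id in self._items:
            self._items.move_to_end(content_id)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self._items) >= self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        self._items[content_id] = True
        return False

    def contents(self):
        """Cached content ids, least recently used first."""
        return list(self._items)


cache = LRUCache(capacity=2)
results = [cache.request(c) for c in ["a", "b", "a", "c", "b"]]
print(results, cache.contents(), cache.hits, cache.misses)
```

In this trace the second request for "a" is a hit, "c" evicts "b", and the final request for "b" evicts "a", so the cache ends up holding "c" and "b". An LFU variant would keep a request counter per content and evict the entry with the smallest count instead.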
The authors of [60] and [61] exploited the storing of video files closer to users in femto caches. They formulate the problem with the aim of increasing throughput by offloading a large amount of traffic from the main cellular network. The work presented in [2] proposes caching and multicasting techniques. Caching aims to place popular content files at network edges in order to shorten the distance between content and requesters. Multicasting aims to serve identical requests that occur at nearby locations through a common multicast stream rather than by sending separate copies of the same content to different users. The work in [3] proposes proactive caching of contents based on file popularity and correlations between users and file patterns. Files can be proactively cached during off-peak demand by using a machine learning algorithm and collaborative filtering with context awareness. The procedure aims to predict the set of influential users and the social structure, and to proactively cache strategic content on those user terminals to utilise device-to-device communications. This approach requires a training set of known content popularities, which is learned during a training phase in order to decide which content to place in the cache.
A reinforcement learning-based coded caching framework is used to model the cache strategy in a heterogeneous small cell network [62]. The authors designed a cache placement policy that uses the learned file popularity to find the optimal cache contents. The cache placement policy takes into account the users' connectivity to the SBS. At regular intervals, the cache prefetches segments of frequently requested (coded) files in order to fulfil user requests. A caching algorithm based on contextual multi-armed bandit optimization is presented in [63]. In this algorithm, the SBS updates its contents regularly by observing the demand for cached files and learns the contexts of the popularity profile over time. The objective of the multi-armed bandit optimization is to maximize the number of cache hits. A different extension of the multi-armed bandit framework is proposed in [64]. In this framework, the authors exploit the topology of user connections by incorporating coded cache contents. Based on observations of instantaneous demands that assume a content popularity distribution, an optimal cache content placement strategy can be achieved. While the previous algorithms do not consider future prediction of popular contents in the design of the cache replacement algorithm, the work in [65] and [66] aims to learn popularity trends. These works include the design of context-aware proactive caching. There is no prior knowledge about content popularity in [65], while in [66] the cache replacement method learns the popularity of contents and uses it to determine which contents to place in the cache and which contents to evict from the cache.
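The idea of learning popularity by observing only the demand for cached files can be illustrated with a deliberately simplified bandit. The sketch below uses a plain epsilon-greedy rule without contexts or coding, so it is a stand-in for, not a reproduction of, the contextual algorithm in [63] or the coded variant in [64]. The class name, the exploration rate, and the fixed per-round demand vector are assumptions made for the example; real demand would be random.

```python
import random


class EpsilonGreedyCache:
    """Each file is an arm; every round the SBS caches `capacity` files."""

    def __init__(self, num_files, capacity, epsilon=0.1, seed=0):
        self.num_files = num_files
        self.capacity = capacity
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.plays = [0] * num_files          # rounds in which each file was cached
        self.mean_demand = [0.0] * num_files  # running average of observed demand

    def _estimate(self, f):
        # Files never cached get an optimistic estimate so they are tried once.
        return float("inf") if self.plays[f] == 0 else self.mean_demand[f]

    def greedy_placement(self):
        ranked = sorted(range(self.num_files), key=self._estimate, reverse=True)
        return set(ranked[: self.capacity])

    def choose_placement(self):
        if self.rng.random() < self.epsilon:  # explore a random placement
            return set(self.rng.sample(range(self.num_files), self.capacity))
        return self.greedy_placement()        # exploit current estimates

    def update(self, placement, demand):
        # Demand is observed only for files that were actually cached.
        for f in placement:
            self.plays[f] += 1
            self.mean_demand[f] += (demand[f] - self.mean_demand[f]) / self.plays[f]


true_demand = [5, 40, 10, 30, 1, 2]  # hypothetical requests per round for each file
learner = EpsilonGreedyCache(num_files=6, capacity=2, epsilon=0.1, seed=1)
total_hits = 0
for _ in range(200):
    placement = learner.choose_placement()
    total_hits += sum(true_demand[f] for f in placement)  # cache hits this round
    learner.update(placement, true_demand)

learned = learner.greedy_placement()
print("learned placement:", sorted(learned), "total hits:", total_hits)
```

Once every file has been cached at least once, the greedy placement settles on the two files with the highest demand. Contextual variants additionally condition the estimates on side information about the connected users.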

Comparison of Different Caching Techniques in MEN
Different studies formulate the caching problem at the edge of the network and examine it from different perspectives. In each study, the optimization problem is formulated based on the input attributes that are manipulated by the optimization algorithm and on the caching scheme used in the model. The performance indices in these proposals are overall delay, user satisfaction ratio, offloading probability, and total throughput. They share one general objective, which is redirecting user requests from the expensive and limited backhaul links to local cache storage at the edge of MENs. Table 6 illustrates a comparison between different caching techniques in MENs.
The association between UTs and SBSs in small cell networks was studied in [67]. Based on file availability in the SBSs and the backhaul congestion state, the SBSs decide which users they should serve. The problem is formulated using one-to-many matching game theory. In [68], two proactive caching scenarios are examined. The goal in both cases is to keep the user satisfaction ratio above the required limit. In the first case, the contents are cached proactively at the SBSs during low-peak demand. The cache procedure is built on a supervised machine learning algorithm using singular value decomposition (SVD). This technique consists of two parts: training on the input matrix that represents the users-to-files rating association, followed by predicting what files each user will request (file popularity matrix).
In the second case, the contents are cached proactively in users' devices. The centrality metric is used to measure the social influence of a node and its connection with other nodes (social community). Then, the k-means clustering method is used to form a set of influential users within a community (users' social ties). The proactive caching procedure considers the number of times each file is downloaded by each user to form a user-to-files history matrix. The beta distribution is assumed to denote the probability that a content is selected by a given user. The popular contents that will be cached in influential users' devices are selected based on the Chinese restaurant process (CRP). By caching at UTs and enabling D2D communication, the loads on the SBSs and on the backhaul are reduced [68].
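A much simplified sketch of this second scenario is given below. It replaces the components of [68] with plainer stand-ins, which should be read as assumptions: degree centrality is used as the centrality metric, the top-ranked users are taken directly instead of running k-means clustering, and files are ranked by total download count instead of being drawn through a Chinese restaurant process. All function names and the example data are hypothetical.

```python
from collections import Counter


def degree_centrality(edges, users):
    """Fraction of the other users that each user is directly connected to."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    scale = len(users) - 1 if len(users) > 1 else 1
    return {u: degree[u] / scale for u in users}


def pick_influential(edges, users, k):
    """The k users with the highest centrality (ties broken by user id)."""
    centrality = degree_centrality(edges, users)
    return sorted(users, key=lambda u: (-centrality[u], u))[:k]


def files_to_cache(history, capacity):
    """The `capacity` most downloaded files in the user-to-files history matrix."""
    totals = Counter()
    for downloads in history.values():
        totals.update(downloads)  # adds each user's per-file download counts
    return sorted(totals, key=lambda f: (-totals[f], f))[:capacity]


users = ["u1", "u2", "u3", "u4"]
social_ties = [("u1", "u2"), ("u1", "u3"), ("u1", "u4"), ("u2", "u3")]
history = {
    "u1": {"f1": 3, "f2": 1},
    "u2": {"f1": 1, "f3": 4},
    "u3": {"f2": 2},
    "u4": {"f3": 1},
}

influential = pick_influential(social_ties, users, k=2)
popular = files_to_cache(history, capacity=2)
placement = {user: list(popular) for user in influential}
print(placement)
```

Here the two best connected users each receive the two most downloaded files, so their neighbours can fetch those files over D2D links instead of through the SBS.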
An optimal two-tier caching placement based on difference of convex (DC) programming is presented in [69]. The objective of the optimization problem is to maximize the offloading probability. Offloading occurs in three cases:
1. Self-offloading, when the requested contents are found in the UT's own (local) cache.
2. D2D-offloading, when the requested contents are found in nearby devices.
3. SBS-offloading, when the requested contents are found in a nearby SBS.
Their results show that popular contents must be cached under relatively low node density while other contents must be cached evenly under relatively high node density.
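The three offloading cases can be expressed as a lookup order, and the offloading probability can then be estimated empirically as the fraction of requests served without using the backhaul. The sketch below is only an illustration of that bookkeeping; it does not reproduce the stochastic model or the optimization in [69], and the function names and cache contents are hypothetical.

```python
from collections import Counter


def serve(file_id, own_cache, neighbour_caches, sbs_caches):
    """Classify how a request is served, trying the closest cache first."""
    if file_id in own_cache:
        return "self"
    if any(file_id in cache for cache in neighbour_caches):
        return "d2d"
    if any(file_id in cache for cache in sbs_caches):
        return "sbs"
    return "backhaul"


def offloading_ratio(requests, own_cache, neighbour_caches, sbs_caches):
    """Fraction of requests offloaded from the backhaul, plus per-case counts."""
    outcomes = Counter(
        serve(f, own_cache, neighbour_caches, sbs_caches) for f in requests
    )
    if not requests:
        return 0.0, outcomes
    offloaded = len(requests) - outcomes["backhaul"]
    return offloaded / len(requests), outcomes


own = {"a"}
neighbours = [{"b"}, {"c"}]
small_cells = [{"d", "e"}]
requests = ["a", "b", "d", "z", "c", "e", "a", "y"]

ratio, outcomes = offloading_ratio(requests, own, neighbours, small_cells)
print(ratio, dict(outcomes))
```

In this example six of the eight requests are served by a local, D2D, or SBS cache, so the empirical offloading ratio is 0.75.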
In [70], the authors formulate the caching problem as a video recommendation system. They cluster files and users according to video file preferences and formulate the cache scheme in two phases: the D2D cooperative phase, where files are stored in UTs, and the caching phase, where files are stored in SBSs. The optimal caching is designed based on a greedy intra-cluster algorithm to obtain the minimum total average file download time. The results show that clustering files before applying the optimal caching algorithm can reduce the computational complexity caused by the huge number of users and files involved. The work in [71] proposed joint caching, routing, and channel assignment for video files in collaborative small cell cellular networks. The objective is to maximize network throughput, using a conflict graph to characterize the communication link interference. The optimization problem is modeled as a large-scale linear programming problem that is solved using the column generation method. The algorithm selects a subset of variables that can potentially improve the objective function in order to reduce the complexity of the optimization problem. The optimization problem is then divided into two sub-problems: a restricted master problem (RMP) and a pricing problem (PP). The results show that the overall throughput of the video data that can be delivered to users is considerably increased over state-of-the-art femto caching models.
In [72], proactive caching is designed based on traditional collaborative filtering, using regularized decomposition to estimate the popularity matrix. Transfer learning is then used to improve the estimation accuracy by transferring and learning hidden knowledge from other domains such as social networks. Finally, an optimal caching strategy is implemented as a distributed iterative algorithm to update the cache. Results show that the user satisfaction ratio increases with the number of SBSs compared to other caching approaches. The authors in [73] investigate proactive caching for service providers to reduce redundant backhaul transmission to edge nodes (ENs). The problem is formulated as a Stackelberg game and decomposed into two kinds of sub-games: a storage allocation game and a number of user allocation games. The service provider is modeled as a leader that decides the prices for the storage and backhaul resources on ENs, and the ENs are modeled as followers. Their results show that proactive caching based on game theory uses fewer total backhaul resources than centralized popularity-based caching and random caching. The authors also discuss the complexity and scalability of edge caching in wireless communication networks, which will involve a large number of ENs, users, and service demands as well as a huge amount of data. Complexity is defined as the number of iteration steps of the caching algorithm and the amount of information exchanged between network edges. Scalability addresses how the performance of the caching algorithm changes as the network size increases. The popularity estimation used in the caching techniques for video files presented above is based on user request probability or the number of views of videos. The work in [74] computes video popularity for published and unpublished videos using intelligence-based content-awareness. 
The prediction of video popularity enhances cache placement decision as well as the quality of service in cellular networks.
An adaptive caching scheme is proposed that takes into account user behaviour [76], content popularity, request statistics from users, and operating characteristics of the cellular network. The network operating characteristics include network topology, link capacity, routing strategy, cache size, and energy usage to read and write files from hardware storage (which is called cache deployment cost). Based on content features, the extreme learning machine (ELM) can predict how popular a piece of content will be.
The features of the content are computed using a combination of human perception models and network parameters. The adaptive caching scheme uses a mixed-integer linear program (MILP) to cache the results of popularity estimators. The caching decision is made while the network is not heavily utilized, so that popular content can be transferred between BSs without affecting the network quality of service. The impact of mobility awareness on the cache placement algorithm is discussed in [77]. The authors formulate the problem of caching coded segments at BSs and UTs, taking into account users' mobility and the amount of content per transmission. The optimization problem is formulated as an integer programming problem that can be solved by submodular optimization.
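To illustrate how a placement problem with a submodular objective can be approached, the following is a minimal sketch of greedy selection, a standard heuristic for monotone submodular objectives. It is not the algorithm of [77]: coded segments and mobility are omitted, and the names `greedy_placement`, `reach`, and `capacity` are hypothetical. The objective assumed here is the popularity-weighted number of users who can find a file in at least one reachable cache.

```python
def greedy_placement(popularity, reach, capacity):
    """Greedy cache placement for a coverage-style (submodular) objective.

    popularity: list of request probabilities, indexed by file id
    reach:      dict mapping user -> set of cache ids the user can reach
    capacity:   dict mapping cache id -> number of files it can store
    Returns a dict mapping cache id -> set of cached file ids.
    """
    placement = {cache: set() for cache in capacity}

    def covered(user, file_id):
        # A user is covered for a file if any reachable cache stores it.
        return any(file_id in placement[c] for c in reach[user])

    def marginal_gain(cache, file_id):
        # Popularity mass of users newly served by adding file_id to cache.
        return sum(
            popularity[file_id]
            for user, caches in reach.items()
            if cache in caches and not covered(user, file_id)
        )

    while True:
        best, best_gain = None, 0.0
        for cache, slots in capacity.items():
            if len(placement[cache]) >= slots:
                continue  # cache is full
            for file_id in range(len(popularity)):
                if file_id in placement[cache]:
                    continue
                gain = marginal_gain(cache, file_id)
                if gain > best_gain:
                    best, best_gain = (cache, file_id), gain
        if best is None:  # no remaining move improves the objective
            break
        placement[best[0]].add(best[1])
    return placement


if __name__ == "__main__":
    # Hypothetical toy instance: two caches with one slot each.
    popularity = [0.5, 0.3, 0.2]
    reach = {"u1": {"A"}, "u2": {"A", "B"}, "u3": {"B"}}
    print(greedy_placement(popularity, reach, {"A": 1, "B": 1}))
```

In the toy instance, the second cache stores the second most popular file rather than duplicating the first, because the user shared by both caches is already covered.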

Challenges in Designing Caching Techniques
As illustrated in Fig. 1, a mobile edge network (MEN) consists of multiple small base stations (SBSs) and user terminals (UTs). Each macrocell consists of one MBS connected to a gateway of the core network via a high-speed interface, $N$ SBSs connected to the MBS through backhaul links, and $M$ UTs connected to neighboring devices through D2D communication and to SBSs. The sets of SBSs and UTs are denoted by $s = \{s_1, s_2, \ldots, s_N\}$ and $u = \{u_1, u_2, \ldots, u_M\}$, respectively. Each MBS, SBS, and UT has cache storage, with different storage capacities. Within one macrocell, cache storage capacities are defined by the two sets $c_s = \{c_{s_1}, c_{s_2}, \ldots, c_{s_N}\}$ and $c_u = \{c_{u_1}, c_{u_2}, \ldots, c_{u_M}\}$ for SBS and UT caches, respectively. The MBS is assumed to have enough cache capacity to store $F$ files defined by the set $f = \{f_1, f_2, \ldots, f_F\}$. Following the study of different caching techniques in the literature (see Fig. 2), we can summarize the following challenges:

Content Popularity Modelling
In order to improve the performance of caching strategies, content popularity must be incorporated into caching decisions [66]. The content library consists of $F$ files stored at the MBS cache. Each file is denoted $f_z$ for $z = 1, \ldots, F$, and the size of file $f_z$ is denoted $f^{l}_{z}$. The files are requested from the main library based on their popularity distribution. The popularity of the $F$ files is denoted by the set $p = \{p_1, p_2, \ldots, p_F\}$, which can be characterized by the Zipf popularity distribution. If the files are ranked from the most popular to the least popular, the popularity of the $i$-th ranked content is given by Eq. (1) [78]:

$$p_i = \frac{i^{-\gamma}}{\sum_{j=1}^{F} j^{-\gamma}}, \quad i = 1, \ldots, F \tag{1}$$

The distribution is characterized by the exponent factor $\gamma$, also called the skewness of the popularity. When $\gamma = 0$, the popularity is uniform over contents. As $\gamma$ grows, the popularity becomes more skewed. Table 7 shows the methods used to model content popularity in the literature. In many works, the popularity of files is modelled using a Zipf distribution over all files. The Zipf distribution gives a fixed popularity profile, and it is assumed that content popularity is known in advance. Under the Zipf distribution, a small portion of Internet contents is highly popular while the rest is rarely requested [74]. In reality, content popularity needs to be estimated from a number of related factors and not only from the content popularity distribution. Examples of these factors are files' preferences, users' preferences, users' context, social network characteristics, and users' previous requests. Moreover, content popularity should not be treated as fixed, since it is expected to change continuously with time, date, and location. The approach in [79] assumes that the popularity of video contents changes slowly, so that the popularity distribution of all files can be considered fixed and known prior to the cache placement algorithm.
They defined the popularity distribution of video files depending on the number of views vs the rank of videos in terms of views. In [63], context-dependent popularity profiles are learned online while observing connected users' demands and their context information. The placement algorithm does not depend on prior knowledge of content popularity, but it models connected users' context-dependent demands of files following Zipf distribution. The context information used in modeling the content popularity is the maximum number of users that can be served by SBS, the fraction of female users, and the fraction of underage users. The total number of files that are used in Zipf distribution formula is divided according to the context information of connected users.
In [74], the authors proposed a popularity prediction model for video files. The popularity of video contents is estimated from both published videos (statistical information) and unpublished videos (newly uploaded videos). The process consists of the following stages: (1) feature extraction from unpublished videos based on a deep neural network technique, (2) clustering of the features resulting from stage 1 based on a collaborative filtering technique, and (3) fitting a regression model to predict the popularity of unpublished videos, using the statistical information of the published videos as the training set. The approach in [72] used the Zipf distribution as the training set to design a learning-based approach. Their model estimates content popularity using regularized decomposition based collaborative filtering, and estimation accuracy is improved using a transfer learning technique.
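The effect of the Zipf skewness on caching can be illustrated with a short sketch. It assumes the standard normalized Zipf form, in which the popularity of the file ranked i is proportional to i raised to the negative skewness, and a cache that simply stores the most popular files. The function names and the example values are hypothetical.

```python
def zipf_popularity(num_files: int, skew: float) -> list[float]:
    """Return Zipf request probabilities for files ranked 1..num_files."""
    if num_files <= 0:
        raise ValueError("num_files must be positive")
    weights = [rank ** -skew for rank in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def cache_hit_ratio(popularity: list[float], cache_size: int) -> float:
    """Hit ratio when the cache_size most popular files are cached.

    Assumes popularity is already sorted from most to least popular.
    """
    return sum(popularity[:cache_size])


if __name__ == "__main__":
    # Illustrative values only: 100 files, cache holding 10 of them.
    for skew in (0.0, 0.8, 1.2):
        p = zipf_popularity(num_files=100, skew=skew)
        print(f"skew={skew}: hit ratio = {cache_hit_ratio(p, 10):.3f}")
```

With zero skewness every file is equally likely, so the hit ratio equals the cached fraction of the library. Larger skewness concentrates requests on the top-ranked files and raises the hit ratio for the same cache size.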

User Mobility
Modeling user mobility depends on spatial and temporal properties. The spatial properties provide physical location information of user mobility patterns, while the temporal properties provide time-related information [80]. User mobility can be modeled by assuming a pairwise contact process that follows an independent Poisson process. The work in [81] implies that the occurrence time of pairwise contact event can be predicted in large time scale. The Poisson process is used for counting the occurrence of contacts between UTs, and between UTs and SBSs occurring at a certain rate.
To establish successful communication between mobile UT $u_i$ and SBS $s_j$, $u_i$ must be within the communication radius of SBS $s_j$. For an independent Poisson process, the pairwise contact duration $T^{SBS}_{i,j}$ between mobile UT $u_i$ and SBS $s_j$ follows the exponential distribution with parameter $\lambda^{SBS}_{i,j}$, which represents the contact rate between mobile UT $u_i$ and SBS $s_j$. Following [77], the contact duration $T^{SBS}_{i,j}$ is measured from $t_0$, the most recent time when mobile UT $u_i$ enters the communication range $d_{SBS}$ of SBS $s_j$, and lasts while the distance between the locations $l^{t}_{j}$ of SBS $s_j$ and $l^{t}_{i}$ of mobile UT $u_i$ at time $t$ remains within $d_{SBS}$. Similarly, to establish successful communication between mobile UT $u_i$ and mobile UT $u_k$, the shortest distance between the two devices must be within the communication range $d_{D2D}$. The contact rate between mobile UT $u_i$ and mobile UT $u_k$ is represented by $\lambda^{UT}_{i,k}$. Following [77], the contact duration $T^{UT}_{i,k}$ is measured from $t_0$, the most recent time when mobile UT $u_i$ enters the communication range $d_{D2D}$ of mobile UT $u_k$, and lasts while the distance between the locations $l^{t}_{i}$ and $l^{t}_{k}$ of UT $u_i$ and UT $u_k$ at time $t$ remains within $d_{D2D}$.

Fig. 2 Different caching techniques objectives in MEN

In most of the work discussed in previous sections, it is assumed that users remain stationary while requesting and obtaining files. Research with this assumption does not include mobility as an effective parameter when taking cache placement decisions. A user may be served by any SBS located within the user's communication range. Research that considers mobility in caching design for future wireless networks can be classified into three categories based on cache location:
1. Cache in SBS: In [82–85], files are cached in SBSs while user mobility is considered.
2. Cache in mobile UT: In [80, 86–90], user mobility-aware caching designs are proposed that utilize D2D communication links.
3. Cache in SBS and mobile UT: In [80, 84, 85, 91], the researchers proposed user mobility-aware cache placement in both SBSs and mobile UTs.
Caching efficiency can be improved by exploiting user mobility-aware cache placement in SBSs and UTs [80]. However, few cache placement techniques have taken the impact of user mobility into account [77]. Most existing approaches that estimate cache contents proactively face the redundancy problem, which occurs in caching strategies where neighboring SBSs store the same popular contents. Redundancy wastes cache resources and reduces the cache storage capacity available to users. The interactions between network edges should be taken into account while optimizing the caching strategy [92].
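The exponential contact model described earlier in this section can be explored numerically. The sketch below assumes that the contact duration is exponentially distributed with the contact rate as its parameter and that a file is delivered only if the contact lasts at least as long as the transfer. Under that assumption the delivery probability has the closed form exp(-rate × size / data rate), which the simulation checks. All function names and numbers are hypothetical.

```python
import math
import random


def delivery_probability(contact_rate, file_size_bits, data_rate_bps):
    """P(T >= transfer time) for a contact duration T ~ Exp(contact_rate)."""
    transfer_time = file_size_bits / data_rate_bps
    return math.exp(-contact_rate * transfer_time)


def simulate_delivery_probability(contact_rate, file_size_bits, data_rate_bps,
                                  trials=100_000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    transfer_time = file_size_bits / data_rate_bps
    completed = sum(
        rng.expovariate(contact_rate) >= transfer_time for _ in range(trials)
    )
    return completed / trials


if __name__ == "__main__":
    # Illustrative values: mean contact of 10 s, 10 Mbit file, 1 Mbit/s link.
    args = (0.1, 10e6, 1e6)
    print("closed form:", delivery_probability(*args))
    print("simulated:  ", simulate_delivery_probability(*args))
```

A mobility-aware placement could use such a probability to discount files that are unlikely to be fully delivered within one contact, although the works surveyed here may model this differently.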

Power Constraint
Energy consumption is becoming a more challenging problem in the design of wireless communications due to the increase in energy cost, the number of broadband wireless network users, and the growing demand for contents in future networks [93]. Delivering contents from an SBS to UTs and from one UT to another consumes power and drains energy at both the network and the UT. A cache system should be designed with the objective of finding the optimal transmit power and sustaining the continuous growth of power consumption [94].

Table 7 Methods used to model content popularity in the literature

| References | Popularity model |
| --- | --- |
| [2, 3, 56, 57, 60, 61, 64, 67, 71, 73, 77] | Zipf distribution |
| [79] | Number of views vs rank of videos in terms of views |
| [74] | Feature extraction and popularity prediction for unpublished videos |
| [62, 72, 76] | Popularity estimation based on learning methods |

There are two power consumption models presented and discussed in [77], as follows:

1. Energy consumption for D2D caching: The interference of D2D communication is not considered. When UT $u_k$ transmits the cached contents to UT $u_i$, the components of the power consumption of UT $u_k$ are as follows:
• $\xi_{UT}$ is the inverse of the power amplifier efficiency factor of mobile UT $u_k$,
• $P^{T}_{u_k}$ is the transmission power of mobile UT $u_k$,
• $P^{C}_{u_k}$ is the circuit power consumption of mobile UT $u_k$,
• $P^{H}_{u_k}$ is the energy consumption of the caching hardware devices of mobile UT $u_k$.
Then, the power consumption of UT $u_k$, $\forall k$, is given by:

$$P_{u_k} = \xi_{UT} P^{T}_{u_k} + P^{C}_{u_k} + P^{H}_{u_k}$$

Neglecting the energy consumed for delivering the cache contents, the power consumption expression can be simplified accordingly. When UT $u_k$ transmits file $f_z$ of length $f^{l}_{z}$ to UT $u_i$, the energy consumption can be computed as [77]:

$$E^{UT}_{i,k} = \frac{f^{l}_{z}}{R_{i,k}} \, P_{u_k}$$

where the data rate $R_{i,k}$ of D2D communication between UT $u_i$ and UT $u_k$ can be calculated as follows:

$$R_{i,k} = W^{UT}_{i,k} \log_2\left(1 + \frac{P^{T}_{u_k} \, d_{i,k}^{-\alpha_{UT}}}{\sigma^{2}_{UT}}\right)$$

and
• $W^{UT}_{i,k}$ is the channel bandwidth from UT $u_k$ to UT $u_i$,
• $d_{i,k}$ is the distance between $u_i$ and $u_k$,
• $\sigma^{2}_{UT}$ is the average noise power for D2D communication,
• $\alpha_{UT}$ is the path loss factor.

2. Energy consumption for SBS caching: Similarly, to compute the energy consumption of transferring file $f_z$ from SBS $s_j$ to UT $u_i$, it is assumed that there is no interference between SBSs. The downlink rate $R_{i,j}$ is given by:

$$R_{i,j} = W^{SBS}_{i,j} \log_2\left(1 + \frac{P^{T}_{s_j} \, d_{i,j}^{-\alpha_{SBS}}}{\sigma^{2}_{SBS}}\right)$$

where
• $W^{SBS}_{i,j}$ is the downlink transmission bandwidth from SBS $s_j$ to UT $u_i$, $\forall j \in \{1, \ldots, N\}$,
• $P^{T}_{s_j}$ is the SBS transmission power,
• $d_{i,j}$ is the distance between $u_i$ and $s_j$,
• $\sigma^{2}_{SBS}$ is the average noise power in communication with the SBS,
• $\alpha_{SBS}$ is the path loss factor.
Then, the components of the power consumption of SBS $s_j$ are as follows:
• $\xi_{SBS}$ is the inverse of the power amplifier efficiency factor,
• $P^{C}_{s_j}$ is the offset of site power.
When SBS $s_j$ transmits file $f_z$ of length $f^{l}_{z}$ to UT $u_i$, the energy consumption can be computed as [77]:

$$E^{SBS}_{i,j} = \frac{f^{l}_{z}}{R_{i,j}} \left(\xi_{SBS} P^{T}_{s_j} + P^{C}_{s_j}\right)$$

Formulating a cache system involves a trade-off between minimizing energy consumption, by caching contents at the edge of the network closer to user terminals, and maximizing the probability that the placed contents will be requested by users in the near future. Caching contents requires energy to deliver the contents from the MBS to SBS and UT caches. If these contents are not requested by users, and users request other contents that must again be delivered from the MBS, then the energy consumed in filling the SBS and UT caches is wasted. The challenge is to adapt a caching system so that it reduces transmission power by caching contents that have a high probability of being popular. Table 8 illustrates power consumption-aware caching algorithms proposed for wireless networks.
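The power components listed above can be combined into a small numerical sketch. It assumes forms that are common in the literature and may differ in detail from [77]: total power is the inverse amplifier efficiency times the transmit power plus circuit and hardware power, the link rate follows the Shannon formula with distance-based path loss and no interference, and energy is power multiplied by transmission time. All function names and parameter values are hypothetical.

```python
import math


def link_rate(bandwidth_hz, tx_power_w, distance_m, path_loss_exp, noise_w):
    """Shannon rate in bit/s with distance-based path loss, no interference."""
    snr = tx_power_w * distance_m ** (-path_loss_exp) / noise_w
    return bandwidth_hz * math.log2(1.0 + snr)


def total_power(inv_amp_eff, tx_power_w, circuit_w, hardware_w=0.0):
    """Power drawn by the transmitter while sending, in watts."""
    return inv_amp_eff * tx_power_w + circuit_w + hardware_w


def delivery_energy(file_size_bits, rate_bps, power_w):
    """Energy in joules: power multiplied by transmission time."""
    return power_w * file_size_bits / rate_bps


if __name__ == "__main__":
    file_bits = 8e6  # illustrative 1 MB file

    # Hypothetical D2D link: low power, short distance.
    d2d_rate = link_rate(5e6, 0.1, 10.0, 3.0, 1e-9)
    d2d_energy = delivery_energy(file_bits, d2d_rate,
                                 total_power(2.5, 0.1, 0.05, 0.01))

    # Hypothetical SBS link: higher power, longer distance.
    sbs_rate = link_rate(10e6, 1.0, 50.0, 3.5, 1e-9)
    sbs_energy = delivery_energy(file_bits, sbs_rate,
                                 total_power(3.0, 1.0, 5.0))

    print(f"D2D: {d2d_rate / 1e6:.1f} Mbit/s, {d2d_energy:.4f} J")
    print(f"SBS: {sbs_rate / 1e6:.1f} Mbit/s, {sbs_energy:.4f} J")
```

A sketch of this kind makes the trade-off explicit: a higher transmit power shortens the transmission time but raises the power drawn during it, so the energy per file depends on both terms.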

Quality of Service (QoS)
Quality of service (QoS) is a network performance characteristic experienced by the end user. Two critical metrics can be used to characterize QoS in MENs: latency and throughput. These constraints need to be taken into account when formulating the optimization problem of caching at the edge of a MEN. Table 9 and Table 10 summarize previous work on latency and throughput computation in caching schemes for wireless networks.
1. Latency: In caching systems, latency refers to the average content delivery delay experienced by the end users [101]. According to cache types, latency can be classified into three types, including the average latency of delivering the requested content from another nearby UT cache through D2D communication. Latency is also referred to as delay, download time, and content delivery deadline. In future wireless networks, new services and applications will appear, such as augmented reality (AR) and virtual reality (VR), that have tighter latency requirements than typical video streaming. Caching at the edge of the network promises to reduce the latency of requested data access and delivery. Table 11 illustrates the target requirements for different services and applications [102–105]. Reliability can be defined as the probability of successful transmission of a certain amount of data from one peer to another within a given deadline or time frame [106]. Additionally, storage indicates whether the target service requires storage capacity for the data it manipulates, and mobility indicates whether the service needs to process user terminals' locations. Based on the requirements given in Table 11, latency is highly critical in most of these applications and services.
2. Throughput: In caching systems, throughput refers to the data units that can be delivered through the network per unit time interval [101]. In MENs, this metric is used as a joint indicator of network transmission capabilities. The authors in [114] discuss throughput capability in decentralized coded and uncoded caching in multihop D2D communication for next-generation cellular networks. They illustrate the effect of the UT cache placement strategy on the increase of throughput capabilities.

Caching for Emerging Applications and Networks
Recently, new applications and services such as AR/VR, IoT, traffic monitoring, and big data processing, with the requirements discussed in Sect. 6.4, have emerged. In addition to their target requirements, these applications include various types of sensors, are intended for different types of mobile devices, and produce a wide variety of data. Therefore, MEN has been introduced with cloud computing capabilities, an IT service environment, and caching at the edge of the network to move data processing and caching to the network edge. However, some points need to be considered in designing energy and latency efficient caching in MENs to overcome the problems that face emerging applications and networks:
1. Offloading tasks from a mobile device with limited capabilities to the nearest mobile edge server may incur delay due to congestion in the communication environment of that server. In this case, task requirements (in terms of latency and reliability) will not be met. Therefore, it is important to select a mobile edge server that provides adequate communication, processing time, and storage capacity, which is not necessarily the one at the shortest distance [117].
2. Many of these emerging applications are intelligent applications such as personalized shopping recommendation, video surveillance, intelligent personal assistants, and smart applications. Artificial intelligence (AI) applications require big data analysis. Mobile devices running these applications may suffer from limited capability to perform heavy computation, poor performance, limited energy efficiency, and limited data storage. Merging MEN and AI is required, such that the MEN coordinates edge devices and SBSs to serve users' requests and AI simulates intelligent human behaviour in mobile devices by learning from previous data [118].
3. In smart industry, unmanned aerial vehicles (UAVs) have been deployed to assist the MEN infrastructure.
UAV is a mobile device that can host SBSs storage and edge computing and has the advantage of being equipped with cameras, sensors, and devices for com-

Learning and Decision Technique for Optimal Caching Design
In next-generation 5G wireless networks and beyond, ultra-dense heterogeneous networks, which are highly dynamic and complex, will add many challenges for network design and management. The wireless network will face huge data consumption from connected users and machines, which adds further complexity. The design of MENs, which includes the distribution of computational resources and storage devices in the form of local caches, enables the use of decision theory, complex machine learning (ML), and AI approaches to provide possible solutions for these growing challenges. Developing an optimal caching system under frequent changes of input parameters (users' mobility, file request probability, and content popularity), with the objective of maximizing network throughput, minimizing power consumption, and minimizing content download time, is a problem of high computational complexity. An approach based on learning and decision techniques allows reasoning, learning, prediction, and decision-making algorithms to be combined to efficiently find solutions for optimal cache design. In the literature, a number of research works on cache development for future wireless networks have applied learning and/or decision approaches in a specific domain of their design. Table 12 summarizes these works and the solutions they provide. A brief review of some future research directions for the development of cache systems is given in the following.
Table 8 Power consumption-aware caching algorithms proposed for wireless networks

| References | Contribution |
| --- | --- |
| [95] | Proposed content caching for smart grid enabled wireless multimedia transmission system with optimal power allocation to users |
| [77] | Proposed an optimal transmit power of SBSs and UTs in order to reduce the delivery energy cost |
| [96] | Developed a framework to minimize the total network power consumption by a joint design of adaptive BS selection, backhaul content assignment, and multicast beamforming |
| [97] | Proposed an optimal allocation cooperative caching scheme for the industrial internet of things (IIoT) in 5G heterogeneous networks with regard to energy consumption |
| [98] | Formulated the optimal caching placement at the wireless edge that maximizes the energy efficiency of heterogeneous wireless networks |
| [99] | Designed a green content caching and mobile user-base station association mechanism in SCNs |
| [100] | Proposed two energy-efficient caching schemes in heterogeneous networks: scalable video coding (SVC) based fractional caching and SVC-based random caching |

Decision Theory
When a problem requires selecting one action from several possibilities, it must be formulated as a decision-making problem. In statistical theory, the branch that deals with such problems is called statistical decision theory or hypothesis testing [133]. In possibility theory, there is a variety of information fusion operators, which can mainly be classified into three groups [134]:
1. Conjunctive operators: Used for merging agreeing sources; they search for values on which all the sources agree.
2. Disjunctive operators: Used for merging conflicting sources.
3. Trade-off operators: Used for sources that are partially in conflict.

Table 9 Latency computation in caching schemes

| References | Contribution |
| --- | --- |
|  | Proposed latency-centric placement and delivery strategies for cloud and cache aided wireless networks |
| [88] | Proposed a cooperative vehicle-aided content edge caching scheme to minimize the content delivery latency |
| [108] | Proposed hybrid content caching algorithms for joint content caching control in BSs and cloud units (CUs) subject to finite service latency |
| [109] | Proposed a joint caching and association strategy to minimize the average requested content download delay |
| [110] | Proposed an optimal cooperative content caching and delivery policy aiming to minimize the average downloading latency |
| [111] | Proposed two content caching policies, caching popular files and greedy caching in BS and D2D, with the aim of minimizing transmission delay |
| [112] | Proposed probabilistic caching placement-aided throughput in stochastic wireless D2D caching to measure the density of requests successfully served by local caches |
| [113] | Proposed a deterministic caching algorithm and enabled D2D connections based on reinforcement learning to minimize the download latency |

Table 10 Throughput computation in caching schemes

| References | Contribution |
| --- | --- |
| [115] | Proposed femtocaching and D2D collaboration to improve video throughput |
| [111] | Proposed two content caching policies, caching popular files and greedy caching in BS and D2D, and investigated the behaviour of the average throughput per request |
| [116] | Proposed optimal file placement for deterministic and random caching with the aim of increasing throughput for high user density wireless video networks |
| [113] | Proposed a deterministic caching algorithm based on reinforcement learning to maximize system throughput |
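The three groups of fusion operators can be illustrated with a small sketch. It assumes the most common choices in possibility theory: the minimum for conjunctive fusion, the maximum for disjunctive fusion, and a weighted average as one possible trade-off operator. The two sources and their possibility degrees for caching or skipping a file are hypothetical.

```python
def conjunctive(pi_a, pi_b):
    """Minimum-based fusion: keeps only what both sources find possible."""
    return {x: min(pi_a[x], pi_b[x]) for x in pi_a}


def disjunctive(pi_a, pi_b):
    """Maximum-based fusion: keeps what at least one source finds possible."""
    return {x: max(pi_a[x], pi_b[x]) for x in pi_a}


def trade_off(pi_a, pi_b, weight=0.5):
    """Weighted average, one simple compromise between the two extremes."""
    return {x: weight * pi_a[x] + (1.0 - weight) * pi_b[x] for x in pi_a}


if __name__ == "__main__":
    # Hypothetical sources: a popularity estimator and a mobility predictor,
    # each giving possibility degrees for the decisions "cache" and "skip".
    popularity_view = {"cache": 1.0, "skip": 0.2}
    mobility_view = {"cache": 0.6, "skip": 1.0}
    print("conjunctive:", conjunctive(popularity_view, mobility_view))
    print("disjunctive:", disjunctive(popularity_view, mobility_view))
    print("trade-off:  ", trade_off(popularity_view, mobility_view))
```

When the sources disagree, as in this example, the conjunctive result has no fully possible decision, which is one reason trade-off operators are preferred for partially conflicting sources.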

Evolutionary Approaches
Evolutionary approaches (soft computing) can be used to solve NP-hard problems that require heavy computation. A model resulting from an evolutionary approach is able to handle uncertainty and incomplete datasets. Two types of evolutionary approaches are used in the design of caching systems:
1. Genetic Algorithm: The most popular evolutionary strategy, which can be used to solve multi-objective optimization problems. Solutions are derived from previous generations. The individuals are allowed to reproduce and cross among themselves, with a bias towards the fittest members. New generations result from combinations of the most favourable characteristics of the mating members of the population, so each new generation is expected to be fitter than the previous ones. The control parameters of a genetic algorithm are the number of individuals in the population, the crossover probability, the mutation probability, and the number of generations [135]. The authors in [131] proposed a hierarchical collaborative caching strategy focusing on content placement for 5G networks. The objective of the cache placement optimization problem is to maximize the saving in total latency of the system. The optimization problem is formulated as two sub-problems that are proved to be NP-hard. To cope with the computational complexity of the problem, they used a genetic placement algorithm to find an approximately optimal solution.
2. Fuzzy Logic: A fuzzy logic system is used to find solutions for problems with uncertainty from the perspective of membership degrees. Fuzzy systems allow set membership to be represented as a possibility distribution. Since fuzzy theory depends on degree of membership rather than probability (likelihood), fuzzy logic is more effective in building fuzzy conditional inference to model uncertain information. In [132], we have proposed an algorithm for proactive caching based on the fuzzy soft set (FSS) approach for decision making on file caching.
The algorithm decides which files to cache and where to cache them depending on file popularity distribution, file to user preferences, file clustering,
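The genetic algorithm described in this section can be sketched on a deliberately simplified problem. The example below is not the hierarchical formulation of [131]: it assumes a single cache with a size limit and maximizes the total popularity of the cached files, which is a knapsack-type problem. It uses the control parameters named above (population size, crossover probability, mutation probability, and number of generations); all names and values are hypothetical.

```python
import random


def fitness(chromosome, popularity):
    """Total popularity of the files marked as cached (bit = 1)."""
    return sum(p for bit, p in zip(chromosome, popularity) if bit)


def repair(chromosome, sizes, popularity, capacity):
    """Drop the least popular cached files until the capacity is respected."""
    chromosome = list(chromosome)
    cached = sorted((i for i, bit in enumerate(chromosome) if bit),
                    key=lambda i: popularity[i])
    used = sum(sizes[i] for i in cached)
    for i in cached:
        if used <= capacity:
            break
        chromosome[i] = 0
        used -= sizes[i]
    return chromosome


def genetic_placement(popularity, sizes, capacity, pop_size=30,
                      crossover_prob=0.8, mutation_prob=0.05,
                      generations=60, seed=0):
    rng = random.Random(seed)
    n = len(popularity)

    def score(c):
        return fitness(c, popularity)

    def tournament(population):
        # Binary tournament: the fitter of two random individuals is selected.
        a, b = rng.sample(population, 2)
        return a if score(a) >= score(b) else b

    population = [
        repair([rng.randint(0, 1) for _ in range(n)], sizes, popularity, capacity)
        for _ in range(pop_size)
    ]
    best = max(population, key=score)
    for _ in range(generations):
        offspring = [best]  # elitism: the best individual always survives
        while len(offspring) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            child = list(p1)
            if n > 1 and rng.random() < crossover_prob:
                cut = rng.randint(1, n - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_prob else g for g in child]
            offspring.append(repair(child, sizes, popularity, capacity))
        population = offspring
        best = max(population, key=score)
    return best


if __name__ == "__main__":
    popularity = [0.4, 0.3, 0.2, 0.1]
    sizes = [3, 2, 2, 1]
    print(genetic_placement(popularity, sizes, capacity=4))
```

The repair step keeps every individual feasible, and elitism guarantees that the best fitness never decreases between generations. The result is an approximate solution, not a guaranteed optimum.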

Machine Learning (ML)
ML techniques model the functional relationship between input datasets and output actions with the aim of optimizing some parameters. The resulting model is able to estimate an output as close as possible to the actual value. ML techniques can be categorized into two main groups, supervised and unsupervised learning, depending on whether the data is labelled or not. In supervised learning, the aim is to model the relationship between input and output datasets (labelled data), while unsupervised learning aims to model the hidden structure of unlabelled datasets. In caching systems, ML can be used to explore and extract knowledge from connected users and network characteristics in order to build an intelligent decision-making system for cache placement, cache access, and cache delivery options.
Some ML techniques have been applied in caching systems, such as reinforcement learning (RL) and deep learning.
1. Reinforcement Learning: In reinforcement learning, an agent interacts with its dynamic environment through trial and error. The agent receives the current state of the environment as input and chooses the next action from the possible actions. The agent's action changes the state of the environment, and the agent receives a value for the state transition, which can be a reward or a penalty. The goal is to learn a trajectory of actions that maximizes the rewards (or minimizes the penalties) over its lifetime. Reinforcement learning learns the optimal policy, mapping environment states to actions, that will maximize (or minimize) its objective [136]. In [127], SBSs prefetch popular content during off-peak traffic hours and send the contents to the edge of the network during the peak period. The cache control unit in the SBS is designed to learn, track, and adapt to the network dynamics. The authors proposed an optimal online caching policy by developing a Q-learning algorithm. The Q-learning scheme is introduced with a linear function approximation to offer fast convergence, reduce complexity, and obtain scalability over large networks.
2. Deep Learning: Deep learning is a form of learning that creates complex features by using multiple transformation steps. Much larger quantities of data are used during the learning steps. Deep learning techniques have shown the ability to explore the information contained in massive datasets more effectively than traditional ML techniques. Deep learning implies learning complex artificial neural networks (ANNs) that progressively extract patterns from the datasets. In a traditional ANN, the three-layer perceptron (input, hidden, and output layers) learns by training the hidden and output layers to adapt to the task of interest. In deep learning, more hidden layers are added to the network to subject features to a sequence of transformations. Each layer's transformation represents an inference. Modeling complex inferences can be made easier using this sequence of computational steps. The depth of the ANN represents the complexity of the learning algorithm. Some ANN learning algorithms include feedback loops. There are a number of ANN deep learning techniques, such as deep multilayer perceptrons, deep convolutional neural networks (DCNNs), and recurrent neural networks (RNNs) [137]. The authors in [130] proposed a deep neural network (DNN) to train an optimization problem for cache placement, user association, and content delivery in advance, before applying these optimization algorithms in real-time caching.
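The reinforcement learning loop described above can be sketched with tabular Q-learning on a toy caching task. This is a simplification and not the scheme of [127], which uses a linear function approximation. The sketch assumes a cache holding a single file, where the state is the currently cached file, the action is the file to cache next, and the reward is 1 when the next request hits the cache. Requests are drawn independently from a fixed popularity vector. All names and parameter values are hypothetical.

```python
import random


def q_learning_cache(popularity, steps=20000, lr=0.05, discount=0.5,
                     epsilon=0.2, seed=0):
    """Learn Q-values q[state][action] for a one-slot cache."""
    rng = random.Random(seed)
    n = len(popularity)
    files = list(range(n))
    q = [[0.0] * n for _ in range(n)]
    state = 0  # file currently held in the cache
    for _ in range(steps):
        # Epsilon-greedy choice of the file to cache for the next request.
        if rng.random() < epsilon:
            action = rng.randrange(n)
        else:
            action = max(files, key=lambda a: q[state][a])
        request = rng.choices(files, weights=popularity)[0]
        reward = 1.0 if request == action else 0.0  # cache hit or miss
        next_state = action
        # Standard Q-learning update towards the bootstrapped target.
        target = reward + discount * max(q[next_state])
        q[state][action] += lr * (target - q[state][action])
        state = next_state
    return q


if __name__ == "__main__":
    q = q_learning_cache([0.7, 0.2, 0.1])
    policy = [max(range(3), key=lambda a: q[s][a]) for s in range(3)]
    print("greedy action per state:", policy)
```

In this toy setting the reward depends only on the chosen file, so the learned policy is expected to favour the most popular file. Realistic settings add time-varying popularity, fetching costs, and larger state spaces, which is where function approximation becomes necessary.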

Conclusions and Future Work
Table 12 Learning and decision techniques applied in cache design

| Technique | Reference | Contribution |
| --- | --- | --- |
| Regularized SVD, K-means clustering | [4] | Proactively cache files based on file popularity and correlation among users. They exploit influential users in the social structure of the network to cache strategic contents |
| RSVD based CF and TL | [72] | Estimate the content popularity and improve the estimation accuracy |
| Deep learning | [120] | Predict content popularity |
| Extreme learning machine (ELM) | [76] | Estimate the popularity of cache content based on the features of the content |
| Deep belief network (DBN) | [121] | Extract semantic information of user playback pattern |
| Cumulative filtering | [122] | Predict the content popularity distribution |
| ML on Hadoop framework | [123] | Estimate content popularity |
| Clustering technique | [124] | Track the evolution of content popularity over time |
| Clustering technique | [125] | Content popularity based users clustering |
| Reinforcement learning | [126] | Enabling access points to learn the optimal fetching-caching decisions |
| Q-learning | [127] | Learn, track, and adopt optimal policy |
| Rank-Directed Sparse Bayesian Learning | [128] | Estimate content popularity |
| Transfer learning | [129] | Estimate content popularity |
| Deep neural network (DNN) | [130] | Proposed caching placement and content delivering optimization algorithms |
| Bayesian learning and RL | [113] | Propose a Bayesian learning method to predict personal preferences; reinforcement learning is proposed for the content placement algorithm |
| Genetic algorithm | [131] | Proposed cache placement algorithm for hierarchical collaborative caching |
| Fuzzy soft set (FSS) | [132] | Proposed fuzzy soft-set decision making for cache placement algorithm |
| Q-Learning | [127] | Proposed an optimal online caching policy |
| Deep Learning | [130] | Proposed a DNN to train an optimization problem for cache placement, user association, and content delivery |

In this paper, energy and latency efficient caching in mobile edge networks (MENs) is reviewed. MEN enables the use of caching capabilities at the edge of the network in macro base stations, small base stations, and user terminals. Different caching techniques are presented and compared. Then the challenges that face the design of caching systems in MENs are discussed. We propose to use decision, evolutionary, and learning theoretical approaches to solve these problems. MENs also enable complex computation to be carried out, which allows deep learning techniques to be adapted in these networks to solve problems related to energy and latency constraints. Upon review of recent developments in the design of caching in MENs, we noted that there are several challenges in modelling and implementing cache placement, access, and delivery at the edge of the network due to continuous changes in content popularity, user mobility, and the number of users within the network. More challenges appear in caching at MENs due to the high computation requirements of future applications, which need to satisfy power and delivery time constraints together with quality of service requirements, improved network throughput, and reduced end-to-end and backhaul delay costs. Future research is required to investigate the development of algorithms for cache placement, cache access, and cache delivery by utilizing the data storage and computing capabilities of mobile edge networks. The main focus is on using learning and decision techniques to implement the algorithms.
In future work, we need to investigate the impact of user mobility, user activities, and cell pattern on content caching in order to minimize the latency of providing the requested content to users on the move. More investigation is required into the impact of previous behaviour (history of file requests, cache contents, user activities, etc.) to learn what can minimize latency in future user requests. The aim is to find which files to cache at SBSs and UTs to maximize the cache hit rate, taking into consideration users' mobility, content popularity, and cache storage capacity. We also need to develop cache access and cache delivery algorithms to minimize the download time and energy consumption, respectively. An efficient solution is required to build a model that is able to learn the hidden features in the input datasets, the features of system attributes and their relationships, and the relationship between previous cache placement decisions and cache access and delivery decisions, in order to predict next decisions that may improve overall system performance. The solution approach needs to balance computation time against solution quality.