1 Introduction

The growing use of mobile phones, tablets, Personal Digital Assistants (PDAs), and internet-based applications for services such as news, online video streaming, gaming, and massive file transfers has driven significant growth in internet traffic. Dependence on mobile internet data has been rising exponentially owing to its widespread importance and frequent usage, and pandemic situations such as COVID-19 have further added to this dependence. The conventional way to handle these internet-based services is to rely on external infrastructure. However, provisioning such infrastructure to meet the current demand for data would incur a very high implementation cost: a large financial investment, a long development process, a low return on investment, and high maintenance costs. There is therefore a rapidly growing need to address mobile data traffic and its offloading alternatives.

According to the white paper report in [1], the global population of internet users is expected to grow by nearly 15% from 2018 to 2023. The number of devices connected to IP networks is likely to be three times the global population. More than 70% of the global population is expected to have mobile connectivity, with roughly a fourfold rise in machine-to-machine connections. The analysis forecasts that global mobile data traffic will grow fivefold to reach nearly 164 EB per month by 2025 [2]. In order to reduce the traffic load at a single junction, an optimal solution is to use ad-hoc networks. For such mobile data offloading, WiFi and Bluetooth technologies have been described as a promising solution [3]. Opportunistic communication [4] through common WiFi-based hotspots serves similar objectives.

To identify offloading structures for such optimizations, static opportunistic contact analysis has been performed [5,6,7,8,9,10,11]. However, the significance of continuously evolving networks is yet to be addressed. The literature offers a comparative analysis of greedy, heuristic, and random approaches in [8, 12] for static target sets, which suffers the drawback of being unrealistic given the dynamic behavior of the network graph. The majority of research work focuses on identifying users who behave similarly on some common grounds, typically through heuristics-based TSS [13]. TSS is the procedure of selecting a limited set of nodes that can share, on behalf of the access point, the duplicate data otherwise meant for each node. Approximating the minimum or maximum set of participants for such set identification is NP-hard [13]. In most of the literature the associations are static and independent of time constraints, and the modeling targets fixed networks that do not vary with time. However, opportunistic associations need to be considered with respect to time constraints.

In this paper, we aim to optimize the solution to this NP-hard TSS problem and propose the identification of an optimized target set for access-point-level offloading. TSS detection can thus be activated to minimize the final set that provides cellular network offloading to its users at different times, in contrast to [14]. As illustrated in Fig. 1, we focus on a limited set of users within the range of the same access point. The access point otherwise has to deliver similar content repeatedly to many users, which creates a bottleneck at a higher level and motivates data offloading. The problem is simplified by offering a differentiated data offloading technique. The proposed solution optimizes dynamic target set identification at three sub-levels: the initial selection of the target set, the selection of the secondary target set, and the optimized selection of the target set for complex networks. Our significant contributions can be summarized as:

  • We derive an optimal target set using three-phase optimization for TSS, limiting the users’ overlapping communities.

  • We guarantee the restricted membership of each node to minimize the possibility of conflicting groups in the event of different interests.

Fig. 1 Offloading target sets for different interests

2 Literature survey

The majority of the literature related to mobile data offloading proposes several algorithms for feasible optimizations. Although the research work is inter-related, we broadly categorize the literature survey into three sub-categories: the area of application, the identification of data offloading parameters, and the study of TSS. We review the existing literature for data offloading on this basis in the following subsections.

2.1 Types of ad-hoc connections

The most favorable strategy to migrate data traffic from cellular networks to device-to-device networks is the use of Delay Tolerant Networks (DTNs). The limited capacity of DTN-based devices and the varied user interest constraints under limited storage have been studied in [15]. DTNs with real traces of humans and vehicles are transformed into a maximization problem using an optimized 0-1 knapsack formulation with linear constraints. The authors propose a Greedy Algorithm (GA) for general scenarios, an Approximation Algorithm (AA) for shorter-lifetime scenarios, and an Optimal Algorithm (OA) for homogeneous contact rates and buffer sizes. However, their work assumes that contacts between any two nodes follow Poisson rates, so the implementation is limited to such models and to DTN traces only. Offloading maximization is also discussed in [8] using a greedy method based on heuristics, which requires approximating short-lived contact associations; an optimal algorithm for heterogeneous contacts is proposed there. Their work is compared with greedy and approximation algorithms in [16]. An adaptive offloading algorithm based on Lyapunov optimization is proposed in [17], which can release part of an application’s computation to a dedicated server and adapt to evolving environments. Special consideration has been given to heterogeneous networks, making them more practical by categorizing users as helpers and subscribers. Kempe et al. approximated the upper bound for TSS [18] and Chen determined its lower bound [10]. The authors assume that users mutually agree to share and receive the same information on a sharing basis; this can reduce their payment load and lets them share resources to avoid congestion. In the case of Vehicular Ad-hoc Networks (VANETs), these Wi-Fi-based access points are assumed to be associated with public modes of transport [19]. Users are assumed to belong to a limited community at similar time instances across day intervals, which helps them to be considered on priority for all such networks. The offloading via Wi-Fi-based access points proposed in [7] lowers the transmission cost, which is intended to be lower than the transmission cost via cellular networks. The algorithm suggested in [9] explores the network-based interaction between nodes and tests whether connectivity exists between them. Based on this association within a common transmission range, the authors categorize a few nodes as sub-network deterministic nodes. Even after finalizing these nodes, a few nodes may fall in the range of more than one deterministic node; the authors identify such multi-community scenarios in [20]. In such cases, the final priority of a node needs to be determined on the basis of some significant characteristics. The authors propose a data forwarding algorithm named Social Attraction and Infrastructure Support (SAIS), which exploits social networking properties. The major contribution of their work is the realistic treatment of the small ratio of access points to the total number of mobile users. The authors also exploit the property of graph cliques for network realization.

2.2 Parameters of data offloading

A few authors use encounter frequency within the same classroom to offload cellular traffic [11]. They use social community and encounter-based frequency to analyze the performance of data forwarding. The authors propose and compare data forwarding based on encounter, Rarest First (RF) meeting, and random strategies; the findings show that the RF algorithm outperforms the other strategies. The authors use the content distribution latency in the RF algorithm to assess the effect of a set deadline for the inclusion of sources in the group. The findings illustrate optimization based on the balance between frequency and latency of offloading. The main consideration is the expected regularity of human mobility, which makes social engagement a key factor in assessing contact-based opportunistic offloading of mobile data. The problem is proved to be sub-modular, followed by the application of greedy algorithms and heuristics based on human mobility patterns. However, the assumption of a content dissemination controller that decides to which nodes content must be sent is also partial, as in [21].

The literature includes [12], which uses different probabilities of system discovery to achieve opportunistic communication among moving cell phones over short contact times. The comparative results in [9] show that the greedy algorithm performs better than heuristic and random algorithms. It derives a framework to exchange small data during short contact periods. In [5], a device model for traffic offloading using motion predictions collects different metrics to evaluate the neighboring node coverage zones and the likelihood of meeting. The likelihood of meeting is used as the heuristic parameter for evaluating the coverage relationship in the graph-based coverage calculation. For the ad-hoc network, the system was simulated over the ns-2 network simulator. Much of the research work considers the use of online social networks such as Facebook or Twitter to identify social participation and activity status, as in [21]. The work in [22] suggests an acceptable worst-case solution using a tree-based transformation of the static graphical network. The authors propose the Adaptive Finding Overlapping Community Structures (AFOCS) algorithm and compare it with the CFinder [23] and COPRA [24] methods for dynamic group detection. In order to locate local communities and then merge overlapping populations, their initial work enforces the relationship under the basic community structure. The group optimization in AFOCS handles the insertion and removal of new nodes and controls their addition to and removal from the group. The authors in [25] demonstrate the correlation between space-crossing community detection and its influence on data forwarding in mobile social networks. They propose a data forwarding algorithm named Social Attraction and Access Point Spreading (SAAS) to improve data forwarding efficiency, addressing delivery ratio and delay.

2.3 Target set selection problem

The issue of determining which nodes are most helpful for speeding up the dissemination process has been targeted by TSS [6, 21]. A reinforcement learning approach based on actor-critic modeling is suggested for solving TSS in [21]. The results for the actor-critic approach and Derivative Re-injection to Offload Data (DROiD) [6] are compared in opportunistic networks for single-community and multiple-community scenarios. The researchers also target the issue of defining the cap on the number of nodes to be allocated to disseminate the material. In order to define the heuristic behavior and evaluate which nodes are more useful for spreading the material, an acknowledgment message with additional information is used. This is achieved using the temporal difference learning principle, which uses the content distribution stage, the ratio of nodes used to nodes available, and the percentage of time remaining until the panic zone for content delivery. In [26], VIP delegation is suggested on the basis of the social dimensions of user mobility. The authors use meeting frequency as the significant attribute to determine the strength of social ties. The VIP promotion techniques are classified, using random and greedy approaches, into blind global promotion and greedy global promotion, respectively. The authors use betweenness centrality, proximity centrality, degree centrality, and PageRank to determine the social strengths for VIP neighborhood selection and to define the value of a node in the graph. A few research studies identify small sets of nodes to send the same data, on the basis of certain deterministic characteristics, to the maximum number of neighboring nodes [27, 28]. The major issue is that the access point would have to provide some equal or unequal incentive to all users irrespective of their unequal significance [8, 9, 11]; thus incentive determination becomes significant. The major drawback of the present literature is that it ignores continuously evolving network composition, limits the analysis to static network-based communities, and pays little attention to overlapping communities. This can be addressed by the collaborative effort of users along with their network service providers, which we utilize using tree-based graphs in our scenario. We merge the greedy and overlapping community approaches at the dynamic level.

3 System model and assumptions

In this section, we summarize the model and the assumptions made, together with the relevant definitions and notations. Our model is a hybrid of a basic mobile data network and an associated opportunistic network that supports the infrastructure-based requirements for efficient mobile data offloading.

3.1 System model

We assume that a network of mobile users with different interests within the range of an access point initially belongs to a single community. For simplicity, we presume all users have the same capabilities, at least the minimum among them. All users are mobile and interconnected using wireless links. We also require data transmission to complete before the data item’s Time-to-live (TTL) expires; the TTL is the transmission deadline of the data item, regardless of download or upload. Our aim is to derive a minimal but optimal subset of users. Our system model comprises a group of communities represented by sets of nodes, the Mobile Users (MUs), with edges between them representing a specific service. The dynamic behavior of MUs has been observed during data set pre-processing with reference to different time instances. As illustrated in Fig. 1, since one or more nodes are connected to other nodes in similar communities, we may try to offload traffic to them from the access point.

We consider a service, such as Sports News or Weather Update, subscribed to by n users in the range of the access point, which transmits the relevant data to the whole population. Thus, the overall network load handled by the access point is the product of the number of nodes and the corresponding individual load of each node. We may need to find the heuristic pattern of k users within the range of access point S, which helps to capture the dynamic patterns of these nodes. Another option is to find relations between these nodes and obtain subsets using characteristics of similar subscriptions. We denote by C[i] the cost of these data transmissions through the access point of the cellular network and by c[i] the cost of these transmissions through Wi-Fi hotspots found in the immediate vicinity; the cost is measured in data bytes. The data record has to be kept by the cellular network’s access point. The improvement can be determined by taking the ratio c[i] : C[i]. Our aim is to identify every neighboring node j for each node i and assign the matrix attribute If(i)j = 1; the value of If(i)j is unity if node i is in direct contact with node j, and 0 otherwise. In general, the network nodes are divided into different classes, and our observation focuses on a single community S based on one form of subscription. Within a larger subset, this is determined as a local target set range. We refer to this as the Optimal Target Set (OTS), derived from the values of \(SI_{n_{i}}\), \(BI_{n_{i}}\) [29] and \(Depth_{n_{i}}\). The availability of users within the connectivity range of Wi-Fi data access ensures data offloading in practice. In addition, the offloading is time-bound to the period during which the user remains in the deliverable range, and the content to be shared across these data access points has a fixed size. We use a tuple-based offloading incentive function Sni = [α,β]. The lower bound of α is 0; the value of α is the length of time node i stays within the Wi-Fi hotspot proximity range. The value of β ∈ {0,1} indicates the possibility of a Wi-Fi-based connection for the downloading service, depending on the cost of data entry and outflow. The transmission capacity is determined using the speed and the time for which the consumer stays connected. The final set is obtained by the heuristic greedy method. The optimal derivation S contains \(k^{\prime }\) nodes such that \(k^{\prime }\ll k\). The notations and symbols used in this paper are listed in Table 1 below.
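To make the quantities above concrete, the following minimal Python sketch (not part of the original system; the population size, contact list, and costs are invented for illustration) computes the access-point load, the cost ratio c[i] : C[i], the contact indicator matrix, and the incentive tuples Sni = [α, β].

```python
# Illustrative sketch only: system-model quantities with invented numbers.
import numpy as np

n = 5                            # subscribers of one service in the access point range
per_node_load_kb = 10            # individual load per node (KB)
ap_load = n * per_node_load_kb   # total load if the access point serves everyone

# Contact indicator matrix: If[i, j] = 1 when node i is in direct contact with node j.
contacts = [(0, 1), (1, 2), (2, 3), (3, 4)]
If = np.zeros((n, n), dtype=int)
for i, j in contacts:
    If[i, j] = If[j, i] = 1

# Per-node transmission costs: C[i] via the cellular access point, c[i] via a nearby
# Wi-Fi hotspot (both in data bytes). A ratio c[i]/C[i] < 1 indicates an improvement.
C = np.full(n, 100.0)
c = np.array([20.0, 25.0, 30.0, 15.0, 40.0])
improvement_ratio = c / C

# Tuple-based offloading incentive S_ni = [alpha, beta]: alpha >= 0 is the time the node
# spends in Wi-Fi proximity; beta in {0, 1} flags whether a Wi-Fi connection is usable.
S_ni = [(12.0, 1), (3.5, 0), (8.0, 1), (20.0, 1), (0.0, 0)]

print(ap_load, improvement_ratio.round(2), S_ni[0])
```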

Table 1 Description of notations used

3.2 Definitions

In order to understand the system model we need to define the following terms at first:

Definition 1

Community Selection: Traditionally, the term Community is defined as a group of users who have a common belief or behavior such that they are tightly knit nodes, with more internal links than external links [20]. Based on this definition, we initially use the term Community to identify users within the range of one access point. Thus St = {n1, n2, …, nk} is the initial community of users. We also determine sub-communities on the basis of common user interests, as in [30], to ensure strong internal links. We define a small subset \(S^{\prime }\) of the nodes in S that can be targeted to deliver data, via short-range communications made only at the user level, to the entire collection S. The optimal set and the final target set are derived from it. After the sub-community determination, when one node is selected from the major superset St for content delivery of interest item i, it is also termed a Community. Thus we use the term Community interchangeably, referring both to the interest-based subgroups of users and to the significant offloading users of the access points.

Definition 2

Overlapping Community: Based on the earlier definition of Community, when the access point observes that a user belongs to more than one user group, we use the term Overlapping Community. In other words, if a user is selected for more than one interest item, then the corresponding communities may overlap. If a user ux is interested in both item ia and item ib, then ux belongs to communities a and b simultaneously.
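As a small illustration of Definition 2 (the user names and interest items below are hypothetical), overlapping users can be detected directly from the per-user interest sets:

```python
# Minimal sketch: users subscribed to more than one interest item belong to
# overlapping communities. Interest assignments are invented for illustration.
interests = {
    "u1": {"sports"},
    "u2": {"sports", "weather"},   # u2 overlaps the sports and weather communities
    "u3": {"weather"},
}

communities = {}
for user, items in interests.items():
    for item in items:
        communities.setdefault(item, set()).add(user)

overlapping = {u for u, items in interests.items() if len(items) > 1}
print(communities)   # {'sports': {'u1', 'u2'}, 'weather': {'u2', 'u3'}}
print(overlapping)   # {'u2'}
```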

3.3 Assumptions

We have assumed that all users within the range of one access point belong to the same community. The classification of all nodes in any set is based on a similar category of subscribers for the same service. All nodes are ready to replicate data of similar interest within a defined time limit. Each community has dynamic, interaction-based interconnections. Every node is also presumed to agree to share its list of neighbors, together with their interests, in the form of Summary Vector components, similar to the cache-enabled scenario in [31].

4 Problem statement

Our problem is to identify a subset of users belonging to the same or different communities and to offload the data that would otherwise be served by the access point of the cellular network. The problem of data offloading relies on selecting a limited set of nodes and then forwarding the data to its identified target subset. The major goal remains to select a subset of vertices in the graph that can in turn satisfy other vertices on the basis of some common attributes. This objective is achieved by identifying common subscription-based communities and then targeting only a limited number of nodes from the identified subset that belong to the same community over a fixed span of time δt; the selection is therefore dynamic. Our model prioritizes nodes for an optimized subset in case of overlapping interest-based scenarios. Prior to our work, the literature suggests social-network-based static community derivation. We also address overlapping communities by limiting each node to one user interest at a time. In order for a node to belong to one subset \(S^{\prime }\), we accept groups that depend on the same service. The solution to this problem is subdivided into different stages of recognition and subset selection optimization. In order to derive the complete route for the data packet to reach through the maximum number of nodes, neighbor prediction for each set follows. Several authors in the literature address the same form of problem [4, 8, 10, 11], but the proposed approaches typically suffer from models that are static and partial or predetermined. This becomes more unrealistic because, when we consider offloading for mobile data subscribers, the users in our model are mobile. Some solutions extract the relationship from historical experience, which establishes a predetermined connection. Taking into account these limitations of previous work, we seek to achieve dynamic allocation [17, 32] for each target set. Centered on changing the allocation of nodes to multiple target sets, we suggest a more complex algorithm. We exploit two feasible optimization scenarios: first, target set selection within a single community is restricted to the selection of subscribers to one service; second, the offloading of neighbors across different communities. This network-graph model is similar to a knapsack assignment problem; as illustrated in the sketch below, to represent whether a node belongs to a community, we declare each matrix entry to be unity if the node lies in the community and zero otherwise. Consequently, the aim of this study is to define the number of initial users to be identified as the primary and secondary target sets, which helps to derive optimized target sets. The level of interactivity in the overlapping population needs to be detected after the selection of the optimal target sets; this is done to define the path via the optimized target source set.
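The following hedged sketch illustrates the 0/1 membership representation mentioned above and a simple coverage check for a candidate target set \(S^{\prime }\); the graph, community contents, and the `covers` helper are illustrative assumptions rather than the exact formulation used later.

```python
# Sketch: 0/1 community membership matrix and a coverage check for a target set.
# Nodes, communities, and adjacency are invented for illustration.
import numpy as np

nodes = list(range(6))
communities = {"sports": {0, 1, 2, 3}, "weather": {3, 4, 5}}

membership = np.zeros((len(nodes), len(communities)), dtype=int)
for k, (name, members) in enumerate(communities.items()):
    for i in members:
        membership[i, k] = 1
print(membership)

# Short-range (device-to-device) adjacency within the sports community.
adjacency = {0: {1, 2}, 1: {0, 3}, 2: {0}, 3: {1}}

def covers(target_set, members, adjacency):
    """True if every non-target member has at least one neighbour in the target set."""
    return all(i in target_set or adjacency.get(i, set()) & target_set
               for i in members)

print(covers({0, 1}, communities["sports"], adjacency))   # True: nodes 2 and 3 reach a target
print(covers({2}, communities["sports"], adjacency))      # False: nodes 1 and 3 are uncovered
```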

5 Proposed algorithm

We divide the entire procedure into three sub-algorithmic steps to yield the optimized final target set. The first sub-algorithm provides the nodes for the primary target set which are further optimized to a limited target set in the second sub-algorithm. The final sub-algorithm determines the route optimized data forwarding scheme for data offloading.

Algorithm 1 We start with the Primary Target Set (PTS) identification algorithm. This algorithm aims to select a limited set of nodes from a single community on the basis of similarity index values and the optimum threshold values for the nodes within the sets. The nodes are checked for similar subscription choices derived from their sets of interests. The nodes that fall in the range of the access point at time instance t are identified and compared with the nodes at instance t + δt. For each node ni, all m neighbors are identified. For these neighbors, we evaluate the similarity in data subscription as in [33]. We evaluate the Betweenness Impact \(BI_{n_{i} }\) for each node ni using its betweenness centrality \(BC_{n_{i} }\) [29] and its total number of neighbors \(\rho _{n_{i} }\):

$$ BI_{n_{i}} = \frac{BC_{n_{i}}}{\rho_{n_{i}}} $$
(1)

Here the value of betweenness centrality \({BC}_{n_{i} }\) for the node ni is given by

$$ BC_{n_{i}} = \sum\limits_{j=1}^{N-1} \sum\limits_{k=1}^{j-1}\frac {g_{jk}(n_{i})}{ g_{jk}} $$
(2)

where gjk(ni) is the number of paths between node nj and node nk passing through node ni, and gjk is the total number of paths connecting node nj and node nk. For each node and its immediate neighbors, we evaluate the similarity in their data. We also use an influence function similar to the k-truss used by the authors in [34]; it ensures maximum usage of cut-vertex-based nodes and is based on the concept of influence sub-graphs in graph theory. Accordingly, a K-influence subgraph ni(K) of a graph G is defined as the largest sub-graph in which every edge belongs to at least K − 2 triangles. Thus each edge of node ni has an influence value ni(ij) = K if it belongs to ni(K) but not to ni(K + 1); this is equivalent to a clique of order K. The nodes are ranked on the basis of Influence Size, \(INS_{n_{i} }\), derived using

$$ INS_{n_{i}}= max |{n_{i} (ij)}| $$
(3)

It helps to derive the maximum impact for different influence values among all nodes in the graph.
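As a rough illustration of Eqs. (1)-(3), the sketch below uses the networkx library: betweenness centrality stands in for \(BC_{n_{i}}\), the neighbor count for \(\rho_{n_{i}}\), and the largest K for which a node survives in the K-truss serves as a proxy for the influence size \(INS_{n_{i}}\). The toy graph and the `influence_size` helper are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of Eqs. (1)-(3) on a stand-in contact graph.
import networkx as nx

G = nx.karate_club_graph()          # stand-in contact graph

BC = nx.betweenness_centrality(G)   # Eq. (2)
BI = {v: BC[v] / max(G.degree(v), 1) for v in G}   # Eq. (1)

def influence_size(G, v, k_max=10):
    """Largest K such that v still belongs to the K-truss (proxy for Eq. (3))."""
    best = 2
    for k in range(3, k_max + 1):
        if v in nx.k_truss(G, k):
            best = k
    return best

INS = {v: influence_size(G, v) for v in G}
print(sorted(BI, key=BI.get, reverse=True)[:3], max(INS.values()))
```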

We derive the fraction of influence \(F=R_{W}^{K_{max}}\) for a variable-size window W using the following equation

$$ R_{W}^{K_{max}}=\frac{\text{Number of nodes in } n_{i}(K) / |S|_{K_{max}}}{|W|/|S|} $$
(4)

The final preference for edges is determined on the basis of combined value of EgoBetweenness and Influence for each node, which is calculated using

$$ Utility_{n_{i}}=(\alpha \times F \times INS_{n_{i}})+ (\beta \times BI_{n_{i}}) $$
(5)

Here α and β are tunable constants with α + β = 1, used to set the priority. The node ni is chosen for inclusion in the PTS based on the maximum value of \(Utility_{n_{i} }\). We divide the set of users into two halves based on the value of BI, with either BI < 0.5 or BI ≥ 0.5, and the same procedure is applied to the nodes with BI ≥ 0.5. This ensures that priority is given to the impact of similarity through the tunable constants. Although the complexity of this algorithm is higher, it is responsible for the major improvement achieved in our results.

Algorithm 1: Primary Target Set (PTS) selection
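A minimal sketch of the utility combination in Eq. (5) and the resulting PTS choice is given below; the per-node values of F, INS, and BI, and the handling of the split around BI = 0.5, are placeholders assumed for illustration.

```python
# Sketch of Eq. (5): Utility = alpha * F * INS + beta * BI with alpha + beta = 1,
# followed by a simple PTS pick. Per-node statistics are invented placeholders.
def utility(F, INS, BI, alpha=0.6, beta=0.4):
    assert abs(alpha + beta - 1.0) < 1e-9
    return alpha * F * INS + beta * BI

# Hypothetical per-node statistics: node -> (F, INS, BI)
stats = {"n1": (0.8, 4, 0.62), "n2": (0.5, 3, 0.30), "n3": (0.9, 5, 0.55)}

scored = {v: utility(F, INS, BI) for v, (F, INS, BI) in stats.items()}
high_bi = [v for v in stats if stats[v][2] >= 0.5]
low_bi = [v for v in stats if stats[v][2] < 0.5]

# Primary target set candidate: highest-utility node, preferring the high-BI partition.
pts = max(high_bi or low_bi, key=scored.get)
print(scored, pts)
```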

Algorithm 2 The target set is optimized further by refining the sets on the basis of the interests prioritized by the nodes. We aim to identify the nodes that should be preferred over the rest, similar to social network connections based on frequent interactivity in terms of the activity status governed by the access point. These nodes are referred to as the optimal nodes, and they replicate the data to their neighboring nodes. An access point needs to prioritize a small number of nodes in any group in which a user encounters V neighbors for data retrieval over several edges E. The ONS algorithm performs this identification of neighbors. It uses breadth-first and depth-first search to determine the set of progressive nodes. Such nodes share the utility values from the previous algorithm with their neighbors. Every node is expected to carry a compressed message containing two summary vectors, SVI and SVII: SVI holds each node’s list of subscription interests, and SVII stores the data in compressed form. Data is forwarded to an adjacent node when the subscriptions across the summary vectors match; if the data initially available at the neighbor is less than the data at the main node, the data is transmitted to that nearest node. Based on similar forms of subscription, we continue to classify individual populations. This problem is defined as optimal neighbor set selection. In this algorithm, we propose to share data with the neighboring nodes in the form of summary vectors. It considers nodes that belong to more than a single community identified through the channel. We determine the overlapping on the basis of the matrix representation, in which an entry marks a node that belongs to more than one community. The weights of common interests across the interest matrix ensure the selection of non-overlapping communities.

Algorithm 2: Optimal Neighbor Set (ONS) selection
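The summary-vector exchange and neighbor selection described for the ONS step could look roughly as follows; the message structure (SVI as interest sets, SVII as carried data items), the breadth-first traversal, and the toy contact graph are assumptions for illustration.

```python
# Sketch of the ONS forwarding rule: forward to a neighbour only when subscriptions
# match (SV-I) and the neighbour holds less of the content (SV-II).
from collections import deque

def ons_forward(G, source, sv1, sv2):
    """Breadth-first pass from `source`; returns neighbours that should receive data."""
    selected, visited, queue = [], {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v in G[u]:
            if v in visited:
                continue
            visited.add(v)
            if sv1[u] & sv1[v] and len(sv2[v]) < len(sv2[u]):
                selected.append(v)      # shared interest and missing data: forward
                queue.append(v)
    return selected

G = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
sv1 = {"A": {"sports"}, "B": {"sports"}, "C": {"weather"}, "D": {"sports"}}
sv2 = {"A": {1, 2, 3}, "B": {1}, "C": set(), "D": set()}
print(ons_forward(G, "A", sv1, sv2))    # ['B', 'D']
```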

Algorithm 3 We transform the network representation into a tree data structure. This avoids cycles and reduces the number of directed connecting links. Such a transformation allows us to apply the shortest path over the minimum spanning tree using the depth attribute of each node. In order to evaluate depth, the graph users need to be connected, so the row sum of the adjacency matrix is used to obtain the depth-based relation. The maximum depth evaluated using rowwt(ny) determines the maximum utility of the minimum number of node associations, with lower delay tolerance and the maximum portion of the network covered. The nodes with the best available ad-hoc approach are selected.

Algorithm 3: Final Target Set Selection (FTSS)
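A rough sketch of the tree transformation and depth-based ranking in the FTSS step is shown below, assuming a small weighted contact graph, networkx for the minimum spanning tree, and row sums of the adjacency matrix as rowwt(ny); the edge weights and root choice are illustrative, not taken from the paper.

```python
# Sketch: reduce the contact graph to a minimum spanning tree (no cycles), read node
# depths off the tree, and rank nodes by adjacency row sums (row_wt).
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([("AP", "n1", 1), ("AP", "n2", 2), ("n1", "n2", 1),
                           ("n1", "n3", 2), ("n2", "n4", 1), ("n3", "n4", 3)])

T = nx.minimum_spanning_tree(G)                       # cycle-free backbone
depth = nx.shortest_path_length(T, source="AP")       # depth of each node in the tree

A = nx.to_numpy_array(G, weight="weight")
row_wt = A.sum(axis=1)                                # row_wt(n_y): total link weight per node
nodes = list(G.nodes)
best = max(nodes, key=lambda v: row_wt[nodes.index(v)])

print(dict(depth), best)
```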

Complexity analysis of the algorithm

The target set selection is an NP-hard problem [13, 35] to approximate in both its maximization and minimization variants. The PTS algorithm has a complexity of order i × int. The number of interests int is far smaller than the number of users i, hence we can consider O(i × int) ≈ O(i). The second algorithm depends on the number of neighbors of the nodes; since the number of primary neighbors of any node is also very small compared to the total number of users, the complexity of the ONS algorithm can likewise be approximated as O(j). The final algorithm, FTSS, depends on the output of the previous PTS and ONS algorithms; ONS is repeated for each user identified by PTS, hence the overall order of FTSS is O(i × j). However, the average number of neighbors is also very small compared to the total number of users, so we may consider O(i × j) ≈ O(i). Thus the overall complexity of our algorithm is linear for practical purposes.

6 Simulation and performance evaluation

The proposed optimization is implemented in MATLAB and compared with strategies from the literature. Our results are validated for data forwarding with limited target set sizes involving the most significant nodes. The simulation is evaluated over the Reality Mining dataset from MIT and the Bluetooth dataset from NUS. These datasets have been used to identify social communities and groups on the basis of Bluetooth-enabled devices in proximity, for static and dynamic associations that evolve with time. In this section, we present the simulation results of the greedy, heuristic, and community-based algorithms and compare them with the proposed FTSS algorithm. We consider a scenario of transmitting a fixed-size message of 10 KB for the purpose of simulation. Much like newspaper distribution by a hawker, each delivery of data consists of a single packet. The goal is to determine the most effective target set to offload cellular data on the basis of the opportunistic communications available at various times and then route the data packet through it.

6.1 Traffic load comparison

We initially consider 1000 nodes from the MIT dataset in the simulation setup. We compare the literature-based algorithms with our algorithm for a fixed-size target set with an upper bound of 50 nodes. We allow a 20-second time limit for each subscriber to retain and exchange the data with its neighbors; otherwise, the network access point automatically sends the data to all nodes in its range after 20 seconds. We repeat the same procedure for larger populations, from 1000 to 5000 users within the reach of the cellular network, as shown in Fig. 2.

Fig. 2 Traffic load over cellular network access point for MIT dataset

Our algorithm is used to estimate the proportion of users who can access data through the limited number of users in the targeted sets. FTSS-based target set selection is found to be more efficient than the previous algorithms. When more subscribers are interested in the same data, our algorithm serves a larger percentage of users, and the percentage of satisfied users using FTSS increases as the number of mutual-interest subscribers grows. The rationale for this improvement is that subscribers have more chances to contact others in a wider opportunistic network. In the next simulation, 800 nodes from the NUS dataset are studied and the findings are shown in Fig. 3. We pick a 10 KB data packet to be sent to all these nodes with target sets of sizes ranging from 10 to 100 nodes. Initially, the entire traffic is handled by the access point itself when there is no subscriber in the target range; therefore, 800 × 10 = 8000 KB of data must be transmitted. The amount of traffic managed by the access point is reduced as we encourage more users to assist in offloading as target set users.

Fig. 3 Traffic load over cellular network access point for NUS dataset

6.2 Data offloading comparison

Figures 4 and 5 illustrate the offloading percentage, which rises with the increase in the number of subscribers, for the MIT and NUS traces respectively. We increase the number of participating subscribers from 1000 to 5000 and observe nearly 20% more data offloading in comparison to the literature-based algorithms for the MIT dataset. In the simulation over the NUS dataset, we observe 10-25% more data offloading for a similar extent of subscriber contribution.

Fig. 4 Comparison of data offloading phenomenon over MIT dataset

Fig. 5 Comparison of data offloading phenomenon over NUS dataset

6.3 Average latency comparison

We depict the latency observations from our simulation in Figs. 6 and 7. The average latency is also reduced by the FTSS-based implementation for both datasets, by nearly 10-12 milliseconds for varying sizes of target sets. As we increase the sizes of the target sets from 100 to 1000, the average latency decreases, but the results using FTSS show less latency than the literature-based algorithms.

Fig. 6 Impact of using FTSS over latency for MIT dataset

Fig. 7 Impact of using FTSS over latency for NUS dataset

6.4 Performance gain comparison

Finally, we check the performance of our algorithm for different message sizes. For a fixed latency of 20 milliseconds, the overall performance gain decreases away from an optimal message size. The results for the MIT and NUS datasets are shown in Figs. 8 and 9, respectively. The results in Fig. 8 show nearly 20% performance gain for a message size of about 50 KB, while the gain is lower for both smaller and larger message sizes on the MIT dataset. Similarly, the results in Fig. 9 for the NUS dataset show the best performance for a message size of about 40 KB, with lower performance gains for smaller as well as larger message sizes.

Fig. 8 Performance gain comparison for MIT dataset

Fig. 9 Performance gain comparison for NUS dataset

7 Conclusion and future scope

Instead of relying entirely on the access point resources of network providers, minimizing data traffic by using the inbuilt service capacities of the users yields optimized results for data routing. The PTS sub-algorithm has a complexity of the order of O(k2), the ONS algorithm has O(k3) complexity, and FTSS has O(k2) complexity. Thus we have provided a heuristic-based hybrid solution of order O(k3), with a limited set of constraints. The approach assumes that all users are ready to share their identity and interests with the access points for cooperation, and that every node has similar information about its immediate neighbors. This lays the foundation for optimal target selection in opportunistic networks such as VANETs or DTNs. Analysis of our results shows that the hybrid FTSS algorithm outperforms the greedy approach by 35% in terms of traffic offloading over cellular towers, outperforms the heuristic approach by 20%, and achieves 23% less average latency than the community-based algorithms. The algorithm also requires at least 5-6% fewer offloaders in the target sets in comparison to the heuristics-based networks. Since the nodes in a network may or may not be trustworthy, the impact of trust determination for such an evolutionary network has been excluded from the current work and will be explored in the future. Vehicular hotspot-based access points and the determination of incentives for each helper node are further directions for research in this area. The delay tolerance intervals can also be varied, which could be considered along with the determination of incentives for helpers offering services. Our optimization results render efficient usage of users and a reduction in data traffic, and the overall load in limited geographic scenarios is minimized using our modeling and implementation.