
Monday, June 3, 2019

Data Processing in Big Data Centres Cost Reduction Approach

A Cost Reduction Approach for Data Processing in Big Data Centers

R. Reni Hena Helan

ABSTRACT

The tremendous growth in cloud data processing places a high load on computation, storage and communication in data storage centers, which forces data center providers to incur considerable expenditure on data processing. Three factors drive this increased expenditure: task allocation, data placement and data movement. This paper takes these three factors into consideration and proposes a cost reduction approach for cloud data processing. I propose a Markov chain model to analyse job completion, considering data transmission and computation.

Keywords: Markov Chain Model, Data Center, Cloud Data, Data Positioning, Data Processing.

I. INTRODUCTION

In recent years, the explosion of data all over the world has increased the demand for data processing in data storage centers. This demand in turn increases the cost incurred for computation and communication resources. As predicted by Gartner, by 2015, 71% of data storage center hardware utilization would come from cloud data processing, which would cross around $126.2 billion. It is therefore of vital importance to analyze the cost reduction problem for cloud data processing in data storage centers.

Data center resizing (DCR) has been proposed to reduce the cost of data processing by adjusting the number of activated servers through task placement [1]. The cloud data service architecture mainly consists of distributed file systems, which help distribute the data and its copies across the data centers for efficient load balancing and high performance.
Some studies focused on reducing communication cost by placing tasks on the servers where the input data reside, which avoids the remote data loading problem.

Even though many solutions have been proposed for the above issues, none of them provides cost-efficient big data processing, owing to a few disadvantages. The first is the waste of resources on data that is not often accessed. The second is the transmission cost, which depends on the distances and the type of communication used between the data centers. Not all data can be stored on the same server because of its high volume, so some data must be stored on remote servers, which incurs transmission cost. Transmission cost increases proportionally with the number of communication links involved.

To overcome these disadvantages, I consider cost reduction for cloud data processing through a joint optimization of task placement and data positioning in the data centers. Every server may have only limited resources for each piece of data residing on it, while the data needs more resources to carry out its big data processing tasks. The main aim of this paper is to optimize data positioning, task allocation, routing and DCR to decrease the overall computation cost involved. The contributions are summarized as follows:

1. This paper considers the cost reduction problem of cloud data processing in data centers through the joint optimization of data positioning, task allocation and routing. To describe the computation and transmission involved in the data centers, a Markov chain model is used and the task completion time is derived.

2. For cost reduction, three factors are taken into consideration.
The first is how to place data on servers, the second is how to distribute the data, and the third is how to resize the data centers to achieve minimum-cost operation.

II. OTHER RELATED WORKS

Cost Minimization in Server

Data centers are distributed throughout the world to store huge volumes of data that are accessible to thousands of users. A data center consists of a large number of servers that consume much power. Millions of dollars are spent on electricity, a growing problem that increases the operating cost. The best-known mechanism that has attracted attention is DCR, which focuses on energy management by the data centers. Liu et al. [2] examined the same issue by considering network delay. Fan et al. [3] analysed how much computing equipment can be hosted safely and efficiently within a fixed power budget.

Data Management

The main aspects of data management are reliability and effective data positioning. Sathiamoorthy et al. [4] proposed a solution based on erasure codes that offers high reliability in comparison with Reed-Solomon codes. Yazd et al. [5] proposed a scheduling algorithm that improves energy efficiency in data centers by considering data locality properties.

Data Placement

Agarwal et al. [6] presented a data placement approach for geographically distributed cloud services that considers bandwidth cost, data center capacity, etc. It analyzes logs based on data access types and client locations.

All the existing works focus on either task allocation, data placement or data management. This paper, in contrast, considers data positioning, task allocation and the routing of data systematically.

III. SYSTEM MODEL

The geographically distributed data center topology is shown in Fig. 1, in which all the data centers containing the same data are connected via switches.
There is a set of data centers $D$, and each data center $d \in D$ consists of a set of servers $S_d$ connected to a switch $m_d \in M$ with a local transmission cost of $C_l$. The local transmission cost $C_l$ is less than the inter-data-center transmission cost $C_r$. Let the whole system be modeled as a graph $G = (N, E)$, where $N$ is the vertex set, which includes all the switches $M$ and the servers $S_d$, and $E$ is the edge set. The weight of each edge is

$$w(u,v) = \begin{cases} C_r, & \text{if } u, v \in M \\ C_l, & \text{otherwise.} \end{cases}$$

The data stored in the geographically distributed data centers are divided into a set of chunks $C$. Each data chunk $c \in C$ has a size that is normalized to the server storage capacity. For each chunk, $P$ copies are kept in the distributed system for fault tolerance. Let $\lambda_c$ be the average arrival rate of tasks requesting chunk $c$.

Fig. 1. Data Center topology

The task arrival at each server is modeled as a Poisson process. If a task is distributed to a data center where the data chunk does not reside, the task has to wait until the chunk is transferred to that data center.
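The graph model above can be made concrete with a small sketch. It assumes one switch per data center and a direct link between every pair of switches; the cost values, the node names and the helper functions `edge_weight`, `build_graph` and `cheapest_path_cost` are illustrative choices, not taken from the paper.

```python
import heapq

C_L = 1.0  # local (server <-> switch) transmission cost; illustrative value
C_R = 4.0  # inter-data-center (switch <-> switch) cost; illustrative value


def edge_weight(u, v, switches):
    """w(u, v) = C_R if both endpoints are switches, C_L otherwise."""
    return C_R if u in switches and v in switches else C_L


def build_graph(servers_by_dc):
    """Build an adjacency map with one switch per data center.

    Assumption: every pair of switches is directly linked.
    """
    switches = {f"m_{d}" for d in servers_by_dc}
    adj = {m: {} for m in switches}
    for d, servers in servers_by_dc.items():
        m = f"m_{d}"
        for s in servers:
            adj.setdefault(s, {})
            adj[s][m] = adj[m][s] = edge_weight(s, m, switches)
    ordered = sorted(switches)
    for i, u in enumerate(ordered):
        for v in ordered[i + 1:]:
            adj[u][v] = adj[v][u] = edge_weight(u, v, switches)
    return adj, switches


def cheapest_path_cost(adj, source, target):
    """Dijkstra's algorithm over the weighted graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


adj, switches = build_graph({"d1": ["s1", "s2"], "d2": ["s3"]})
same_dc = cheapest_path_cost(adj, "s1", "s2")   # s1 -> m_d1 -> s2
cross_dc = cheapest_path_cost(adj, "s1", "s3")  # s1 -> m_d1 -> m_d2 -> s3
```

With these values, loading a chunk from a server in the same data center costs two local hops, while loading it from another data center adds one switch-to-switch hop, which is why remote loading is the more expensive case.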
Each task should be answered within a response time of $R$.

IV. PROBLEM FORMULATION

Data Placement and Task Allocation Constraints

The binary variable $y_{sc}$ indicates whether data chunk $c$ is placed on server $s$: $y_{sc} = 1$ if chunk $c$ is placed on server $s$, and $y_{sc} = 0$ otherwise. In any distributed file system, $P$ copies of each data chunk are stored, and the data stored on each server cannot exceed its storage capacity. A server is termed activated ($a_s$) only if data chunks are stored on it or tasks are assigned to it.

Data Loading Constraints

For every data chunk $c$ required by server $s$, some internal or external data transmissions are involved, for which a routing procedure is devised. The nodes of the graph containing the servers and the switches are divided into three categories:

1. Source nodes: the servers that hold the data chunks.
2. Relay nodes: nodes that receive data from the source nodes and forward it to other nodes according to some routing technique.
3. Destination nodes: the nodes that receive the data chunks.

A destination node receives a data chunk only if it does not already hold a copy of it.

Cost Reduction

The cost of transmitting the data chunks can be minimized by choosing parameters such as $y_{sc}$, $a_s$, $\lambda_c$, etc.

V. PERFORMANCE EVALUATION

The performance analysis of the joint optimization approach shows that communication costs decrease when more tasks and data chunks are placed in the same data center. Further increasing the number of servers does not affect the scattering of data chunks among them. More requests lead to more activated servers and more computation resources, and the joint optimization approach tries to lower the server cost. The approach balances the server cost against the communication cost.
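The placement constraints and the trade-off between server cost and communication cost can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the total cost is taken to be a fixed cost per activated server plus the cost of loading each missing chunk from its cheapest replica, and the constants and names (`SERVER_COST`, `dc_of`, `placement_is_feasible`, `total_cost`) are hypothetical rather than the paper's formulation.

```python
P = 2                    # copies kept per chunk; illustrative value
SERVER_COST = 10.0       # hypothetical cost of one activated server
C_L, C_R = 1.0, 4.0      # illustrative local and inter-data-center costs


def placement_is_feasible(placement, sizes, servers, copies=P):
    """placement is the set of (server, chunk) pairs with y_sc = 1."""
    for c in sizes:
        # Every chunk must have exactly `copies` replicas.
        if sum((s, c) in placement for s in servers) != copies:
            return False
    for s in servers:
        # Sizes are normalized, so a server's capacity is 1.0.
        used = sum(size for c, size in sizes.items() if (s, c) in placement)
        if used > 1.0:
            return False
    return True


def activated_servers(placement, tasks):
    """a_s = 1 if server s stores a chunk or has a task assigned."""
    return {s for s, _ in placement} | {s for s, _ in tasks}


def total_cost(placement, tasks, dc_of):
    """Server cost plus communication cost (assumed additive).

    tasks is a list of (server, chunk) pairs. Assumes a feasible
    placement, so every chunk has at least one holder.
    """
    comm = 0.0
    for s, c in tasks:
        if (s, c) in placement:
            continue  # chunk is already local, nothing to load
        holders = [h for h, chunk in placement if chunk == c]
        # Same data center: server -> switch -> server.
        # Other data center: one extra switch-to-switch hop.
        comm += min(
            2 * C_L if dc_of[h] == dc_of[s] else 2 * C_L + C_R
            for h in holders
        )
    return SERVER_COST * len(activated_servers(placement, tasks)) + comm


servers = ["s1", "s2", "s3"]
dc_of = {"s1": "d1", "s2": "d1", "s3": "d2"}
sizes = {"c1": 0.5, "c2": 0.5}
placement = {("s1", "c1"), ("s3", "c1"), ("s1", "c2"), ("s2", "c2")}

feasible = placement_is_feasible(placement, sizes, servers)
cost_local = total_cost(placement, [("s1", "c1"), ("s2", "c2")], dc_of)
cost_remote = total_cost(placement, [("s2", "c1"), ("s3", "c2")], dc_of)
```

In this toy setting both task assignments activate the same three servers, so the difference between `cost_local` and `cost_remote` comes entirely from chunk transfers, which matches the observation that colocating tasks with their chunks lowers communication cost.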
When the delay requirement is very small, many servers are activated to guarantee quality of service. Server costs decrease as the delay constraint is relaxed.

VI. CONCLUSION

This paper explains the joint optimization approach of data positioning, task allocation and routing of data to reduce the overall operational cost of geographically distributed data centers. This approach reduced the computational complexity considerably.

REFERENCES

[1] L. Rao, X. Liu, L. Xie, and W. Liu, "Minimizing Electricity Cost: Optimization of Distributed Internet Data Centers in a Multi-Electricity Market Environment," in Proceedings of the 29th International Conference on Computer Communications (INFOCOM). IEEE, 2010, pp. 1-9.
[2] Z. Liu, M. Lin, A. Wierman, S. H. Low, and L. L. Andrew, "Greening Geographical Load Balancing," in Proceedings of International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS). ACM, 2011, pp. 233-244.
[3] X. Fan, W. D. Weber, and L. A. Barroso, "Power Provisioning for a Warehouse-sized Computer," in Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA). ACM, 2007, pp. 13-23.
[4] M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur, "XORing Elephants: Novel Erasure Codes for Big Data," in Proceedings of the 39th International Conference on Very Large Data Bases, ser. PVLDB'13. VLDB Endowment, 2013, pp. 325-336.
[5] S. A. Yazd, S. Venkatesan, and N. Mittal, "Boosting energy efficiency with mirrored data block replication policy and energy scheduler," SIGOPS Oper. Syst. Rev., vol. 47, no. 2, pp. 33-40, 2013.
[6] S. Agarwal, J. Dunagan, N. Jain, S. Saroiu, A. Wolman, and H. Bhogan, "Volley: Automated Data Placement for Geo-Distributed Cloud Services," in the 7th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2010, pp. 17-32.
[7] S. Govindan, A. Sivasubramaniam, and B. Urgaonkar, "Benefits and Limitations of Tapping Into Stored Energy for Datacenters," in Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA). ACM, pp. 341-352.
[8] P. X. Gao, A. R. Curtis, B. Wong, and S. Keshav, "It's Not Easy Being Green," in Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM). ACM, 2012, pp. 211-222.
[9] J. Cohen, B. Dolan, M. Dunlap, J. M. Hellerstein, and C. Welton, "MAD Skills: New Analysis Practices for Big Data," Proc. VLDB Endow., vol. 2, no. 2, pp. 1481-1492, 2009.
[10] H. Shachnai, G. Tamir, and T. Tamir, "Minimal cost reconfiguration of data placement in a storage area network," Theoretical Computer Science, vol. 460, pp. 42-53, 2012.
