International Journal of Engineering Applied Sciences and Technology, 2020
Vol. 5, Issue 4, ISSN No. 2455-2143, Pages 610-615
Published Online August 2020 in IJEAST (http://www.ijeast.com)

FAULT TOLERATING MECHANISM IN DISTRIBUTED COMPUTING ENVIRONMENT

Lokendra Gour
Department of Computer Science
AKS University, Satna, Madhya Pradesh, India

Dr. Akhilesh A. Waoo
Department of Computer Science
AKS University, Satna, Madhya Pradesh, India

Abstract: Large scale distributed systems encompass heterogeneous computational machines, workloads, and sub-systems dispersed across the cloud environment. These sub-systems frequently encounter faults and failures due to differing data structures, hardware/software malfunctions, and communication delays. To speed up computation in such a situation, a fault-tolerating infrastructure is implemented by adopting a machine learning approach. Under machine learning, an artificial neural network (ANN) captures, manipulates, and updates the states and behaviors of the sub-systems in the servers and worker machines. Multiple layers of neurons (i.e., deep learning) can handle large scale distributed systems with large datasets. By adopting variants of the stochastic gradient descent algorithm on the sub-systems (also known as computational nodes), the efficiency and reliability of a distributed system are enhanced significantly. In high-performance computing (HPC) applications, fault tolerance mechanisms must be embedded to recover from system failures.

Keywords: Distributed System, Cloud Environment, Fault Tolerance, Machine Learning, Artificial Neural Network

I. INTRODUCTION

At present, the size of the data available for training deep models has increased significantly. Exchanging the model parameters increases the communication overhead, which becomes a bottleneck in a distributed learning algorithm. One remedy is to sparsify the gradients, zeroing out the non-important values, which reduces the communication bit-rate. One of the great challenges in distributed computing is to identify the faults and failures that become a severe cause of system failure. Many algorithms are available to handle that problem, but most of them are not appropriate for large scale distributed systems. A machine learning model can manage this kind of problem more readily.

The Cloud computing system (Calheiros et al., 2009) has become the most versatile system under the umbrella of distributed systems. Cloud-centric applications are multifaceted, multi-component software which can exhibit rich and complex behaviors [1-2]. Distributed systems are software systems which exchange bits and bytes among various computing nodes [3,5]. The Cloud provides infrastructure and services to cloud users and to both small and large business enterprises. It plays a very important role in our society in terms of sharing fundamental computing resources. Reliability and availability must be the prime priority of the cloud system. To achieve this objective, the cloud system must embody a fault-tolerant infrastructure in its core. Adopting a fault-tolerating sub-system (Kocher et al., 2017) in a cloud environment allows the cloud system to carry out its targeted operations smoothly, even at a reduced level of efficiency.

Programs running on a centralized uniprocessor system are capable of tolerating faults thanks to the existence of many powerful solutions [6-8]. In contrast, programs running in a distributed computing environment with multiple multi-core processors face greater challenges from faults and failures. Fault tolerance can be categorized as proactive fault management and reactive fault management (Patil et al., 2011). This paper surveys work on fault tolerance mechanisms in distributed systems, with a focus on machine learning-based approaches (Hazan, 2016; He et al., 2016).

II. ANALYSIS OF DISTRIBUTED SYSTEM

Distributed systems may be homogeneous, or heterogeneous like Grid and Cloud. Several concerns arise in such systems, such as quality of service, resource selection, load balancing, and fault tolerance. Fault tolerance is a major concern in the design of distributed systems (Engelmann et al., 2009; Kakade et al., 2012). Whenever a failure occurs in the software system, it causes a partial or an entire breakdown of the operational system; its underlying cause is referred to as a fault [9-10]. To allow the system to execute its functions even when these faults occur, some sophisticated techniques must be implemented to tolerate the faults (Shwartz et al., 2014; Zinkevich, 2003). The objective of these techniques is to detect, identify, and correct the errors. This paper gives an overview of the basic framework of distributed systems and their associated failure types (Hatcher et al., 2018; Chen et al., 2016). Java offers several options for realizing distributed applications.
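
The gradient sparsification mentioned in the Introduction (zeroing out non-important gradient values to lower the communication bit-rate) can be illustrated with a small sketch. The paper does not say how "non-important" is decided; the version below assumes the common top-k rule, which keeps only the k entries of largest magnitude. The function names `sparsify_top_k` and `densify` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, List


def sparsify_top_k(gradient: List[float], k: int) -> Dict[int, float]:
    """Keep the k largest-magnitude entries; all other entries count as zero.

    Assumption: "importance" is measured by absolute value (top-k rule).
    The result maps index -> value, so only k pairs need to be transmitted.
    """
    if k <= 0:
        return {}
    # Rank indices by absolute gradient value, largest first.
    ranked = sorted(range(len(gradient)), key=lambda i: abs(gradient[i]), reverse=True)
    return {i: gradient[i] for i in sorted(ranked[:k])}


def densify(sparse: Dict[int, float], size: int) -> List[float]:
    """Rebuild the full-length gradient on the receiving node."""
    dense = [0.0] * size
    for index, value in sparse.items():
        dense[index] = value
    return dense


if __name__ == "__main__":
    grad = [0.01, -0.9, 0.002, 0.4, -0.03]
    message = sparsify_top_k(grad, k=2)
    print(message)                      # {1: -0.9, 3: 0.4}
    print(densify(message, len(grad)))  # [0.0, -0.9, 0.0, 0.4, 0.0]
```

In this sketch a worker would send two index/value pairs instead of five values. The saving grows with the size of the model, at the price of discarding the small gradient entries.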

From the Java perspective, the bottom layer is represented by sockets. A socket transmits uninterpreted data streams from one computer system to another. All other mechanisms build on this one. Java's java.net package provides the infrastructure needed for the direct use of sockets. From the programmer's point of view, another abstraction is more appropriate: sending messages to remote objects. This technique is called Remote Method Invocation (RMI) and is part of Java's java.rmi package. However, RMI is limited to Java, and you have to know the location of the remote object or of the registry.

Jini (Java Intelligent Networking Infrastructure) defines a basic structure for providing, registering, and obtaining distributed services in accordance with its specification. A Jini system consists of the following parts:

• A set of components that provides the basic infrastructure for federating services in a distributed system
• A programming model that supports the production of reliable distributed services

The Jini technology infrastructure is centered on Java technology. The Jini sub-system gains its accessibility from the assumption that the Java programming language is the implementation language for its components.

III. OVERVIEW OF CLOUD COMPUTING

The term Cloud refers to a network or the Internet. In other words, the cloud computing hierarchy provides various services over private and public networks, i.e., LAN, WAN, MAN, or VPN. Applications such as e-mail, customer relationship management (CRM), and web conferencing execute on a cloud platform. Cloud computing provides platform independence, as the software does not have to be installed locally on the user's PC. The Cloud is now creating and boosting business applications [11-12].
Cloud computing refers to organizing, manipulating, configuring, and accessing software and hardware resources remotely. It offers on-demand online data storage, infrastructure, and software services (Yuan et al., 2015; Zhu et al., 2017; Ujjwalkarm, 2016).

Cloud computing extends distributed computing to a larger scale. The Cloud system (Nielsen, 2018; Blanchard et al., 2017; Li et al., 2014) [...]ces, infrastructure, and services. Various technologies contribute to cloud computing. Some of the state-of-the-art techniques are:

• Virtualization technology: Virtualization refers to executing multiple virtual computers, or virtual machines, on a single physical machine. Cloud virtualization is a technique for creating a virtual platform for an operating system, storage, network, data, and server (McMahan et al., 2017). Virtual machine techniques such as VMware, and AWS, offer virtualized computational infrastructure on demand [13-16]. Virtualization (Shaw et al., 2017; Singh et al., 2003) is the basic framework for cloud computing.

• Coordination of cloud nodes: For cloud computing to function smoothly there must be proper coordination among the various computing nodes (Bokhari et al., 2016). In the cloud system, every small cloud, also known as a cloudlet, shares computing resources. These cloudlets run various [...]ation must be implemented among these cloudlets for the smooth functioning of cloud computing (Chen et al., 2012; Dong et al., 2011).

• Web service: Cloud services are normally exposed as Web services, which follow industry standards such as WSDL (Web Services Description Language), SOAP (Simple Object Access Protocol), and UDDI (Universal Description, Discovery, and Integration). WSDL describes the interface of a web service so that information can be exchanged in a distributed computing environment [17-20]. It is an XML-based language. SOAP is a protocol for exchanging information over the Internet. The UDDI protocol is applicable [...]. Amazon Web Services (AWS) (Garzon et al., 2008) is a form of web service that offers various IT services in the global market. AWS is built on server clusters spread all over the world.

IV. FAULT-TOLERANT APPROACHES IN CLOUD COMPUTING

The objective of creating a fault-tolerant system is to prevent disruptions arising from a single point of failure, ensuring high availability and business continuity.

Cloud computing offers numerous services and various computing resources via the Internet (Gomez et al., 2006). On the service provider's side, a data center (DC) provides the facility to house computer systems as well as [...] uninterruptible power supply, etc.

Primary Backup Replication (PBR):

Primary-backup replication maintains several replicas to enhance system reliability. Active replication, by contrast, does not designate any replica as the primary, so it removes the centralized control of primary backup: all replicas receive each request and reply with the result. It therefore incurs a high cost for keeping all replicas synchronized. A fault-tolerant control system generally replicates its constituent components in order to recover from failure (Guo et al., 2008). Primary-backup replication protocols are very common in distributed computing [21-22].

Check-Pointing:

The checkpointing technique provides fault tolerance for a distributed computing system. It saves a snapshot of the application program state, so that after a fault the application can resume from the most recent checkpoint rather than from the beginning. Checkpoints must be coordinated in order to recover from faults and to keep stable storage requirements low (Li et al., 2015).

There are two kinds of checkpointing:

• Coordinated
• Uncoordinated

In a coordinated checkpointing scheme, the processes must confirm that their checkpoints are consistent. This is achieved with two-phase commit protocol algorithms. It has two advantages: (1) recovery is simple and (2) garbage collection is easy. It has some major drawbacks: (1) it is expensive in terms of energy consumption and (2) all processes compete to write their checkpoints at the same time.

In the uncoordinated checkpointing protocol, there is no requirement for synchronization between the processes at checkpoint time [31]. It has some major drawbacks: (1) if no set of checkpoints forms a consistent global state, the application has to restart from the beginning in the event of a failure, (2) the recovery cost can be unacceptably high, and (3) garbage collection becomes more complex to implement.

Message Logging:

A message logging protocol is used for building a fault-tolerating system. The message logging scheme is applicable to message-passing distributed systems. This policy registers custom messages. Most users exploit the message logging facility because of its usefulness for analyzing network simulations [23]. In this scheme, each message received by a process must be recorded in the message log, and the process's state is saved as a checkpoint (Jialei et al., 2016). The logged messages are stored so that the system can be recovered from faults or failures. In high-performance computing (HPC), every process logs all the messages it sends to any other process, which creates a potentially large storage overhead. The scheme works as follows: a request is forwarded to the API and the messages are registered; after that, the API response is returned, and finally the message appears in the application log (Kalyani et al., 2016). There are two kinds of message logging protocol:

• Pessimistic Message Logging Protocol
• Optimistic Message Logging Protocol

A pessimistic message logging scheme is a synchronous event logging scheme: a message is logged before the process acts on it. In the pessimistic message logging protocol, each message is recorded in the machine's local memory [41].

An optimistic message logging scheme logs messages asynchronously, which lowers the overhead during failure-free execution, and still allows the system to be brought back to a recoverable state. Its drawback is that recovery is less efficient than under a pessimistic logging scheme, because messages not yet logged at the time of a failure may be lost and dependent processes may have to roll back.

K-Modular Redundancy (KMR):

KMR is a widely used fault tolerance mechanism in software engineering. KMR is a kind of version programming, hence it is also known as N-Version Programming (NVP) [32-36]. It is based on the principle of function ranking: the K most significant functions are identified and selected for invocation. This strategy performs parallel executions that are functionally equivalent and then takes a vote to determine the final output. Triple Modular Redundancy (TMR), a kind of KMR, is a form of fault tolerance in which three subsystems execute a process and the final result is obtained by a majority voting subsystem [37-40].

The advantage of this system is that if any one of the three subsystems fails, the other two can outvote it and thereby mask the fault. A TMR system contains three identical logic circuits that compute the same Boolean function. The output is obtained by combining the three intermediate results using another logic circuit [41-43]. The concept of TMR can be applied to many forms of redundancy found in fault-tolerant computer systems (Patidar et al., 2011). TMR is used in space satellite systems.

Scheduling:

Scheduling is a decision-making process that a distributed system uses to determine the order in which tasks are executed on the available resources [23-26]. Scheduling is important for managing incoming task requests and determining which task to execute next. Scheduling is also one of the techniques for tolerating faults in a distributed system (Lebiednik et al., 2016). It is used to reduce the drawbacks of checkpointing in a distributed environment. It is categorized as time-sharing scheduling and space-sharing scheduling, giving three approaches to scheduling: space, time, and hybrid [27-30].
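
The majority voting behind Triple Modular Redundancy, described under K-Modular Redundancy above, can be sketched in a few lines. This is a minimal software illustration only: the replicas are modeled as ordinary Python callables rather than logic circuits or separate subsystems, and the names `majority_vote` and `run_tmr` are hypothetical, not taken from the paper.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def majority_vote(results: Sequence[Hashable]) -> Hashable:
    """Return the value produced by a strict majority of the replicas."""
    if not results:
        raise ValueError("at least one replica result is required")
    winner, count = Counter(results).most_common(1)[0]
    # A strict majority is needed; otherwise the fault cannot be masked.
    if count * 2 <= len(results):
        raise RuntimeError("no majority: too many replicas disagree")
    return winner


def run_tmr(replicas: Sequence[Callable[[int], int]], value: int) -> int:
    """Run every replica on the same input and vote on the outputs."""
    return majority_vote([replica(value) for replica in replicas])


if __name__ == "__main__":
    def square(x: int) -> int:
        return x * x

    def faulty_square(x: int) -> int:
        return -1  # simulates a subsystem that has failed

    # Two healthy replicas outvote the faulty one.
    print(run_tmr([square, faulty_square, square], 4))  # 16
```

With three replicas the voter masks a single faulty result. If two replicas fail in different ways there is no majority, and this sketch raises an error instead of returning a possibly wrong value.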

V. CONCLUSION

The objective of fault tolerance in a distributed system is to make the system capable of withstanding faults and failures. Fault tolerance strategies are crucial in distributed systems, especially for cloud-centric applications. In a large scale distributed system, failures can lead to the collapse of the entire system. A fault may occur at any constituent computational node or machine. It then causes a partial breakdown of the system, so that the throughput and performance of the system degrade severely.

A great deal of research is under way on fault-tolerant systems. Recently machine learning, especially deep learning, has emerged as a promising approach to enhancing fault tolerance in distributed systems. In principle, a deep learning approach incorporates multiple processing units to handle voluminous, heterogeneous computing resources scattered over distinct geographical locations. Distributed deep learning algorithms are therefore a promising way to implement fault-tolerant systems.

VI. ACKNOWLEDGMENTS

I would like to express my deep gratitude to Dr. Akhilesh A. Waoo, Head of the Department, AKS University Satna, and my research supervisor, for patient guidance, enthusiastic encouragement, and useful suggestions throughout my research work. I would also like to thank Professor Dr. Rakesh Kumar Katare and Dr. Navita Shrivastava, APS University Rewa, for their advice and assistance in keeping my progress in the right direction. This research paper would not have been possible without the exceptional assistance of my colleague Ms. Sonali Singh. My special thanks go to the academic and technical staff of AKS University and RGCCAT Satna for their encouragement and valuable suggestions.

Finally, I wish to thank my parents and brothers for their continuous support and encouragement throughout my research.

VII. REFERENCES

[1] Calheiros, R.N., Ranjan, R., De Rose, C.A.F., Buyya, R. (2009). CloudSim: A Novel Framework for Modeling and Simulation of Cloud Computing Infrastructures and Services, (pp. 1-9).
[2] Kocher, D., Hilda, A.K.J. (2017). An approach for faults tolerance in cloud computing using machine learning technique. Int. J. Pure Appl. Math. 117(22), (pp. 345-351).
[3] Bekkerman, R., Bilenko, M., and Langford, J. (2011). Scaling up machine learning: Parallel and distributed approaches. Cambridge University Press.
[4] Bernstein, J., Wang, Y.-X., Azizzadenesheli, K. and Anandkumar, A. (2018). Signsgd: Compressed optimization for non-convex problems. In International Conference on Machine Learning, (pp. 559-568).
[5] Bijral, A. S., Sarwate, A. D., and Srebro, N. (2016). On data dependence in distributed stochastic optimization. arXiv preprint arXiv:1603.04379.
[6] Chaturapruek, S., Duchi, J. C. and Ré, C. (2015). Asynchronous stochastic convex optimization: the noise is in the noise and sgd don't care. In Advances in Neural Information Processing Systems, (pp. 1531-1539).
[7] Patil, A., Shah, A., Gaikwad, S., Mishra, A.A., Kohli, S.S., Dhage, S. (2011). Fault Tolerance in Cluster Computing System. In: 2011 Int. Conf. P2P, Parallel, Grid, Cloud Internet Comput., (pp. 408-412).
[8] Hazan, E. (2016). Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3-4), (pp. 157-325).
[9] He, K., Zhang, X., Ren, S. and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, (pp. 770-778).
[10] Engelmann, C., Vallée, G.R., Naughton, T., Scott, S.L. (2009). Proactive fault tolerance using preemptive migration. In: Proc. 17th Euromicro Int. Conf. Parallel, Distrib. Network-Based Process. PDP 2009, (pp. 252-257).
[11] Kakade, S. M., Shwartz, S. S. and Tewari, A. (2012). Regularization techniques for learning with matrices. Journal of Machine Learning Research, 13(Jun):1865-1890.
[12] Shwartz, S. S. and David, S. B. (2014). Understanding machine learning: From theory to algorithms, Cambridge University Press.
[13] Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In International Conference on Machine Learning, (pp. 928-936).
[14] Hatcher, W. G. and Yu, W. (2018). Survey of Deep Learning: Platforms, Applications and Emerging Research Trends, IEEE Access, May 24.
[15] Chen, X.W. and Lin, X. (2016). Big data deep learning: Challenges and perspectives, IEEE Access, vol. 2, 2014, (pp. 514-525).
[16] Ding, Y., Chen, S., and Xu, J. Application of deep belief networks for opcode based malware detection, in Proc. Int. Joint Conf. Neural Netw. (IJCNN), (pp. 3901-3908).
[17] Yuan, Y. and Jia, K. (2015). A distributed anomaly detection method of operation energy consumption using smart meter data, in Proc. Int. Conf. Intell. Inf. Hiding Multimedia Signal Process. (IIH-MSP), (pp. 310-313).
[18] Zhu, D., Jin, H., Y, Y., Wu, D. and C
