Root Cause Analysis - Skendric


Root Cause Analysis – Beginner
A Hands-on Tutorial

Your Pre-Flight Check List
1. Write your first name on the card stock, display prominently
2. Locate the courseware on the USB stick
3. Grab the latest version of the slide deck, dated -Cause-Analysis-Beginner-Deck.pdf
4. Configure Wireshark columns (see the Configure Wireshark Columns slide, p.5 of this presentation)
5. Introduce yourself to your potential teammates: figure out who will play which roles
6. Examine the diagrams on the walls

Copyright Stuart Kendrick 2013 All Rights Reserved
2013-11-05 Root Cause Analysis Beginner LISA 2013 Stuart Kendrick

Workshop Outline
Introduction
Example Case
Split into Small Groups
Case Studies
  Remote Office Bumps
  Many Applications Crash
Tips & Tools
Wrap-up

Introduction
Mechanics
Me and My Biases
What is Root Cause Analysis?
How Does This Class Work?
Recommendations

Mechanics
Ask questions whenever you want

9:00 – 10:30    Class
10:30 – 11:00   Break
11:00 – 12:30   Class
12:30 – 13:30   Lunch
13:30 – 15:00   Class
15:00 – 15:30   Break
15:30 – 16:30   Class
16:30 – 17:00   Wrap-up

Your Laptop
  has Internet connectivity
  can display & search PDF, PNG, TXT, XLS
  Wireshark configured per next slide

We use Google Docs; you don’t need an account: I will provide links

Configure Wireshark Columns
You really want Delta time displayed
And Custom (tcp.stream) will be helpful

Use a recent version of Wireshark: 1.10.0 at a minimum – I recommend the latest and greatest
If you are an experienced Wireshark user, feel free to ignore this and use your favorite column choices
If you are really experienced and prefer a different analyzer, feel free to use it

Me
Multi-disciplinary IT trouble-shooter / Root Cause

cpvax5 (Science Applications
d.cornell.edu
skendric@fhcrc.org
stuart.kendrick {at} isi lon dot com

Roles: student, programmer, desktop / server, server / network, multidisciplinary, sustaining engineer
Years: 1981, 1984, 1985, 1991, 1993, 2014
Employers: Cornell University, SAIC, Cornell University, Cornell University, Cornell Medical College, FHCRC, EMC Isilon
Functions: IT Architect, ITIL Problem Manager, Problem Analyst, Device Monitoring, Transport
Ithaca San 85 1988 1991 1993 2013

Geeky Highlights
PL/1 on IBM mainframes
FORTRAN on CRAY-1
Terak, DisplayWriter, IBM PC, Macintosh
Netware, Corvus Omninet, TCP-IP / IPX / AppleTalk
AppleShare, QuickMail, Farallon, NRC, Cisco, Sniffers
Solaris, Windows, Linux, Perl, SNMP, Wireshark, Cisco, Fluke
OneFS

2014-04-12 Myth-Busting xxx 2014 Stuart Kendrick / Chris Shaiman

You
You are a mid-level engineer
Perhaps you function as a sys admin, network engineer, database admin, or developer
Perhaps you support desktops and want to expand into another space
Perhaps you work for a small outfit and are a jack-of-all-trades

You look at logs regularly when tackling a problem, perhaps you’ve even looked at packet traces, though without nearly as much success as you would like. You’re curious about how things work and you’re persistent: you beat your head against a problem, trying to solve it from various angles.

You are here because you want a chance to tackle problems on your own and then receive coaching on techniques for analyzing packet traces, extracting insights from performance charts, correlating log entries from multiple devices.

Or perhaps you are a people or process person – resource manager, project manager, ITIL Problem Manager. You don’t have the skills to analyze bits & bytes, but you want to practice a problem solving methodology. You’ll help keep your team on track, coordinating subject matter experts, bringing the results together for reports to the larger class.

Caveats
I do not claim to be good at trouble-shooting
I do not claim to know how to teach trouble-shooting
I am not the smartest or fastest guy on the block

However
I have 30 years experience in this business
I have trained under gurus
I have accumulated a grab bag of tips which you may find useful
I have converted real-world events into these case studies

The result is a set of puzzle-solving labs which I predict you’ll enjoy
After all, it is more fun to trouble-shoot someone else’s issues

My World View

I have made a ceaseless effort not to ridicule, not to bewail, not to scorn human actions, but to understand them.
--Baruch Spinoza

Anything worth doing is worth doing badly.
--Marshall Rosenberg

The first principle is that you must not fool yourself -- and you are the easiest person to fool.
--Richard Feynman

Doubt is uncomfortable; certainty is absurd.
--Voltaire

The goal of education is to make up for the shortcomings in our instinctive ways of thinking about the physical and social world.
--Steven Pinker

Confidence & Knowledge

Ignorance more frequently begets confidence than does knowledge.
--Charles Darwin

[Figure: Confidence (from Doubt to Certainty) plotted against Knowledge (from Little to Lots), with the Newbie and the Jedi marked]

Music to My Ears
As I age, I increasingly value the following from myself and my colleagues:
  I don’t know
  I made a mistake
  Here’s how I will clean up the mess I made

I predict that you will follow many blind avenues during RCAs. I wish you success in keeping shoshin, aka, beginner’s mind, as you wander along your path.

My Biases

Science is not truth; it is, instead, a method for diminishing ignorance.
--J.M. Adovasio, Olga Soffer, Jake Page

A scientific theory accurately describes a large class of observations, makes definite predictions about future observations that could be falsifiable, i.e. disproven by observation.
--Derived from Stephen Hawking

Credible explanations grow from the combined testimony of three more or less independent, mutually reinforcing sources -- explanatory theory, empirical evidence, and rejection of competing alternative explanations.
--Edward Tufte

I recommend Tufte’s day-long seminar, as an introduction to critical thinking --sk

Quantum Mechanics
http://xkcd.com/1240/

What is Root Cause Analysis?
Any structured approach for identifying the contributors to an IT service disruption
There is no such thing as a Root Cause; nevertheless, Root Cause Analysis remains a useful tool
RCA is not complete until we’ve applied the fix and verified that the problem is resolved
Business reality: competing priorities distract us from completing RCAs
Most folks use the term RCA to refer to a post-mortem process. I use the term in its ITIL sense, tightly bound to Problem Management

How Complex Systems Fail – Richard Cook
A Few Thoughts on Uptime – me

Why No Root Cause?

Why do I claim there is no such thing as a Root Cause? Consider the server which goes down; your monitoring system pages you; you investigate. Turns out the power supply died – you replace the power supply, the server reboots, everyone is happy again. Then, you notice that the second power supply is dead, too. Turns out your monitoring system wasn’t checking power supplies when the first one fried a few months ago. Why wasn’t your monitoring system checking power supplies? Because it can’t – and upgrading to the newer version, which can, costs time & money – your management looked at the costs, weighed the risks, and decided to spend your time and those dollars on upgrading the aging e-mail server, which was close to collapse. Why doesn’t your department have enough staff and money to upgrade both the e-mail server and the monitoring server? Because management has to juggle the costs of IT against the costs of core business requirements – both of which look critical from different vantage points.

So what’s the Root Cause? A failed power supply? An inadequate monitoring system? Insufficient process in your leadership’s prioritization tactics, that they let the aging e-mail system stumble along for far too long? Insufficient resources to meet both core business requirements and IT requirements? Not enough market for your product, which is why you don’t have sufficient resources to meet both sets of needs?

Still not convinced? Why have you lost two power supplies across as many months? Because your local utility is straining to meet demand in your area and frequently inflicts brownouts, which age power supplies prematurely. Why hasn’t the utility beefed up capacity in your area? Because that would cost money, and politicians are reluctant to approve the rate increases necessary to support an expansion, given current voter sentiment. Why are voters annoyed at politicians?

Reality is complex: There is no such thing as Root Cause

How Do Techs Fix Issues?
Oh boy, that’s a big question. But let’s take a stab at answering it. A tech might start asking themselves, or the person reporting the problem, questions similar to the following:
  What makes you think there is an issue?
  What are you expecting that you’re not getting?
  Has it ever performed well?
  What changed recently? Software or hardware? Load?
  Can it be expressed in terms of latency or run time?
  Does the problem affect other people or applications?
  What is the environment? What software and hardware is used? Versions? Configuration?

Most issues get fixed somewhere during the process of asking these questions and uncovering the answers

Anti-Patterns
As the issue resists resolution, less skilled techs will start employing less effective approaches.

Street Lamp Method
The student comes across his professor on the Arts Quad at night, down on his hands & knees, staring at the sidewalk. “What are you doing, sir?” “Looking for my car keys”. The student joins the professor but after looking unsuccessfully in widening circles, asks him “Do you recall precisely where you were when you dropped the keys?” “Yes, over there, in the middle of the quad” points the professor, toward the dimly perceived middle of the grassy acre. “Well, why are you looking here?” asks the student. “Because the light is better here” responds the professor.

More formally:
1. List available tools
2. Examine the output of each one, looking for clues
3. Purchase more tools
4. Goto #1

Use The Force, Luke
“I know that we are experiencing a broadcast storm: you should check your {switch | router | firewall | server | client | application | whatever-belongs-to-some-other-group}”
I enjoyed Star Wars, but it was fiction; that distinction is hard for human brains to make. --sk

The USE Method
The issue typically gets escalated to a more experienced tech. I have yet to be satisfied with an account of what an experienced human does when engaging on their field of expertise. That said, here is one way to express what might be happening.

For every Resource, check Utilization, Saturation and Errors.
Intended to be used early in a performance investigation, to identify systemic bottlenecks.

Terminology definitions:
  Resource     all physical server functional components (CPUs, disks, busses, ...)
  Utilization  the average time that the resource was busy servicing work
  Saturation   the degree to which the resource has extra work which it can’t service, often queued
  Error        the count of error events

Stuart’s version:
1. Scan the logs, looking for error messages (Errors)
2. Are requests waiting in queues? (Saturation)
3. How busy are the boxes? (Utilization)

I am cribbing from Brendan Gregg: -method
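
Stuart's ordering (Errors first, then Saturation, then Utilization) can be sketched in a few lines of Python. This is only an illustration of the checklist, not Brendan Gregg's tooling: the `Resource` record, its field names, the sample values, and the 0.8 busy threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical snapshot of one resource; field names are illustrative."""
    name: str
    utilization: float  # fraction of time busy, 0.0 to 1.0
    queue_length: int   # work waiting that the resource cannot yet service
    errors: int         # count of error events


def use_findings(resources, busy_threshold=0.8):
    """Return (resource name, finding) pairs.

    Each resource is checked in Stuart's order: Errors, Saturation, Utilization.
    The busy threshold is an arbitrary assumption, not a recommended value.
    """
    findings = []
    for r in resources:
        if r.errors > 0:
            findings.append((r.name, "errors"))
        if r.queue_length > 0:
            findings.append((r.name, "saturation"))
        if r.utilization >= busy_threshold:
            findings.append((r.name, "utilization"))
    return findings


# Invented sample data, purely to exercise the function.
sample = [
    Resource("cpu0", utilization=0.35, queue_length=0, errors=0),
    Resource("disk0", utilization=0.95, queue_length=12, errors=0),
    Resource("eth0", utilization=0.10, queue_length=0, errors=3),
]

for name, finding in use_findings(sample):
    print(f"{name}: check {finding}")
```

The point of the sketch is the sweep itself: every resource gets the same three questions, so nothing is skipped just because it seems an unlikely suspect.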

But Not Today
Most problems get solved using any number of techniques, a few of which I sketched in the previous slides
But that’s not what I will be pushing you to do today
I will be pushing you to employ a methodology called Rapid Problem Resolution (RPR)
RPR is an evidence-based process; it is a heavy process; it is a sledgehammer.
Sledgehammers are generally overkill
But for a certain class of problems – the ones which have defeated experienced techs for weeks, months, or years – sledgehammers offer plenty of value
The case studies in this class belong to that class of problems
I will push you to employ RPR. You may resist. That’s OK
The official goal of this class is to introduce you to RPR

Rapid Problem Resolution
This workshop borrows heavily from the Rapid Problem Resolution methodology codified by Paul Offord of Advance7, which fits into ITIL’s Problem Management schema.

I’ve slashed Advance7’s 19 step approach into 9 steps. This makes the methodology less effective but teachable in a single day. And suitable for smaller RCAs.

RPR is not a silver bullet. It is merely a tool for your tool bag, like ping, top, PerfMon. There are no silver bullets.

Life is pain, Highness. Anyone who says differently is selling something.
--The Man in Black

RCA Methodology
Derived from the Rapid Problem Resolution methodology

1. Understand the Symptoms
2. Pick One Symptom
3. Draw the Diagram
4. Design Capture Plan
5. Capture Diagnostic Data
6. Analyze Captured Data
7. Identify Fix
8. Implement Fix
9. Verify Fix

The steps are grouped into Phase 1, Phase 2, and Phase 3.

Notes on the Nine Steps
1. Humans want instant gratification: we start trouble-shooting before we understand the problem. Resist that urge.
2. Natural desire to want to fix everything fast – myself, I rarely succeed when I try. Be particularly wary of thrashing: jumping from one symptom to another. Pick One Symptom, One Symptom only, and stick to it.
3. Common to start trouble-shooting before understanding the environment. Draw the Diagram and Sit with the User. You may discover that you didn’t understand the Symptom, in which case, start over.
4. As you learn more about the Environment and make mistakes in your capture methodology, you’ll cycle through Steps #4-#6 numerous times. This is normal. As you become more experienced, you’ll spend more time on #3 and cycle through #4-#6 fewer times.
5. If the problem is intermittent, you can spend a lot of time waiting here. That is reality.
6. Naturally, you need time to think about the data you capture.
7. At some point, you exit the #4-#6 loop because you think you understand what is happening and you have identified a fix.
8. You apply the fix.
9. Key step: verify that your fix actually works. If it doesn’t, start over.
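
The control flow described in these notes (cycle through #4-#6 until the analysis is conclusive, then run #7-#9, and start over if verification fails) can be sketched as a small Python function. The function name and its two inputs are hypothetical, invented only to make the loop explicit; real RCAs are not driven by a list of booleans.

```python
def rca_path(analysis_results, fix_works):
    """Return the sequence of step numbers visited in one pass of the methodology.

    analysis_results: one boolean per trip through steps 4-6, True when the
                      analysis finally explains the symptom (an assumption
                      standing in for human judgment).
    fix_works:        whether step 9 confirms the fix.
    """
    path = [1, 2, 3]
    understood = False
    for conclusive in analysis_results:
        path += [4, 5, 6]          # design capture, capture, analyze
        if conclusive:
            understood = True
            break
    if not understood:
        return path                # still inside the #4-#6 loop
    path += [7, 8, 9]              # identify, implement, verify
    if not fix_works:
        path.append(1)             # verification failed: start over
    return path


# Two capture/analyze cycles, then a fix that verifies.
print(rca_path([False, True], fix_works=True))
```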

RCA Roles & Responsibilities

Facilitator (often a Problem Manager)
  Accountable for
    o Owns the RCA
    o Acquire resources
    o Use and execute the methodology
    o Communicate within the team
    o Report & escalate to leadership
    o Schedule meetings
  Desirable Characteristics
    Process-oriented person

Problem Analyst (often a senior engineer)
  Responsible for
    o Unify & synthesize information from SMEs
    o Keep team on track technically
    o Breadth & depth
  Desirable Characteristics
    Sees the forest, not the trees
    Respected / trusted by SMEs

Subject Matter Experts (SME)
  Responsible for
    o Strong fundamental knowledge of area
    o Facilitating access
    o Capturing data
    o Analyzing
  Desirable Characteristics
    Like getting their hands dirty

Skills / Predilections
  o Problem solving skills
  o Inquiring mind – passion for understanding how things work
  o Determination & stamina – pursuing a tough problem can be wearing
  o T-shaped – broad background in IT with specialization in one or two particular areas

The Problem Solving Group (aka RCA Team) consists of the Facilitator, the Problem Analyst, and one or more Subject Matter Experts

Draw the Diagram
Design Capture Plan
Who talks to whom?
Where to insert probes?
Where to gather logs / debug output?

[Figure: Request and Response flows between components, including supporting services (DNS, LDAP, NIS, ...) and a Fibre Channel Switch]

How Does This Class Work?
We will work through case studies – real situations drawn from my experience at FHCRC – alternating between small group and seminar style sessions.

Typically, we will oscillate in 15-30 minute increments – spending 15-30 minutes together as a class, working privately in our small groups for 15-30 minutes, coming together for 15-30 minutes.

Course materials on the USB stick include packet traces, log extracts, trending charts, ‘show’ output from clients, servers, switches/routers, storage systems, captured during the actual RCA.

Expectations
Whirlwind tour: At the Hutch, we typically spent weeks of an RCA team’s time on these cases – in this workshop, we will just taste each experience, merely touching on key points – we will not have time to dig through any of them in detail.

Variable expertise: As a group, we differ wildly in our expertise – some of us have never seen Wireshark before, have never touched an Ethernet switch or a storage array. I will play to a range of levels: sometimes you may be bored, sometimes you may be drowning.

We will not finish: I do not expect to reach all the case studies. We may not even get through the first one – it contains a lot of material – all depends on where your curiosity leads us.

More Expectations
Detours: Using your questions as cues, I will stop the flow of the course and explore related topics: how striping affects the performance of arrays, how TCP Window works, how to perform a particular function in Wireshark.

Contribute: If you have expertise to contribute, please speak up – group dialogue contributes to learning.

Methodology: I will be a stickler for the RPR Methodology and will attempt to push you into following it, following each step in order. Naturally, you may choose to resist. I’m OK with dissent and rebellion – you know yourself better than I do – if you’ll learn better doing things differently, ignore me and blaze your own trail.

Great Expectations
Red Herrings: I will include data and clues which are irrelevant to solving the problem. That’s what happened to us, so I intend to share the pain.

Misinformation: When I am wearing a hat, I may give you inaccurate information, based on the limitations of the person whose role I am playing. When I am bare-headed, I am playing the role of the instructor and will try to describe reality as accurately as I know how.

Chaos: I am trying to recreate the fog of war, the confusion of a real world situation: practicing ways to bring order from chaos is a deep lesson of this class.

Recommendations
Embarrass me: I make mistakes – find them and point them out. I’d rather feel embarrassed and learn than feel comfortable and remain ignorant.

Embarrass yourself: Take risks, ask dumb questions, reveal your ignorance. If you don’t understand my answer, ask again. This is your laboratory, a safe place for you to learn. Ex ignorantia ad sapientiam, e luce ad tenebras.

Data: The USB stick contains data – packet traces, ‘show’ output, screen shots – as you work through the scenario and ask for data, I will point you to the relevant directory. If you get stuck, feel free to poke around.

Results Folders: The USB stick also contains the answers to the case studies in folders named Results. I recommend avoiding the Results folder until we’re done for the day.

Wave me down: If you are stuck and thrashing, wave me down – I’m happy to assess where you are and offer you direction to get you unstuck.

Questions?
We are about to walk through the Example Case.
Questions up to this point?

Example Case
1. Understand the Problem
2. Choose One Symptom
3. Draw the Diagram
4. Design Capture Plan
5. Capture Diagnostic Data
6. Analyze Captured Data
7. Identify Fix
8. Implement Fix
9. Verify Fix
Results

Walk Through an Example Case
Server Disconnects Telnet Client

The End-User (Angie) keeps getting disconnected from the Server (Daffy). This has been going on for a while; Angie has a high-profile job and a high-profile boss; management has spun up a Root Cause Analysis team and assigned you and a Desktop Tech (Bob) to the team. Bob explains to you that he has been working the issue for several weeks, that a Router is causing the problem, and that he needs help finding and fixing the Router.

We start with 15 minutes together focused on Methodology Step #1: Understand the Symptoms

#1 Understand the Symptoms
Questions for the Desktop Tech

You: What do you know about Angie?
Bob: She is a power user located in the Fairview Building, runs Windows XP and the Attachmate Reflection terminal emulator.

You: What do you know about the Server?
Bob: It is a Unix server called Daffy located in the Yale data center and run by the Sys Admin Rick.

You: How long has the problem been occurring?
Bob: Several weeks, happens multiple times per day, no pattern.

#1 Understand the Symptoms
Questions for End-User

You: When did this start?
Angie: It has happened for years, but I didn’t bother to report it because, until several weeks ago, I hardly used Daffy. Now, I spend all day in it, and the problem is really annoying.

You: What do you notice?
Angie: Multiple times per day, I get disconnected and have to log back in.

You: See any patterns?
Angie: Not really. Sometimes I’m typing along and get disconnected. Sometimes, I turn back to my machine or unhide Reflection and see that I’ve been disconnected.

#1 Understand the Symptoms
Questions for End-User

You: What do you do with this application?
Angie: I enter data into the FALCON database. The forms from which I acquire the data are irregular – requires a lot of interpretation. Sometimes, I spend time looking up related cases in other databases or calling relevant people on the phone for input. Sometimes, I just type like a mad woman. Sometimes, I run reports – it’s really annoying when a report takes half an hour to run and I get disconnected just before it finishes, because then I have to re-run the report.

#1 Understand the Symptoms
Questions for End-User

You: When you’re typing like a mad woman, how long before you get disconnected?
Angie: I figure I get 45 minutes. That’s my guess – I figure I get disconnected every 45 minutes. I might be wrong about that – I haven’t timed it or anything. But if I’ve been logged in for half an hour or so and need to run a report, I generally wait until I get disconnected, log back in, and then run the report immediately.

#1 Understand the Symptoms
Questions for the Sys Admin

You: What can you tell me about Angie’s problem?
Rick: Got me. It can’t be my server: Daffy has about 40 users and 10 developers, and Angie is the only person reporting this problem. They all use the Reflection SSH client.

You: What can you tell me about Daffy?
Rick: It is an HP Alpha server running OpenVMS located here in the D5 data center. It runs the Ingres database manager. Angie uses the FALCON database: everyone uses FALCON; it’s the most popular database we offer.

#1 Understand the Symptoms
Questions for the Sys Admin

You: How often does Angie have this problem?
Rick: Seems to me that Angie gets disconnected every hour or two; I’ve checked the server configuration – I haven’t configured a timeout: everyone gets unlimited access as long as they want.

You: What do your logs say?
Rick: Not much. Angie has called me plenty of times, right after getting disconnected, but all the Alpha logs say is:
“Username angie: Client disconnected”

#2 Pick One Symptom
Split: If this were a real case, we would split into our small groups. You have 15 minutes.

Choose: Your first task in small group is to select one and only one symptom on which to focus. In this example, it’s pretty easy – there’s only one symptom. In future cases, this task will be harder – there will be many symptoms. Generally, I recommend picking either the easiest to analyze, the easiest to replicate, or the most costly to the business.

Phrase: Find a precise way to phrase the symptom. Example:
Angie gets intermittently disconnected from Daffy.

#3 Draw the Diagram
This will involve asking IT staff technical questions about the environment – this is where I start swapping hats (End-User, HelpDesk, Desktop, Sys Admin, Network, Database, Security, Vendor, Manager, ...), depending on the group to which you address the question

Ideally, the Ops staff already have this diagram and keep it updated as they make changes, but in my experience, only the most mature shops manage this

Sometimes, we identify the cause during the process of diagramming!

There’s a lot of experience & judgment here – what to include, what not to include

Focus on the components which surround the Symptom you have picked and how they relate to one another: dependencies.

If you solve a problem without drawing a diagram, you got lucky.

Diagram for Example Case

#4 Design Capture Plan
This is done in small group; you have 15 minutes. In this step, you figure out how you’ll gather the data you identified in the previous step.

Typically, you will want to gather logs and/or metrics from applications and operating systems as well as insert sniffers

As much as possible, I will also support your performing ‘show’ commands, grepping through logs, trending parameters across time, rebooting devices

#4 Design Capture Plan
Example Data Capture Plan
1. Plug sniffers into lf-esx and d5sr-esx and SPAN Angie’s port and Daffy’s port, filtering on Angie’s IP address
2. Enable debug tracing on Angie’s copy of Reflection, gather both syslog and Ingres logs on Daffy
3. Validate capture set-up by asking Angie to ssh into Daffy, then verifying that we can see Angie’s login in all logs and packet traces
4. Sit with Angie and watch her work for a day, precisely recording the times when she gets disconnected
5. While we’re waiting: Gather ‘show port’ output from Angie’s and Daffy’s switch ports plus version and configuration information (idle timer setting) from Reflection
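
Once item 4 of the plan yields a list of disconnect times, the gaps between them turn the competing estimates (Angie's "every 45 minutes", Rick's "every hour or two") into something measurable. A minimal Python sketch follows; the timestamps are invented placeholders, not data from the case, and the helper name is hypothetical.

```python
from datetime import datetime

# Hypothetical disconnect times (HH:MM) as they might be jotted down while
# sitting with the user. These values are invented for illustration only.
disconnects = ["09:12", "09:58", "10:41", "11:27"]


def gaps_in_minutes(times):
    """Return the minutes elapsed between consecutive same-day HH:MM timestamps."""
    parsed = [datetime.strptime(t, "%H:%M") for t in times]
    return [int((later - earlier).total_seconds() // 60)
            for earlier, later in zip(parsed, parsed[1:])]


gaps = gaps_in_minutes(disconnects)
print("gaps (min):", gaps)
print("shortest:", min(gaps), "longest:", max(gaps))
```

Regular gaps would hint at a timer somewhere; widely scattered gaps would point elsewhere. Either way, recorded times replace guesses.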

#5 Capture Diagnostic Data
This is done as a class. The instructor executes each group’s Diagnostic Capture Plan and returns the resulting information. Each group benefits from hearing the results of every group’s Diagnostic Capture Plan.

Typically 15 minutes.

In this example, the instructor returns:
  Reflection debug trace
  Packet Captures
  Logs
  Angie’s & Daffy’s Ethernet port statistics
  Reflection Version & Settings (idle timer)

Angie’s Ethernet Port Stats

If the error counters were high, perhaps we have a bad NIC, cable, or switch port, but they are zero or close enough. Rule out bad physical layer.

lf-esx#sh ver
[...]
lf-esx uptime is 3 years, 3 weeks, 5 days, 12 hours, 44 minutes
[...]
lf-esx#sh int Fa2/19
FastEthernet2/19 is up, line protocol is up (connected)
  Hardware is Fast Ethernet Port, address is 0011.21f5.46c2 (bia 0011.21f5.46c2)
  MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 100Mb/s, link type is auto, media type is 10/100BaseTX
  input flow-control is unsupported output flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:19, output never, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 4000 bits/sec, 6 packets/sec
     161282073 packets input, 48475519613 bytes, 0 no buffer
     Received 2004674 broadcasts (1689326 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 input packets with dribble condition detected
     831253443 packets output, 116132425387 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 babbles, 0 late collision, 0 deferred
     1 lost carrier, 0 no carrier
     0 output buffer failures, 0 output buffers swapped out
lf-esx#
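
The "zero or close enough" judgment can be made mechanical when there are many ports to review. The Python sketch below pulls a handful of counters out of an excerpt of the output above and reports the non-zero ones. Which counters count as physical-layer symptoms is an assumption made for this example, and the function name is hypothetical.

```python
import re

# Excerpt of the 'sh int Fa2/19' output from this slide.
SHOW_INT = """\
161282073 packets input, 48475519613 bytes, 0 no buffer
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
831253443 packets output, 116132425387 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 babbles, 0 late collision, 0 deferred
1 lost carrier, 0 no carrier
"""

# Counters treated here as physical-layer symptoms; this selection is an
# assumption for the example, not an authoritative list.
ERROR_COUNTERS = ["runts", "giants", "input errors", "CRC", "frame",
                  "output errors", "collisions", "late collision",
                  "lost carrier"]


def read_counters(text, names):
    """Map each counter name to the integer that precedes it in the text."""
    counters = {}
    for name in names:
        match = re.search(r"(\d+) " + re.escape(name) + r"\b", text)
        if match:
            counters[name] = int(match.group(1))
    return counters


counters = read_counters(SHOW_INT, ERROR_COUNTERS)
nonzero = {name: value for name, value in counters.items() if value}
print("non-zero counters:", nonzero)
```

A single lost carrier against roughly a billion packets over three years of uptime is what the slide means by "close enough".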

Packet Trace

Angie abruptly hangs up (TCP RST) on Daffy (aka Ingress). Looks like Angie initiated the disconnect.

[Figure: packet trace, hosts labeled daffy and ingress]
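
In Wireshark, the display filter `tcp.flags.reset == 1` isolates the resets in a trace. The same question, "which side sent the first RST?", can be sketched in Python over a list of packet summaries. The record layout, addresses, and timings below are invented for illustration and do not come from the course captures.

```python
# Hypothetical packet summaries, e.g. typed up from a trace. Addresses,
# times, and the dictionary layout are all assumptions for this sketch.
packets = [
    {"time": 0.000, "src": "10.0.0.5", "dst": "10.0.1.9", "flags": {"PSH", "ACK"}},
    {"time": 0.120, "src": "10.0.1.9", "dst": "10.0.0.5", "flags": {"ACK"}},
    {"time": 1834.2, "src": "10.0.0.5", "dst": "10.0.1.9", "flags": {"RST"}},
]


def first_reset_sender(pkts):
    """Return the source address of the earliest packet carrying RST, or None."""
    for pkt in sorted(pkts, key=lambda p: p["time"]):
        if "RST" in pkt["flags"]:
            return pkt["src"]
    return None


print("first RST sent by:", first_reset_sender(packets))
```

The source address on a reset only shows where the packet claims to come from, which is why the slide says "looks like": comparing the traces from both sniffers is what confirms the sender.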

Reading the manual tells us that ConnectionSetting
