Hardware And Software Reliability (323-08)


Hardware and Software Reliability (323-08)
Application and Improvement of Software Reliability Models

Submitted by: Dolores Wallace and Charles Coleman, Manager, SATC
October 12, 2001

Technical POC: Dr. Linda Rosenberg
Phone: 301-286-0087
Fax: 301-286-1701
Email: Linda.Rosenberg@gsfc.nasa.gov
Mail Code: 304

Administrative POC: Dennis Brennan
Phone: 301-286-6582
Fax: 301-286-1667
Email: Dennis.Brennan@gsfc.nasa.gov
Mail Code: 300

ABSTRACT

This report presents the results of Task 323-08, Hardware and Software Reliability. Although hardware and software differ, they share a sufficient number of similarities that the mathematics used in hardware reliability modeling has been applied to software reliability modeling. This task examines those models and describes how they may be practical for application to projects at Goddard Space Flight Center. The task also resulted in improvements to one model to allow for fault correction.

EXECUTIVE SUMMARY

NASA acquires and uses many systems in which software is a major component, and many of these systems are critical to the success of NASA's mission. These systems must execute successfully for a specified time under specified conditions; that is, they must be reliable. The capability to provide accurate measurement of the reliability of the software in these systems before NASA accepts them is an essential part of ensuring that NASA software will meet its mission requirements.

The purposes of Task 323-08, Hardware and Software Reliability, are to examine reliability engineering in general and its impact on software reliability measurement, to develop improvements to existing software reliability modeling, and to identify the potential usefulness of this technique as one data point in measuring the reliability of software at the Goddard Space Flight Center.

The first part of this project identified the mathematics and statistical distributions used in reliability modeling and found that essentially all have been applied to software reliability modeling. The study identified major differences between hardware and software and indicated that the software reliability models do not specifically accommodate those differences. The study resulted in several recommendations for model modification.

The second part of this project explored the use of these software reliability models at Goddard Space Flight Center (GSFC) and their improvement. A case study to determine the usefulness of this technique at GSFC used project failure data, characterizing and manipulating it for use with a software reliability tool. The actual process of software reliability modeling includes the preparation of the data, selection of the appropriate model, and analysis and interpretation of the results of the models. A key criterion for practicality is the amount of effort required for each step.

The Naval Surface Warfare Center (NSWC), Dahlgren, Virginia, has sponsored the development of a software tool, Statistical Modeling and Estimation of Reliability Functions for Software (SMERFS), under the direction of Dr. William B. Farr. This public domain tool exercises several software reliability models and served as an instrument for assessing the usability of software reliability modeling at GSFC.

One difference between hardware and software is the correction process. By the time hardware is in operation and reliability studies occur, design faults have generally been removed. With software, faults often remain during system test and operation. The hardware reliability models do not account for correction during the time of reliability measurement. For this research task, Dr. Norman Schneidewind of the Naval Postgraduate School developed adjustments to the Schneidewind model to allow for fault correction.

This report describes the results of these studies.

TABLE OF CONTENTS

ABSTRACT
EXECUTIVE SUMMARY
1. Introduction
2. Overview of Hardware and Software Reliability
   2.1 Definitions
   2.2 Software reliability models
   2.3 Requirements for using the models
   2.4 Some Hardware and Software Differences Impacting Reliability Models
3. Applying Software Reliability Modeling at GSFC
   3.1 The Modeling Process
   3.2 Collection of Data
   3.3 Software Tool Availability
   3.4 Options for applying software reliability modeling at GSFC
4. Modeling the Fault Correction Process
5. Conclusions
6. References
Appendix A. Performing Software Reliability Modeling
   A.1 Initial Process Steps
   A.2 Exercising the Models
   A.3 Sample Executions
   A.4 Lessons Learned
Appendix B. Improvements to a Software Reliability Model
   Fault Correction Prediction Model Components
      Fault Correction Delay
      Number of Faults Corrected
      Proportion of Faults Corrected
      Number of Remaining Faults
      Time Required To Correct C Faults
      Fault Correction Rate
   Applications
      Predicting Whether Reliability Goals Have Been Achieved
      Stopping Rules for Testing and Prioritizing Tests and Test Resources
   Validation
   Summary
   References

Table 1. Software Reliability Failure Rate Models
Table 2. Software Reliability NHPP Models
Table 3. Assumptions for Software Reliability Models
Table 4. Data Requirements for Software Reliability Models
Table 5. Software reliability models in SMERFS 3
Table A1. Summary of 3 Models for Interval Data of 69 Weeks
Table B.1. OID (Predictions for T_{s-1} = 6)
Table B.2. OIJ (Predictions for T_{s-1} = 8)
Table B.3. OIO (Predictions for T_{s-1} = 8)

Figure A1. Format for TBF Data Input
Figure A2. Results for Weekly Intervals, Integration Test, Subsystem 1
Figure A3. Sample of Output for a Specific Model
Figure A4. Loglet Lab Results With Monthly Integration Failure Data
Figure A5. Sample Output for Time Between Failure
Figure A6. Observed, Estimated Values for Time-Between-Failure Models
Figure 1. Concept of Fault Correction Service
Figure 2. Distribution Function of Fault Correction Delay
Figure 3. Predicted Maximum Correction Delay (OIJ)
Figure 4. Predicted Failures and Corrected Faults (OID)
Figure 5. Predicted Number of Remaining Faults
Figure 6. Predicted Proportion of Remaining Faults
Figure 7. Predicted Time to Correct Faults

1. Introduction

NASA is increasingly dependent upon systems in which software is a major component. These systems are critical to the success of NASA's mission and must execute successfully for a specified time under specified conditions; that is, they must be reliable. The capability to accurately measure the reliability of the software in these systems is an essential part of ensuring that NASA systems will meet mission requirements.

The Software Assurance Technology Center (SATC) at the NASA Goddard Space Flight Center (GSFC) performed Task 323-08, Hardware and Software Reliability, to examine reliability engineering, its impact on software reliability measurement, and the practicality of using it to provide one data point for measuring the reliability of software at GSFC. Reliability engineering executes various mathematical functions on past failure data to predict the future behavior of a component or system, that is, to measure the increase in its reliability, usually referred to as reliability growth. This project explored the improvement of software reliability engineering models to accommodate fault correction.

The first part of this project identified the mathematics and statistical distributions used in reliability modeling and found that essentially all have been applied to software reliability modeling. The study identified major differences between hardware and software and indicated that the software reliability models do not specifically accommodate those differences. While the complete findings were reported previously, Section 2 of this report contains a brief summary to provide appropriate context for this part of the project.

The second part of this project explored issues for using these software reliability models at GSFC and developed improvements to the Schneidewind model. We examined the process of software reliability modeling identified by the American Institute of Aeronautics and Astronautics in its Recommended Practice for Software Reliability [AIAA]. The purpose was to identify the difficulties of using software reliability modeling and some steps of the process that may be made easier with the aid of a software reliability modeling tool.

The mathematical and statistical functions used in software reliability engineering employ several steps. The equations for the models themselves have parameters that are estimated with techniques like least squares or maximum likelihood estimation. Then the models, usually equations in some exponential form, must be executed. But verifying the selected model for the particular data set may require iteration and study of the model functions. From these results predictions can be made, and confidence intervals for the predictions can be computed. All of these computations are time-consuming and error-prone when computed manually.

We searched for a software tool to assist us with software reliability modeling on GSFC project data to understand how practical the use of this measurement technique can be. One tool that reduces the difficulty of software reliability modeling is the Statistical Modeling and Estimation of Reliability Functions for Software (SMERFS), developed under the direction of Dr. William B. Farr of the Naval Surface Warfare Center, Dahlgren, Virginia. It performs curve-fitting, model selection and execution, and statistical analysis for several software reliability models.
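To make those steps concrete, the sketch below (not taken from the report or from SMERFS, and assuming Python with numpy and scipy available) works the simplest possible case: a constant failure rate λ estimated from times between failures by maximum likelihood, with an exact confidence interval attached. The failure times are invented illustration data.

```python
# Minimal sketch: MLE fit of a constant-failure-rate (exponential) model
# to times between failures, plus a 95% confidence interval on lambda.
import numpy as np
from scipy.stats import chi2

tbf = np.array([12.0, 15.5, 21.0, 30.2, 44.7, 58.1, 90.3])  # invented, in hours

n = len(tbf)
total_time = tbf.sum()
lam_hat = n / total_time      # MLE of the failure rate lambda
mttf_hat = 1.0 / lam_hat      # estimated mean time to failure

# Exact interval: 2 * total_time * lambda follows a chi-square
# distribution with 2n degrees of freedom for a complete sample.
alpha = 0.05
lam_lo = chi2.ppf(alpha / 2, 2 * n) / (2 * total_time)
lam_hi = chi2.ppf(1 - alpha / 2, 2 * n) / (2 * total_time)

print(f"lambda = {lam_hat:.4f}/hr, 95% CI [{lam_lo:.4f}, {lam_hi:.4f}]")
print(f"MTTF = {mttf_hat:.1f} hr")
# Reliability over a 24-hour period under the constant-hazard assumption:
print(f"R(24) = {np.exp(-lam_hat * 24):.3f}")
```

Every model discussed in Section 2.2 follows this same pattern of estimating parameters, checking the fit, and then predicting; but unlike this toy case, their likelihood equations generally need numerical solution, which is what makes tool support essential.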

The latest version, SMERFS 3, like its predecessors, contains the mathematics for many of the software reliability models. Except for user features, the features of SMERFS concerning the models are likely to be similar to those of any other software reliability modeling tool. Because both the tool and guidance from Dr. Farr were available to us, we selected this tool to serve as an instrument for assessing the practicality of using software reliability modeling at GSFC.

Next we found two GSFC projects with data in a defect tracking system. While failure data are available in this tracking system, other information may be needed to characterize the project, select models, and organize the data correctly. We describe this experience with SMERFS 3 and these data to show how software reliability modeling works. We identify intellectual aspects that the tool cannot perform. We also demonstrate the type of effort needed by the project staff to use software reliability modeling as a successful technique for software reliability measurement. We show pitfalls that may entrap those who do not analyze their project characteristics and data before exercising SMERFS 3 on failure data; a sketch of the kind of data preparation involved follows.
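As a hedged illustration (the record format, dates, and severities below are invented, not taken from the GSFC projects), this sketch shows how raw defect-tracking records might be reduced to the two input forms reliability models accept: failure counts per interval and times between failures.

```python
# Turning raw defect-tracking records into model inputs (invented data).
from datetime import date

# (date opened, severity) for each failure report
reports = [
    (date(2001, 3, 5), 2), (date(2001, 3, 9), 1), (date(2001, 3, 9), 3),
    (date(2001, 3, 20), 2), (date(2001, 4, 2), 1), (date(2001, 4, 18), 2),
]

start = date(2001, 3, 1)  # start of the test period

# Failures per 7-day interval (interval-count input)
weeks = [(d - start).days // 7 for d, _ in reports]
counts = [weeks.count(w) for w in range(max(weeks) + 1)]
print("failures per week:", counts)

# Days between successive failures (time-between-failures input)
days = sorted((d - start).days for d, _ in reports)
tbf = [b - a for a, b in zip(days, days[1:])]
print("days between failures:", tbf)
```

Even this toy example surfaces the characterization questions discussed above: two failures logged on the same day produce a zero time between failures, and nothing in the raw records says which severities or which test phases belong in the data set.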

One difference between hardware and software is the correction process. By the time hardware is in operation and reliability data are collected, design faults have generally been removed. The hardware reliability models do not account for correction during the time of reliability measurement. With software, faults exist during system test and operation, such that reliability growth occurs as these faults are corrected. Dr. Norman Schneidewind of the Naval Postgraduate School developed adjustments to his model to allow for fault correction.

Section 2 of this report provides a brief synopsis, from the first report of this study, of the basic information about software reliability modeling. Section 3 describes in detail the software reliability modeling process and options for applying the process at GSFC. Section 4 describes the research results of modifying the Schneidewind model to accommodate differences between hardware and software, with Appendix B providing complete information. Section 5 provides the conclusions about the potential use of software reliability modeling at GSFC. Appendix A describes a case study using GSFC project data with SMERFS 3.

2. Overview of Hardware and Software Reliability

Hardware and software reliability engineering have many concepts with unique terminology and many mathematical and statistical expressions. Basically, the approach is to apply mathematics and statistics to model past failure data and predict the future behavior of a component or system. Major statistical distributions used in hardware reliability modeling include the exponential, gamma, Weibull, binomial, Poisson, normal, lognormal, Bayes, and Markov distributions. To use these distributions, data collected from failures of systems need to be fitted with techniques like maximum likelihood or least squares estimation. The appropriateness of the selected models needs to be verified with statistical methods like chi-squared or other goodness-of-fit tests. Because mechanical and electrical systems tend to deteriorate over time, these reliability distributions depend on time as the variable, usually calendar time.

To provide context for the rest of this report, this section provides definitions, descriptions of a few software reliability models, and assumptions and requirements for using these models. It also provides a discussion of differences between hardware and software and their impact on modeling for software reliability.

2.1 Definitions

Unless otherwise indicated, these definitions are taken from the Department of Defense MIL-HDBK-338B on electronic reliability [Mil338].

Failure: The event, or inoperable state, in which any item or part of any item does not, or would not, perform as previously specified.

Failure: (1) The inability of a system or system component to perform a required function within specified limits. A failure may be produced when a fault is encountered and a loss of the expected service to the user results. (2) The termination of the ability of a functional unit to perform its required function. (3) A departure of program operation from program requirements [AIAA].

Failure intensity function: The instantaneous rate of change of the expected number of failures with respect to time [Lyu].

Failure rate: The total number of failures within an item population, divided by the number of life units expended by that population, during a particular measurement period under stated conditions.

Failure rate: (1) The ratio of the number of failures of a given category or severity to a given period of time; for example, failures per second of execution time or failures per month. Synonymous with failure intensity. (2) The ratio of the number of failures to a given unit of measure; for example, failures per unit of time, failures per number of transactions, or failures per number of computer runs [AIAA].

Mean Time Between Failure: A basic measure of reliability for repairable items: the mean number of life units during which all parts of the item perform within their specified limits, during a particular measurement under stated conditions.

Mean Time Between Failure: The expected or observed time between consecutive failures in a system or component [I982].

Mean Time to Failure: A basic measure of reliability for non-repairable items: the total number of life units of an item population divided by the number of failures within that population, during a particular measurement under stated conditions.

Reliability: (1) The duration or probability of failure-free performance under stated conditions. (2) The probability that an item can perform its intended function for a specified interval under stated conditions.

Reliability: The ability of a system or component to perform its required functions under stated conditions for a specified period of time [I610].

Reliability: See Software reliability [I982].

Reliability growth: The improvement in reliability that results when design, material, or part deficiencies are revealed by testing and eliminated through corrective action.

Software reliability: (1) The probability that software will not cause the failure of a system for a specified time under specified conditions. The probability is a function of the inputs to and use of the system, as well as a function of the existence of faults in the software. The inputs to the system determine whether existing faults, if any, are encountered [AIAA] [I982]. (2) The ability of a program to perform a required function under stated conditions for a stated period of time [AIAA].

Software reliability model: A mathematical expression that specifies the general form of the software failure process as a function of factors such as fault introduction, fault removal, and the operational environment [AIAA].

Time: A fundamental element used in developing the concept of reliability and used in many of the measures of reliability. Determining the applicable interval of time for a specific measurement is a prerequisite to accurate measurement. Usually the interval of interest is calendar time, but it may be broken down into other intervals (or calendar time may be reconstructed from other intervals).

Wearout: The process that results in an increase of the failure rate or probability of failure as the number of life units increases.
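As a minimal arithmetic illustration of the failure rate and Mean Time Between Failure definitions above (the numbers are invented):

```python
# Invented example: 4 failures observed over 200 hours of operation.
failure_times = [30.0, 80.0, 150.0, 200.0]   # cumulative hours at each failure

n_failures = len(failure_times)
life_units = failure_times[-1]                # total operating hours observed

failure_rate = n_failures / life_units        # failures per operating hour
mtbf = life_units / n_failures                # mean time between failures

print(f"failure rate = {failure_rate:.3f} failures/hour")   # 0.020
print(f"MTBF = {mtbf:.1f} hours")                            # 50.0
```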

2.2 Software reliability models

Software reliability engineering produces a model of a software system based on its failure data to provide a measurement for software reliability. The mathematical and statistical functions used in software reliability modeling employ several computational steps. The equations for the models themselves have parameters that are estimated using techniques like least squares fit or maximum likelihood estimation. Then the models, usually equations in some exponential form, must be executed. Verifying that the selected model is valid for the particular data set may require iteration and study of the model functions. From these results, predictions about the number of remaining faults or the time to next failure can be made, and confidence intervals for the predictions can be computed.

A few algorithms of some popular models are shown in Tables 1 and 2 [Pham]. Table 1 identifies software failure rate models used to study the program failure rate per failure at the failure intervals. Table 2 describes some NHPP models. Software reliability models rely on two types of data: either the number of failures per time period or the time between failures. Most software reliability models are well known and have been in use since the 1980s and 1990s. Exponential distributions are used almost exclusively for reliability in the prediction of electronic equipment; that is, the probability density function (pdf) of the time to failure is f(t) = λ exp(−λt). This distribution

can be chosen as a failure distribution if and only if the assumption of a constant hazard rate can be justified; that is, the hazard rate is the constant λ [Mann]. Models using an exponential distribution include the Musa Basic, Jelinski-Moranda De-eutrophication, non-homogeneous Poisson process (NHPP), Goel-Okumoto, Schneidewind, and hyperexponential models. For these, memory is not important; that is, failures in the past have no impact on future failure rates.

The Weibull model is a general form of the gamma, lognormal, exponential, or normal distribution, depending on the value of β. Variations include the S-shaped reliability growth model and the Rayleigh model. Musa uses the logarithmic model and assumes errors contribute differently to the error rate. Shooman uses the binomial distribution, and Littlewood-Verrall uses Bayesian statistics. For Bayesian models, memory is important; that is, what has failed before has an impact on current and future failure rates.

Table 1. Software Reliability Failure Rate Models

Jelinski-Moranda:
  f(ti) = φ[N − (i−1)] exp(−φ[N − (i−1)]ti)
  R(ti) = 1 − F(ti) = exp(−φ[N − (i−1)]ti)
  λ(ti) = φ[N − (i−1)]

Schick-Wolverton:
  f(ti) = φ[N − (i−1)]ti exp(−φ[N − (i−1)]ti²/2)
  R(ti) = exp(−φ[N − (i−1)]ti²/2)
  λ(ti) = φ[N − (i−1)]ti

Geometric:
  f(ti) = Dk^(i−1) exp(−Dk^(i−1)ti)
  R(ti) = exp(−Dk^(i−1)ti)
  λ(ti) = Dk^(i−1)

Goel-Okumoto (imperfect debugging):
  f(ti) = φ[N − p(i−1)] exp(−φ[N − p(i−1)]ti)
  R(ti) = exp(−φ[N − p(i−1)]ti)
  λ(ti) = φ[N − p(i−1)]

Littlewood-Verrall:
  f(ti) = α[ξ(i)/(ti + ξ(i))]^α [1/(ti + ξ(i))]
  R(ti) = ∫ from ti to ∞ of α[ξ(i)/(s + ξ(i))]^α [1/(s + ξ(i))] ds
  λ(ti) = α/(ti + ξ(i))

Shooman:
  f(ti) = NA
  R(ti) = NA
  λ(ti) = β1β2[ω − ∫ from 0 to ti of β3(s) ds]

Weibull:
  f(t) = (β(t − γ)^(β−1)/θ^β) exp(−((t − γ)/θ)^β)
  R(t) = exp(−((t − γ)/θ)^β)
  λ(t) = β(t − γ)^(β−1)/θ^β

where
  α = a parameter to be estimated;
  φ = a proportionality constant, the contribution any one fault makes to the overall program;
  N = the number of initial faults in the program;
  ti = the time between the (i−1)th and the ith failures;
  D = the initial program failure rate;
  k = the parameter of the geometric function (0 < k < 1);
  p = the probability of removing a failure when it occurs;
  ξ(i) = β0 + β1·i or β0 + β1·i²;
  ω = the inherent fault density of the program;
  β1 = the proportion of unique instructions processed;
  β2 = a bulk constant;
  β3(t) = the faults corrected per instruction per unit time.
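As a hedged sketch of how the Jelinski-Moranda entry in Table 1 is actually fit to data (illustrative code, not from the report or from SMERFS), the maximum likelihood equations reduce to a one-dimensional root-finding problem in N, after which φ follows in closed form. The failure data are invented, and a finite estimate of N exists only when the data show reliability growth, i.e., generally lengthening times between failures.

```python
import numpy as np

def jelinski_moranda_mle(tbf):
    """Fit the Jelinski-Moranda model from Table 1 by maximum likelihood.

    tbf: times between successive failures t_1..t_n. The data must show
    reliability growth; otherwise the MLE of N is unbounded and the
    bisection below returns a value near its upper bracket.
    Returns (N_hat, phi_hat).
    """
    t = np.asarray(tbf, dtype=float)
    n = len(t)
    i = np.arange(1, n + 1)
    sum_t = t.sum()
    sum_wt = ((i - 1) * t).sum()

    # MLE condition: sum_i 1/(N-i+1) = n*sum_t / (N*sum_t - sum_wt)
    def g(N):
        return (1.0 / (N - i + 1)).sum() - n * sum_t / (N * sum_t - sum_wt)

    lo, hi = n + 1e-9, 1e6        # g > 0 near lo, g < 0 at hi when a root exists
    for _ in range(200):          # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    N_hat = 0.5 * (lo + hi)
    phi_hat = n / (N_hat * sum_t - sum_wt)
    return N_hat, phi_hat

# Invented example data with lengthening times between failures:
tbf = [10, 12, 15, 20, 26, 34, 45]
N_hat, phi_hat = jelinski_moranda_mle(tbf)
print(f"N = {N_hat:.1f} estimated initial faults, phi = {phi_hat:.4f}")
print(f"estimated remaining faults: {N_hat - len(tbf):.1f}")
print(f"hazard for the next failure: {phi_hat * (N_hat - len(tbf)):.4f} per time unit")
```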

Table 2. Software Reliability NHPP Models

Goel-Okumoto (G-O), Concave:
  m(t) = a(1 − exp(−bt)); a(t) = a; b(t) = b

Schneidewind, Concave:
  m(t) = (a/b)(1 − exp(−bt)); a(t) = a; b(t) = b

Duane Growth, Concave or S-shaped:
  m(t) = at^b; a(t) = a; b(t) = b

Delayed S-shaped (Yamada), S-shaped:
  m(t) = a(1 − (1 + bt)exp(−bt)); a(t) = a; b(t) = b²t/(1 + bt)

Inflection S-shaped (Ohba), Concave:
  m(t) = a(1 − exp(−bt))/(1 + β exp(−bt)); a(t) = a; b(t) = b/(1 + β exp(−bt))

Musa Basic, Concave:
  m(t) = β0(1 − exp(−β1t))

Musa Log, Concave:
  m(t) = a(1 − exp(−ct/nT)),
  with a = k / Σ_{i=1}^{k} (1 − exp(−c·ti/nT))
  and c = (1/knT) Σ_{i=1}^{k} ti − (a/knT) Σ_{i=1}^{k} ti exp(−c·ti/nT)

where
  MVF = the mean value function;
  m(t) = the expected number of errors detected by time t (the mean value function);
  a(t) = the error content function, i.e., the total number of errors in the software, including the introduced errors, at time t;
  b(t) = the error detection rate per error at time t;
  ti = the observed time between the (i−1)th and the ith failure;
  a = the number of failures in the program;
  c = the testing compression factor;
  T = the mean time to failure at the beginning of the test;
  n = the total number of failures possible during the maintained life of the program.
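To show how an NHPP entry from Table 2 turns into a prediction, the hedged sketch below fits the Goel-Okumoto mean value function to invented weekly failure counts using an ordinary least-squares curve fit. This is only a stand-in to make the mechanics visible; SMERFS applies each model's own maximum likelihood or least squares estimators.

```python
# Hedged sketch: least-squares fit of the Goel-Okumoto mean value function
# m(t) = a(1 - exp(-b t)) from Table 2 to invented weekly failure counts.
import numpy as np
from scipy.optimize import curve_fit

weeks = np.arange(1, 13)
fails_per_week = np.array([12, 10, 9, 7, 6, 5, 4, 3, 3, 2, 2, 1])  # invented
cum = np.cumsum(fails_per_week)   # cumulative failures observed by each week

def go_mvf(t, a, b):
    # a = expected total faults; b = per-fault detection rate
    return a * (1.0 - np.exp(-b * t))

(a_hat, b_hat), _ = curve_fit(go_mvf, weeks, cum, p0=[1.2 * cum[-1], 0.1])

print(f"a = {a_hat:.1f} expected total faults, b = {b_hat:.3f} per week")
print(f"estimated faults remaining: {a_hat - cum[-1]:.1f}")
# Failure intensity is the derivative m'(t) = a*b*exp(-b*t); one week ahead:
t_next = weeks[-1] + 1
print(f"predicted intensity at week {t_next}: "
      f"{a_hat * b_hat * np.exp(-b_hat * t_next):.2f} failures/week")
```

A real analysis would follow the fit with goodness-of-fit checks before trusting the predictions, which is the kind of work the case study in Appendix A describes.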

2.3 Requirements for using the models

Certain assumptions should be true for the models to produce valid results; these are provided for several models in Table 3 [Lyu]. Musa and several other models share the following group of assumptions, referred to as basic assumptions in Table 3:

1. Software is operated in a similar manner as that in which reliability predictions are to be made.
2. Every fault has the same chance of being encountered within a severity class as any other fault in that class.
3. The failures, when faults are detected, are independent.

Table 3. Assumptions for software reliability models

Musa Basic Exponential (applied after integration) and Musa Log (applied during unit to system test):

1. Basic assumptions.
2. The cumulative number of failures by time t, M(t), follows a Poisson process with mean value function µ(t) = β0[1 − exp(−β1t)], where β0, β1 > 0; µ(t) is such that the expected number of failure occurrences for any time period is proportional to the expected number of undetected faults at that time. Finite failure model.
3. Time between failures is expressed in CPU time.
4. Errors contribute equally to the error rate.
5. There is a finite number of inherent errors.
6. The error rate declines uniformly for every error corrected; constant error rate over time.
7. Mean value function: the expected number of failure occurrences at any time is proportional to the number of undetected faults at that time.
8. Execution times between failures are piecewise exponentially distributed (the hazard rate for a single fault is constant).
9. The quantities of resources (number of fault identifications, correction personnel, computer time) available are constant over the segment for which the software is observed.
10. Resource expenditures for the kth resource can be approximated (complicated).
11. Fault correction personnel utilization establis

