## Abstract

Demand for quality software has grown rapidly during the last few years. This is leading to an increase in the development of metrics for measuring properties of software, such as coupling, cohesion or inheritance, that can be used in early quality assessments. Quality models that explore the relationship between these properties and quality attributes such as fault proneness, maintainability, effort or productivity are needed to use these metrics effectively. The goal of this work is to empirically explore the relationship between object-oriented design metrics and the fault proneness of object-oriented system classes. The study uses data collected from Java applications containing 136 classes. We use a set of twenty-six design metrics in our work. The results of this study show that many metrics are based on comparable ideas and provide redundant information. It is shown that prediction models that identify faulty classes can be built using a subset of the metrics. The proposed model predicts faulty classes with more than 80% accuracy.
## 1 INTRODUCTION

There are several metrics proposed in the literature for capturing the quality of Object-Oriented (OO) design and code, for example, ([Aggarwal05]; [Braind98][Braind99]; [Bieman95]; [Cartwright00]; [Chidamber94][Chidamber91]; [Harrison98]; [Henderson96]; [Hitz00]; [Lake94]; [Li93]; [Lee95]; [Lorenz94]; [Tegarden95]). These metrics provide ways to evaluate the quality of software, and their use in the earlier phases of software development can help organizations assess large software developments quickly and at low cost [Braind99]. But how do we know which metrics are useful in capturing important quality attributes such as fault proneness, effort, productivity or amount of maintenance modifications? Empirical studies of real systems can provide relevant answers. There have been few empirical studies evaluating the effect of object-oriented metrics on software quality and constructing models that use them to predict quality attributes of a system, such as ([Basili96]; [Binkley98]; [Braind00][Braind01]; [Cartwright00]; [Chidamber98]; [Emam99][Emam01]; [Gyimothy05]; [Harrison98]; [Li93]; [Ping02]). More empirical studies, whose data can be verified by observation or experiment, are needed. The evidence gathered through such empirical studies is today considered to be the most powerful support possible for testing a given hypothesis.

In this paper, we empirically investigate and validate a set of OO metrics given by [Chidamber94][Chidamber91] and [Braind99]. These metrics are analyzed on 12 software projects containing 136 classes. The study is divided into the following parts:

- The principal component method of factor analysis is used to find whether all these metrics are independent or whether they capture the same underlying property of the object being measured.
- Univariate logistic regression analysis is carried out to test the hypothesis that size, coupling and inheritance increase the fault proneness of a class whereas cohesion decreases it, and to find the individual impact of each metric on fault proneness.
- Finally, a model based on multivariate logistic regression analysis is given to predict which classes of a Java application released in the future will be faulty.
The results show that although the number of OO metrics is large, the number of dimensions they actually capture is much lower. Further, it was observed that import coupling metrics (which count the number of other classes called by a class) are strongly associated with fault proneness and predict faulty classes with high accuracy. Based on these results, it is reasonable to claim that such a model could help in planning and executing testing by focusing resources on the fault-prone parts of the design and code.

The paper is organized as follows: Section 2 summarizes the metrics studied, describes the sources from which data is collected and presents the hypotheses to be tested in the study. Section 3 presents the research methodology followed in this paper. In Section 4 the results of the study are given. The model is evaluated in Section 5. Limitations of the study are presented in Section 6 and conclusions of the work are presented in Section 7.

## 2 RESEARCH BACKGROUND

In this section, we present a summary of the metrics studied in this paper (Section 2.1), the empirical data collection (Section 2.2) and the hypotheses to be tested in our work (Section 2.3). Our focus in the study is on the metrics proposed by [Chidamber94][Chidamber91] and [Braind99].

## Metrics Studied

The metrics of coupling, cohesion, inheritance and size are the independent variables used in this study. Our focus is on OO metrics that can be used as independent variables in a prediction model that is usable at early stages of software development. The metrics selected in this paper are summarized in Table 1. These metrics are explained with examples in [Aggarwal05][Aggarwal06].
Table 1: Object-Oriented Metrics

## Empirical Data Collection

To analyze the metrics chosen for this work, their values are computed for twelve different systems. These systems were developed by undergraduate engineering students and Masters of Computer Application students at the School of Information Technology of our University. The systems were developed in the Java programming language over a duration of four months. The aim was to teach the students system analysis and design techniques as part of their course curriculum. All students had experience with the Java language and thus had the basic knowledge necessary for this study. The students were also taught about algorithmic detail.

The students were divided into 12 teams of four students each. Each team developed a medium-sized system such as flight reservation, chat server, proxy server etc. The development process used was the waterfall model. Documents were produced at each phase of software development. A separate group of students with prior knowledge of system testing, working under the guidance of senior faculty, was assigned the task of testing the systems according to test plans. Faults were reported to the developers. The following relevant data was collected:

- The design and source code of the Java programs.
- The fault data found by the testing team.
The 12 systems under study consist of 136 classes (39 KLOC), of which 85 are system classes and 51 are standard library classes available in the Java language. The standard library classes contain functions to manipulate files, strings, lists, hash tables, frames, windows, menus, threads, socket connections etc. All metric values are computed on system classes, whereas coupling and inheritance metrics are also calculated between 'system classes' and 'standard library classes'. It was observed during testing that the classes coupled with standard library classes were less fault prone than those coupled with system classes. It was also noticed that a large number of system classes inherited from standard library classes. These classes did not need as much testing as the system classes that inherit from other system classes. Thus, the values of metrics for standard library classes are shown separately, as their effect on fault proneness is different from that of system classes.

## Hypotheses

We test the hypotheses given below.
## 3 RESEARCH METHODOLOGY

In this section, the procedure used to analyze the data collected for each measure is described in the following stages:

- Principal-Component Method: The Principal-Component Method (or P.C. method) is used to maximize the sum of squared loadings of each factor extracted in turn. The P.C. method aims at constructing new variables ($P_i$), called Principal Components (P.C.s), out of a given set of variables. The variables with high loadings help identify the dimension a P.C. is capturing, but this usually requires some degree of interpretation. In order to identify these variables and interpret the P.C.s, we consider the rotated components. As the dimensions are independent, orthogonal rotation is used. There are various strategies to perform such a rotation. We used the varimax rotation, which is the most frequently used strategy in the literature. An eigenvalue (or latent root) is associated with each P.C.; it is the sum of the squared loadings relating to that dimension. The eigenvalue indicates the relative importance of each dimension for the particular set of variables being analyzed. In our study, the P.C.s with an eigenvalue greater than 1 are taken for interpretation [Kothari89].
- Logistic Regression (LR) and model prediction: LR is the most widely used technique in the literature [Hosmer89] for predicting a dependent variable from a set of independent variables (a detailed description is given by [Basili96] and [Hosmer89]). In our work the independent variables are OO metrics and the dependent variable is fault proneness. LR is of two types: (i) Univariate LR and (ii) Multivariate LR.
Univariate LR is a statistical method that formulates a mathematical model depicting the relationship between each independent variable and the dependent variable. This technique is used to test the hypotheses given in Section 2.3. Multivariate LR is used to construct a prediction model for the fault proneness of classes. In this method a combination of metrics is used to determine the effect on the dependent variable.

In LR two stepwise selection methods, forward selection and backward elimination, are used [Hosmer89]. In the forward stepwise procedure, stepwise variable entry examines the variables in the block at each step for entry. The backward elimination method starts with all the independent variables in the model. Variables are then deleted one at a time from the model until a stopping criterion is fulfilled. We have used the backward elimination method on the metrics selected by the P.C. method and univariate analysis.

For the predicted model a test of multicollinearity is performed, since the interpretation of the model becomes difficult if multicollinearity is present. Let $X_1, X_2, \ldots, X_n$ be the covariates of the model predicted. The P.C. method is applied on these variables to find the maximum eigenvalue, $e_{max}$, and the minimum eigenvalue, $e_{min}$. The conditional number is defined as $\lambda = \sqrt{e_{max}/e_{min}}$. If the value of the conditional number exceeds 30 then multicollinearity is not tolerable [Belsley80].
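The multicollinearity test can be sketched in a few lines. This is only an illustration: it assumes the conditional number is the square root of the ratio of the largest to the smallest eigenvalue, with 30 as the tolerance threshold. The function names and the eigenvalues below are hypothetical and are not taken from the study.

```python
import math


def conditional_number(eigenvalues):
    """Return sqrt(e_max / e_min) for the eigenvalues of the covariates."""
    e_max = max(eigenvalues)
    e_min = min(eigenvalues)
    if e_min <= 0:
        raise ValueError("eigenvalues must be positive")
    return math.sqrt(e_max / e_min)


def multicollinearity_tolerable(eigenvalues, threshold=30.0):
    """True when the conditional number does not exceed the threshold."""
    return conditional_number(eigenvalues) <= threshold


# Hypothetical eigenvalues for three covariates (not values from the study).
example_eigenvalues = [2.25, 0.5, 0.25]
print(conditional_number(example_eigenvalues))           # sqrt(2.25 / 0.25) = 3.0
print(multicollinearity_tolerable(example_eigenvalues))  # True
```

A conditional number close to 1 means the covariates are nearly orthogonal; the value grows as the smallest eigenvalue approaches zero, that is, as one covariate becomes almost a linear combination of the others.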
The following statistics are reported for each significant metric:

**Odds Ratio:** It is the probability of the event divided by the probability of the non-event. The event in our study is a class having a fault and the non-event is a class not having a fault.

**Maximum Likelihood Estimation (MLE) and Coefficients ($A_i$'s):** MLE is a statistical method for estimating the coefficients of a model. The likelihood function ($L$) measures the probability of observing the set of dependent variable values ($P_1, P_2, \ldots, P_n$).

**The statistical significance (*sig*):** It is the significance level of the coefficient. The larger the statistical significance, the smaller the estimated impact of the independent variables (OO metrics). In our study we used 0.05 as the significance threshold.

**The $R^2$ Statistic:** It is the proportion of the variance in the dependent variable that is explained by the variance of the independent variables. The higher the effect of the model's explanatory variables, the better the accuracy of the model.
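To make the link between the MLE coefficients and the reported odds ratio concrete, the following is a minimal sketch of univariate LR fitted by Newton-Raphson maximum likelihood. It is not the statistical package used in the study. The function names, the iteration count and the small data set (a coupling value per class and a 0/1 fault flag) are hypothetical.

```python
import math


def fit_univariate_lr(x, y, iterations=25):
    """Fit pi(x) = 1 / (1 + exp(-(a0 + a1 * x))) by maximum likelihood."""
    a0, a1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a0 + a1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of log-likelihood w.r.t. a0
            g1 += (yi - p) * xi     # gradient of log-likelihood w.r.t. a1
            h00 += w                # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information matrix) * delta = gradient.
        a0 += (h11 * g0 - h01 * g1) / det
        a1 += (h00 * g1 - h01 * g0) / det
    return a0, a1


def odds_ratio(coefficient):
    """exp(B): multiplicative change in the odds per unit increase of the metric."""
    return math.exp(coefficient)


# Hypothetical data: one metric value per class and whether the class was faulty.
coupling_values = [1, 2, 3, 4, 5, 6, 7, 8]
is_faulty = [0, 0, 1, 0, 1, 0, 1, 1]

intercept, slope = fit_univariate_lr(coupling_values, is_faulty)
print("coefficient B:", slope)
print("odds ratio exp(B):", odds_ratio(slope))
```

A positive coefficient gives an odds ratio above 1, meaning that the odds of a class being faulty grow as the metric value grows. A negative coefficient gives an odds ratio below 1.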
- Performance Evaluation: The model is evaluated in the following ways:
- The sensitivity and specificity of the model are calculated to assess the correctness of the model. The percentage of classes correctly predicted to be fault prone is known as the sensitivity of the model.

*Sensitivity* can be formally defined as:

$$\text{Sensitivity} = \frac{\text{number of classes correctly predicted as fault prone}}{\text{total number of classes actually fault prone}} \times 100$$

The higher the sensitivity (% correct predictions), the better the model. The percentage of non-occurrences correctly predicted, i.e. classes correctly predicted not to be fault prone, is called the specificity of the model.

- To estimate the accuracy of the model it should be applied to different data sets. Therefore we performed k-cross validation of the model [Stone74]. The data set is randomly divided into k subsets. Each time one of the k subsets is used as the test set and the other k-1 subsets are used to form a training set. Thus we get the predicted fault proneness for all the classes.
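The two measures can be computed directly from the prediction counts. The sketch below uses the counts reported for the multivariate model in Section 4 (32 of 37 fault-prone classes and 45 of 48 non-fault-prone classes correctly predicted). The function names are illustrative only.

```python
def sensitivity(true_positives, false_negatives):
    """Percentage of actually fault-prone classes predicted as fault prone."""
    return 100.0 * true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Percentage of non-fault-prone classes predicted as not fault prone."""
    return 100.0 * true_negatives / (true_negatives + false_positives)


# Counts reported for the model applied to the 85 system classes.
print(round(sensitivity(32, 37 - 32), 2))  # 86.49
print(round(specificity(45, 48 - 45), 2))  # 93.75
```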
## 4 ANALYSIS RESULTS

This section presents the analysis results, following the procedure described in Section 3. P.C. analysis (Section 4.1), univariate analysis (Section 4.2) and multivariate analysis (Section 4.3) results are presented.

## Principal Component (P.C.) Method

The coupling of system classes to system classes is counted separately from the coupling of system classes to standard library classes. SL is suffixed to the metric name when coupling to standard library classes is counted. For instance, the CBO metric in such a case is named CBO_SL. The P.C. extraction method and varimax rotation method are applied on all metrics. The rotated component matrix is given in Table 2. The values above 0.7 (shown in bold in Table 2) identify the metrics that are used to interpret the P.C.s. For each P.C., we also provide its eigenvalue, variance percent and cumulative percent. The interpretations of the P.C.s are given as follows:

- P1: CBO_SL, OCAIC_SL, OCMIC_SL, CBO1_SL and OMMIC_SL measure coupling from standard library classes.
- P2: LCOM1, LCOM2, WMC and OCMIC. This dimension includes coupling, cohesion and size metrics. This indicates that import coupling and cohesion metrics have correlation with size.
- P3: OMMIC, RFC are coupling metrics. These metrics count import coupling from system classes through method invocations.
- P4: AMMIC_SL, OCAIC are import coupling metrics.
- P5: CBO, CBO1 are coupling metrics that count both import and export coupling.
- P6: NOC is an inheritance metric that counts number of children of a class.
Hence, we see that 5 out of 6 dimensions contain coupling metrics. Two dimensions, P4 and P6, capture inheritance-based coupling and an inheritance metric. We also see that metrics capturing different properties are included in the same dimension P2.

## Univariate Logistic Regression (LR) Analysis

In this subsection we find the relationship of the independent variables (OO metrics) with the dependent variable (fault proneness). Univariate LR analysis is done on the 85 system classes. Table 3 provides the coefficient (B), standard error (SE), statistical significance (sig), $R^2$ statistic and odds ratio (exp(B)) for each measure. Metrics with no variance or low variance are excluded from the table. The metrics with a significant relationship to fault proneness, that is, at or below the significance (named Sig. in Table 3) threshold of 0.05, are shown in bold (see Table 3). The metrics that are not shown in bold do not have a significant relationship with fault proneness.
Table 2: Rotated Principal Component

The following observations are made based on the results given in Table 3:

- The CBO and CBO1 metrics, which count both import and export coupling, are related to fault proneness, supporting hypothesis H1. Hence we reject the null hypothesis.
- The metrics OMMEC, OCMEC and OCAEC, however, are not strongly related to fault proneness; for instance, if a class A is coupled to a class B, this does not make class B fault prone. Similar results have been shown in [Braind00]. Hence the null hypothesis is accepted for export coupling metrics and hypothesis H2 is rejected.
- The LCOM1 and LCOM2 metrics show positive coefficients. Since these metrics measure lack of cohesion, this indicates that the probability of fault proneness increases as the cohesion of a class decreases. Thus we accept hypothesis H3 and reject the null hypothesis.
- The results indicate that the inheritance metric DIT, measuring the depth of the inheritance tree, is not related to fault proneness. This suggests that student programmers give more attention to classes being inherited (i.e. super classes) and follow a well-defined strategy. Hence the null hypothesis is accepted for the DIT metric and hypothesis H4 is rejected.
- The metric NOC, counting the number of children of a class, is not related to fault proneness. Hence the null hypothesis is accepted for the NOC metric and hypothesis H5 is rejected.
- The size metric WMC is related to fault proneness and thus hypothesis H6 is accepted.

Table 3: Univariate LR Analysis of Metrics
## Multivariate Logistic Regression (LR) Analysis

In this section we build a model to identify the faulty classes. Metrics are preselected using the results of the P.C. and univariate analyses, and the model is then built using the backward elimination method. The model includes an intercept, referred to as the constant. The model includes two coupling metrics, OMMIC and RFC. One size metric, WMC, is also included in the model. The OMMIC and RFC metrics were covered in dimension P3 and were also found to be strongly related to fault proneness in the univariate analysis. The WMC metric is captured in dimension P2. The summary of the model statistics is presented in Table 4. The conditional number of the model does not indicate any multicollinearity problem.
Table 4: Model Statistics

The model was applied to the 85 system classes and the accuracy of the model is presented in Table 5. The $R^2$ statistic and log likelihood of the model are fairly high. Out of 37 classes that were actually fault prone, 32 classes were predicted to be fault prone. The sensitivity of the model is 86.49%. Similarly, 45 out of 48 classes were correctly predicted not to be fault prone. Thus the specificity of the model is 93.75%.
Table 5: Predicted Correctness of Model

## 5 MODEL EVALUATION

The sensitivity and specificity of the model predicted in the previous section are quite high, but they are somewhat optimistic since the model is applied to the same data set from which it was derived. To estimate the accuracy of the model it should be applied to different data sets. Thus we performed 9-cross validation of the model following the procedure given in Section 3. For the 9-cross validation, the classes were randomly divided into 9 parts of approximately equal size (5 partitions of 9 data points each and 4 partitions of 10 data points each).
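The partitioning step can be illustrated with a short sketch. It assumes a simple shuffled round-robin split. The paper does not state how the random division was implemented, so the function names and the seed are hypothetical. With 85 classes and k = 9 the split yields the partition sizes described above.

```python
import random


def k_fold_partitions(n_items, k, seed=0):
    """Randomly split the indices 0..n_items-1 into k folds of near-equal size."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Round-robin slicing keeps the fold sizes within one item of each other.
    return [indices[i::k] for i in range(k)]


def train_test_splits(folds):
    """Yield (training indices, test indices) with each fold used once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


folds = k_fold_partitions(85, 9)
print(sorted(len(fold) for fold in folds))  # [9, 9, 9, 9, 9, 10, 10, 10, 10]
```

Because every class appears in exactly one test fold, collecting the test-set predictions over the 9 rounds gives one out-of-sample prediction per class.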
Table 6: Results of 9-cross validation of Model

Table 6 shows that 30 out of 37 classes are correctly predicted to be fault prone. The sensitivity of the model is 81.08%. Similarly, 45 out of 48 classes were correctly predicted not to be fault prone. Thus the specificity of the model is 93.75%. This shows that the model also predicts, with high accuracy, classes from similar data other than the data from which it was derived.

## 6 THREATS TO VALIDITY

The study has a number of limitations that are not unique to our study but are common to most of the empirical studies in the literature. However, it is necessary to repeat them here. The degree to which the results of our study can be generalized to other research settings is questionable. The reason is that the systems developed are small. The developers are students and hence are not as well trained as professional developers.

In this study the severity of faults is not taken into account. Different faults can leave the system in different states; for example, a failure caused by a fault may lead to a system crash or to an inability to open a file. The former failure is more severe than the latter, so not all faults are of the same type. The same limitation is also reported in [Emam99]. Though these results provide guidance for future research on the impact of OO metrics on fault proneness, further validations are needed with different systems to draw stronger conclusions.

## 7 CONCLUSIONS

We have conducted an empirical validation of twenty-six metrics. The systems under study are medium-sized systems written in Java and have a testing record that includes the number of faults found in each class. In this study we first found the interrelationships among the selected metrics and then found the individual and combined effect of the selected metrics on fault proneness. The number of dimensions captured in the P.C. analysis is only 6, which is much lower than the number of metrics.
This simply supports the fact that many of the metrics proposed are based on comparable ideas and therefore provide somewhat redundant information. The results of the univariate LR analysis show that most of the import coupling and cohesion metrics are related to fault proneness. On the other hand, inheritance metrics were not found to be related to fault proneness. The results of the multivariate LR analysis show that import coupling and size metrics predict fault proneness with high accuracy. As far as cohesion metrics are concerned, they were found to be highly related to fault proneness in the univariate LR analysis, but none was found to be significantly related to fault proneness in the multivariate LR analysis. The model has a sensitivity of 86.5% and a specificity above 90%.

The metrics could not be evaluated over a large data set, but this is a problem that has plagued much of empirical software engineering research. More studies of a similar type must be carried out with different data sets to give generalized results across different organizations. We plan to replicate our study on a large data set and an industrial OO software system. We further plan to predict the models based on early analysis and design artifacts.

## REFERENCES

- [Binkley98] A. Binkley and S. Schach, "Validation of the Coupling Dependency Metric as a risk Predictor",
- [Lake94] A. Lake, C. Cook, "Use of factor analysis to develop OOP software complexity metrics",
- [Henderson96] B. Henderson-Sellers, "
- [Kothari89] C. R. Kothari, "
- [Belsley80] D. Belsley, E. Kuh, R. Welsch, "
- [Hosmer89] D. Hosmer, S. Lemeshow, "
- [Tegarden95] D. Tegarden, S. Sheetz, D. Monarchi, "A Software Complexity Model of Object-Oriented Systems
- [Bieman95] J. Bieman, B. Kang, "Cohesion and Reuse in an Object-Oriented System",
- [Emam99] K. El Emam, S. Benlarbi, N. Goel, S. Rai, "A Validation of Object-Oriented Metrics", Technical Report ERB-1063,
- [Emam01] K. El Emam, W. Melo, J. Machado, "The Prediction of Faulty Classes Using Object-Oriented Design Metrics",
- [Aggarwal05] K. K. Aggarwal, Yogesh Singh, Arvinder Kaur, Ruchika Malhotra, "Analysis of Object-Oriented Metrics", International Workshop on Software Measurement (IWSM), Montréal, Canada, 2005.
- [Aggarwal06] K. K. Aggarwal, Yogesh Singh, Arvinder Kaur, Ruchika Malhotra, "Empirical Study of Object-Oriented Metrics",
- [Aggarwal05] K. K. Aggarwal, Yogesh Singh, Arvinder Kaur, Ruchika Malhotra, "Software Reuse Metrics for Object-Oriented Systems",
- [Braind98] L. Briand, J. Daly and J. Wust, "A Unified Framework for Cohesion Measurement in Object-Oriented Systems",
- [Braind99] L. Briand, J. Daly and J. Wust, "A Unified Framework for Coupling Measurement in Object-Oriented Systems",
- [Braind00] L. Briand, J. Daly, V. Porter, J. Wust, "Exploring the relationships between design measures and software quality",
- [Braind01] L. Briand, J. Wüst, H. Lounis, "Replicated Case Studies for Investigating Quality Factors in Object-Oriented Designs,
- [Cartwright00] M. Cartwright, M. Shepperd, "An Empirical Investigation of an Object-Oriented Software System",
- [Hitz00] M. Hitz, B. Montazeri, "Measuring Coupling and Cohesion in Object-Oriented Systems",
- [Lorenz94] M. Lorenz, J. Kidd, "
- [Stone74] M. Stone, "Cross-validatory choice and assessment of statistical predictions",
- [Harrison98] R. Harrison, S. J. Counsell, R. V. Nithi, "An Evaluation of MOOD set of Object-Oriented Software Metrics",
- [Chidamber94] S. Chidamber and C. Kemerer, "A metrics Suite for Object-Oriented Design
- [Chidamber91] S. Chidamber, C. Kemerer, "Towards a Metrics Suite for Object Oriented design",
- [Chidamber98] S. Chidamber, D. Darcy, C. Kemerer, "Managerial use of Metrics for Object-Oriented Software: An Exploratory Analysis",
- [Gyimothy05] T. Gyimothy, R. Ferenc, I. Siket, "Empirical validation of object-oriented metrics on open source software for fault prediction",
- [Basili96] V. Basili, L. Briand, W. Melo, "A Validation of Object-Oriented Design Metrics as Quality Indicators",
- [Li93] W. Li, S. Henry, "Object-Oriented Metrics that Predict Maintainability",
- [Lee95] Y. Lee, B. Liang, S. Wu, F. Wang, "Measuring the Coupling and Cohesion of an Object-Oriented program based on Information flow",
- [Ping02] Yu Ping, Ma Xiaoxing, Lu Jian, "Predicting Fault-Proneness using OO Metrics: An Industrial Case Study",

## About the authors
Cite this article as follows: K.K Aggarwal, Yogesh Singh, Arvinder Kaur, Ruchika Malhotra, "Investigating effect of Design Metrics on Fault Proneness in Object-Oriented Systems", in