Demand for quality software has undergone rapid growth during the last few years. This is leading to an increase in the development of metrics for measuring properties of software, such as coupling, cohesion or inheritance, that can be used in early quality assessments. Quality models that explore the relationship between these properties and quality attributes such as fault proneness, maintainability, effort or productivity are needed to use these metrics effectively. The goal of this work is to empirically explore the relationship between object-oriented design metrics and fault proneness of object-oriented system classes. The study uses data collected from Java applications containing 136 classes. We use a set of twenty-six design metrics in our work. The results of this study show that many metrics are based on comparable ideas and provide redundant information. It is shown that prediction models built using a subset of the metrics can identify the faulty classes. The proposed model predicts faulty classes with more than 80% accuracy.
Keywords: Measurement, Metrics, Object-Oriented, Coupling, Cohesion, Inheritance, Empirical Analysis.
There are several metrics proposed in the literature for capturing the quality of Object-Oriented (OO) design and code, for example, ([Aggarwal05]; [Braind98][Braind99]; [Bieman95]; [Cartwright00]; [Chidamber94][Chidamber91]; [Harrison98]; [Henderson96]; [Hitz00]; [Lake94]; [Li93]; [Lee95]; [Lorenz94]; [Tegarden95]). These metrics provide ways to evaluate the quality of software, and their use in earlier phases of software development can help organizations assess large software developments quickly and at low cost [Braind99]. But how do we know which metrics are useful in capturing important quality attributes such as fault-proneness, effort, productivity or amount of maintenance modifications? Empirical studies of real systems can provide relevant answers. There have been few empirical studies evaluating the effect of object-oriented metrics on software quality and constructing models that utilize them in predicting quality attributes of the system, such as ([Basili96]; [Binkley98]; [Braind00][Braind01]; [Cartwright00]; [Chidamber98]; [Emam99][Emam01]; [Gyimothy05]; [Harrison98]; [Li93]; [Ping02]).
More data from empirical studies, which are capable of being verified by observation or experiment, are needed. The evidence gathered through these empirical studies is today considered to be the most powerful support possible for testing a given hypothesis. In this paper, we empirically investigate and validate a set of OO metrics given by [Chidamber94] [Chidamber91] and [Braind99]. These metrics are analyzed using 12 software projects containing 136 classes. The study is divided into the following parts:
The results show that although the number of OO metrics is large, the number of dimensions actually found is much lower. Further, it was observed that import coupling metrics (which count the number of other classes called by a class) are strongly associated with fault proneness and predict faulty classes with high accuracy. Based on these results, it is reasonable to claim that such a model could help in planning and executing testing by focusing resources on fault-prone parts of the design and code.
The paper is organized as follows: Section 2 summarizes the metrics studied, describes the sources from which data is collected and presents the hypotheses to be tested in the study. Section 3 presents the research methodology followed in this paper. In Section 4 the results of the study are given. The model is evaluated in Section 5. Limitations of the study are presented in Section 6 and conclusions of the work are presented in Section 7.
2 RESEARCH BACKGROUND
In this section, we present the summary of metrics studied in this paper (Section 2.1), empirical data collection (Section 2.2) and hypotheses to be tested in our work (Section 2.3). Our focus in the study is metrics proposed by [Chidamber94][Chidamber91] and [Braind99].
The metrics of coupling, cohesion, inheritance and size are the independent variables used in this study. Our focus is on OO metrics that are used as independent variables in a prediction model that is usable at early stages of software development. The metrics selected in this paper are summarized in Table 1. These metrics are explained with examples in [Aggarwal05][Aggarwal06].
Table 1: Object-Oriented Metrics
Empirical Data Collection
To analyze the metrics chosen for this work, their values are computed for twelve different systems. These systems were developed by undergraduate engineering students and Masters of Computer Application students at the School of Information Technology of our University. The systems were developed using the Java programming language over a duration of four months. The aim was to teach the students system analysis and design techniques as part of their course curriculum. All students had experience with the Java language and thus had the basic knowledge necessary for this study. The students were also taught about algorithmic detail
The students were divided into 12 teams of four students each. Each team developed a medium-sized system such as flight reservation, chat server, proxy server etc. The development process used was the waterfall model. Documents were produced at each phase of software development. Faults were reported to the developers. A separate group of students having prior knowledge of system testing, under the guidance of senior faculty, was assigned the task of testing the systems according to test plans.
The following relevant data was collected:
The 12 systems under study consist of 136 classes (39 KLOC), out of which 85 are system classes and 51 are standard library classes available in the Java language. These classes contain functions to manipulate files, strings, lists, hash tables, frames, windows, menus, threads, socket connections etc.
All metric values are computed on system classes, whereas coupling and inheritance metrics are also calculated between 'system classes' and 'standard library classes'. It was observed during testing that the classes coupled with standard library classes were less fault prone than those coupled with system classes. It was also noticed that a large number of system classes inherited from standard library classes. These classes did not need as much testing as the system classes that inherit from other system classes. Thus, the values of metrics for standard library classes are shown separately, as their effect on fault proneness is different from that of system classes.
We test the hypotheses given below to find our empirical consequences.
H1 (for import coupling metrics): A class with more import coupling than its peers is more fault-prone as compared to them. (Null hypothesis: A class with more import coupling than its peers is less fault-prone as compared to them).
H2 (for export coupling metrics): A class with more export coupling than its peers is more fault-prone as compared to them. (Null hypothesis: A class with more export coupling than its peers is less fault-prone as compared to them).
H3 (for cohesion metrics): A class with lower cohesion than its peers is more fault-prone as compared to them. (Null hypothesis: A class with lower cohesion than its peers is less fault-prone as compared to them).
H4 (for DIT metric): A class located lower in a class inheritance hierarchy than its peers is more fault-prone as compared to them. (Null hypothesis: A class located lower in a class inheritance hierarchy than its peers is less fault-prone as compared to them).
H5 (for NOC metric): A class with a larger number of descendants than its peers is more fault-prone as compared to them. (Null hypothesis: A class with a larger number of descendants than its peers is less fault-prone as compared to them).
H6 (for size metrics): A class with a larger size i.e. more information than its peers is more fault-prone as compared to them. (Null hypothesis: A class with a larger size i.e. more information than its peers is less fault-prone as compared to them).
3 RESEARCH METHODOLOGY
In this section, the procedure used to analyze the data collected for each measure is described in the following stages:
The following statistics are reported for each significant metric:
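The higher the sensitivity (the percentage of fault-prone classes correctly predicted), the better the model. The percentage of non-occurrences correctly predicted, i.e. classes correctly predicted not to be fault prone, is called the specificity of the model.
Specificity can be formally defined as:
Specificity = (number of classes correctly predicted as not fault prone / total number of classes actually not fault prone) × 100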
4 ANALYSIS RESULTS
This section presents the analysis results, following the procedure described in Section 3. P.C. analysis (Section 4.1), univariate analysis (Section 4.2) and multivariate analysis (Section 4.3) results are presented.
Principal Component (P.C.) Method
The coupling of system classes to system classes is counted separately from the coupling of system classes to standard library classes. The suffix SL is appended to the metric name when coupling to standard library classes is counted. For instance, the CBO metric in such a case is named CBO_SL. The P.C. extraction method and the varimax rotation method are applied to all metrics. The rotated component matrix is given in Table 2. The values above 0.7 (shown in bold in Table 2) identify the metrics that are used to interpret the P.C.s. For each P.C., we also provide its eigenvalue, variance percent and cumulative percent. The interpretations of the P.C.s are given as follows:
Hence, we see that 5 out of 6 dimensions contain coupling metrics. Two dimensions P4 and P6 capture inheritance based coupling and inheritance metric. We also see that metrics capturing different properties are included in the same dimension P2.
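As a minimal sketch of how the per-component figures can be read off a rotated component matrix, the following computes each component's eigenvalue (taken here as the sum of squared loadings), its variance percent and cumulative percent, and lists the metrics whose loading exceeds 0.7. The loadings and the names metric_a to metric_d are hypothetical and are not values from Table 2; the sketch also assumes standardized metrics, so that each metric contributes unit variance.

```python
# Hypothetical rotated loadings: one row per metric, one column per component.
# These numbers are illustrative only and are NOT taken from Table 2.
LOADINGS = {
    "metric_a": [0.85, 0.10],
    "metric_b": [0.78, 0.30],
    "metric_c": [0.25, 0.88],
    "metric_d": [0.40, 0.55],
}


def summarize_components(loadings, threshold=0.7):
    """Summarize each component of a rotated loading matrix."""
    n_metrics = len(loadings)
    n_components = len(next(iter(loadings.values())))
    summary = []
    cumulative = 0.0
    for k in range(n_components):
        # Sum of squared loadings in column k.
        eigenvalue = sum(row[k] ** 2 for row in loadings.values())
        # With standardized metrics the total variance equals n_metrics.
        variance_pct = 100.0 * eigenvalue / n_metrics
        cumulative += variance_pct
        # Metrics used to interpret this component.
        strong = [name for name, row in loadings.items()
                  if abs(row[k]) > threshold]
        summary.append({
            "component": f"P{k + 1}",
            "eigenvalue": eigenvalue,
            "variance_pct": variance_pct,
            "cumulative_pct": cumulative,
            "metrics": strong,
        })
    return summary


summary = summarize_components(LOADINGS)
for item in summary:
    print(item)
```

In this made-up example, metric_d loads below the threshold on both components, so it would not be used to interpret either of them.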
Univariate Logistic Regression (LR) Analysis
In this subsection we find the relationship of the independent variables (OO metrics) with the dependent variable (fault proneness). Univariate LR analysis is done on 85 system classes. Table 3 provides the coefficient (B), standard error (SE), statistical significance (sig), R² statistic and odds ratio (exp(B)) for each measure. Metrics with no variance or lower variance are excluded from the table. The metrics with a significant relationship to fault proneness, that is, at or below the significance threshold of 0.05 (named Sig. in Table 3), are shown in bold. The metrics that are not shown in bold do not have a significant relationship with fault proneness.
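To make the reported quantities concrete, here is a minimal sketch of a univariate logistic regression fitted by Newton-Raphson, returning B, SE, a Wald-test significance and the odds ratio exp(B). The data are made up for illustration and are not the study's data; the function names are hypothetical, the Wald test is an assumption about how significance could be obtained, and the R² statistic is omitted.

```python
import math


def _score_and_information(b0, b1, x, y):
    """Gradient (g0, g1) and information matrix entries (h00, h01, h11)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def fit_univariate_lr(x, y, iterations=25):
    """Fit P(faulty) = 1 / (1 + exp(-(constant + B * x)))."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = _score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + inverse(information) * gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of B from the inverse information matrix.
    _, _, h00, h01, h11 = _score_and_information(b0, b1, x, y)
    se = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    # Two-sided p-value of the Wald statistic B / SE.
    sig = math.erfc(abs(b1 / se) / math.sqrt(2.0))
    return {"constant": b0, "B": b1, "SE": se, "sig": sig,
            "exp(B)": math.exp(b1)}


# Hypothetical metric values and fault labels (1 = faulty), NOT study data.
metric_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
faulty = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

result = fit_univariate_lr(metric_values, faulty)
print(result)
```

An odds ratio above 1 means that a one-unit increase in the metric raises the odds of a class being faulty; a metric would be marked as significant when `sig` is at or below 0.05.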
Table 2: Rotated Principal Component
The following observations are made based on the results given in Table 3:
Table 3: Univariate LR Analysis of Metrics
Multivariate Logistic Regression (LR) Analysis
In this section we build a prediction model to identify the faulty classes. Metrics are pre-selected using the results from the P.C. and univariate analyses, and the model is built using the backward elimination method. The model includes an intercept, referred to as the constant.
The model includes two coupling metrics, OMMIC and RFC. One size metric, WMC, is also included in the model. The OMMIC and RFC metrics were covered in dimension P3 and were also found to be strongly related to fault proneness in the univariate analysis. The WMC metric is captured in dimension P2. The summary of model statistics is presented in Table 4. The condition number does not indicate any problem.
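The following sketch shows how a logistic regression model of this shape turns the three metric values of a class into a fault-proneness prediction. The coefficients, the constant and the 0.5 cut-off are placeholders chosen for illustration; they are not the values from Table 4, and the function name is hypothetical.

```python
import math


def fault_probability(metrics, coefficients, constant):
    """Logistic model: 1 / (1 + exp(-(constant + sum of B_i * X_i)))."""
    z = constant + sum(coefficients[name] * metrics[name]
                       for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder values, NOT the fitted model reported in Table 4.
COEFFICIENTS = {"OMMIC": 0.1, "RFC": 0.05, "WMC": 0.02}
CONSTANT = -2.0
THRESHOLD = 0.5  # assumed classification cut-off

# Hypothetical metric values for a single class.
example_class = {"OMMIC": 12, "RFC": 30, "WMC": 15}

probability = fault_probability(example_class, COEFFICIENTS, CONSTANT)
is_fault_prone = probability > THRESHOLD
print(f"probability = {probability:.3f}, fault prone = {is_fault_prone}")
```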
Table 4: Model Statistics
The model was applied to 85 system classes and the accuracy of the model is presented in Table 5. The R² statistic and log likelihood of the model are fairly high. Out of 37 classes actually fault prone, 32 classes were predicted to be fault prone. The sensitivity of the model is 86.49%. Similarly, 45 out of 48 classes were predicted not to be fault prone. Thus the specificity of the model is 93.75%.
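The sensitivity and specificity figures follow directly from the reported counts, as the short sketch below shows. The function name is hypothetical; the counts are those stated above (32 of 37 fault-prone classes and 45 of 48 non-fault-prone classes predicted correctly).

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) as percentages."""
    sensitivity = 100.0 * true_pos / (true_pos + false_neg)
    specificity = 100.0 * true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# 37 fault-prone classes: 32 predicted correctly, 5 missed.
# 48 non-fault-prone classes: 45 predicted correctly, 3 misclassified.
sens, spec = sensitivity_specificity(true_pos=32, false_neg=5,
                                     true_neg=45, false_pos=3)
print(f"sensitivity = {sens:.2f}%, specificity = {spec:.2f}%")
```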
Table 5: Predicted Correctness of Model
5 MODEL EVALUATION
The sensitivity and specificity of the model predicted in the previous section are quite high, but this is somewhat optimistic since the model is applied to the same data set from which it was derived. To assess the accuracy of the model, it should be applied to different data sets. Thus we performed 9-cross validation of the model following the procedure given in Section 3. For the 9-cross validation, the classes were randomly divided into 9 parts of approximately equal size (5 partitions of 9 data points each and 4 partitions of 10 data points each).
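A partition of this kind can be sketched as follows: 85 class indices are shuffled and dealt into 9 parts, and each part in turn is held out for testing while the other 8 are used to fit the model. The function name and the random seed are assumptions for illustration; the model fitting itself is left out.

```python
import random


def make_folds(n_items, n_folds, seed=0):
    """Randomly split item indices into folds whose sizes differ by at most one."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Dealing indices round-robin keeps the fold sizes balanced.
    return [indices[i::n_folds] for i in range(n_folds)]


folds = make_folds(n_items=85, n_folds=9)
sizes = sorted(len(fold) for fold in folds)
print(sizes)

for held_out, test_indices in enumerate(folds):
    # All classes outside the held-out fold form the training set.
    train_indices = [i for k, fold in enumerate(folds)
                     if k != held_out for i in fold]
    # A model would be fitted on train_indices and applied to test_indices here.
    assert len(train_indices) + len(test_indices) == 85
```

Every class is used for testing exactly once, so the predictions over all 9 parts can be pooled into a single table of predicted versus actual fault proneness.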
Table 6: Results of 9-cross validation of Model
Table 6 shows that 30 out of 37 classes are correctly predicted to be fault prone. The sensitivity of the model is 81.08%. Similarly, 45 out of 48 classes were predicted not to be fault prone. Thus the specificity of the model is 93.75%. This shows that the model also predicts, with high accuracy, classes in data other than those from which it was derived.
6 THREATS TO VALIDITY
The study has a number of limitations that are not unique to our study but are common with most of the empirical studies in the literature. However, it is necessary to repeat them here.
The degree to which the results of our study can be generalized to other research settings is questionable. The reason is that the systems developed are small-sized. The developers are students and hence are not as well trained as professional developers.
In this study the severity of faults is not taken into account. There can be a number of faults that leave the system in various states, e.g. a failure that is caused by a fault may lead to a system crash or to an inability to open a file. The former failure is more severe than the latter, so not all faults are of the same type. The same limitation is also reported in [Emam99].
Though these results provide guidance for future research on the impact of OO metrics on fault proneness, further validations are needed with different systems to draw stronger conclusions.
7 CONCLUSIONS
We have conducted an empirical validation of twenty-six metrics. The systems under study are medium-sized systems written in Java and have a testing record including the number of faults found in each class. In this study we first found the interrelationships among the selected metrics and then found the individual and combined effect of the selected metrics on fault proneness.
The number of dimensions captured in P.C. analysis is only 6, which is much lower than the number of metrics. This supports the view that many of the metrics proposed are based on comparable ideas and therefore provide somewhat redundant information.
The results of univariate LR analysis show that most of the import coupling and cohesion metrics are related to fault proneness. On the other hand, inheritance metrics were not found to be related to fault proneness.
The results of multivariate LR analysis show that import coupling and size metrics predict fault proneness with high accuracy. As far as cohesion metrics are concerned, they were found to be highly related to fault proneness in univariate LR analysis, but none was found to be significantly related to fault proneness in multivariate LR analysis. The model has sensitivity 86.5% and specificity above 90%.
The metrics could not be evaluated over a large data set, but this is a problem that has plagued much of empirical software engineering research. More studies of this type must be carried out with different data sets to give results that generalize across different organizations. We plan to replicate our study on a large data set and an industrial OO software system. We further plan to build models based on early analysis and design artifacts.
[Binkley98] A.Binkley and S.Schach, "Validation of the Coupling Dependency Metric as a risk Predictor", International Conference on Software Engineering (ICSE), 452-455, 1998.
[Lake94] A.Lake, C.Cook, "Use of factor analysis to develop OOP software complexity metrics", Proc. 6th Annual Oregon Workshop on Software Metrics, Silver Falls, Oregon, 1994.
[Henderson96] B.Henderson-Sellers, "Object-Oriented Metrics, Measures of Complexity", Prentice Hall, 1996.
[Kothari89] C.R.Kothari, "Research Methodology. Methods and Techniques", New Age International Limited.
[Belsley80] D.Belsley, E. Kuh, R. Welsch, "Regression Diagnostics: Identifying Influential Data and Sources of Collinearity", John Wiley & Sons, 1980.
[Hosmer89] D.Hosmer, S.Lemeshow, "Applied Logistic regression", John Wiley and Sons, 1989.
[Tegarden95] D.Tegarden, S. Sheetz, D.Monarchi, "A Software Complexity Model of Object-Oriented Systems", Decision Support Systems, vol. 13 no.3-4, 241-262, 1995.
[Bieman95] J.Bieman, B.Kang, "Cohesion and Reuse in an Object-Oriented System", Proc. ACM Symp. Software Reusability (SSR'94), 259-262, 1995.
[Emam99] K.El Emam, S. Benlarbi, N.Goel, S. Rai, "A Validation of Object-Oriented Metrics", Technical Report ERB-1063, National Research Council of Canada (NRC), 1999.
[Emam01] K.El Emam, W. Melo, J. Machado, "The Prediction of Faulty Classes Using Object-Oriented Design Metrics", Journal of Systems and Software, vol. 56, 63-75, 2001.
[Aggarwal05] K.K.Aggarwal, Yogesh Singh, Arvinder Kaur, Ruchika Malhotra, "Analysis of Object-Oriented Metrics", International Workshop on Software Measurement (IWSM), Montréal, Canada, 2005.
[Aggarwal06] K.K.Aggarwal, Yogesh Singh, Arvinder Kaur, Ruchika Malhotra, "Empirical Study of Object-Oriented Metrics", Journal of Object-Technology, vol. 5, no. 8, 149-173, 2006.
[Aggarwal05] K.K.Aggarwal, Yogesh Singh, Arvinder Kaur, Ruchika Malhotra, "Software Reuse Metrics for Object-Oriented Systems", Third ACIS Int'l Conference on Software Engineering Research, Management and Applications (SERA'05), IEEE Computer Society, 48-55, 2005.
[Braind98] L.Briand, J.Daly and J. Wust, "A Unified Framework for Cohesion Measurement in Object-Oriented Systems", Empirical Software Engineering, 3, 65-117, 1998.
[Braind99] L.Briand, J.Daly and J. Wust, "A Unified Framework for Coupling Measurement in Object-Oriented Systems", IEEE Transactions on software Engineering, vol. 25, 91-121, 1999.
[Braind00] L.Briand, J.Daly, V.Porter, J. Wust, "Exploring the relationships between design measures and software quality", Journal of Systems and Software, vol. 5, 245-273, 2000.
[Braind01] L. Briand, J. Wüst, H. Lounis, "Replicated Case Studies for Investigating Quality Factors in Object-Oriented Designs", Empirical Software Engineering: An International Journal, vol 6, no 1, 11-58, 2001.
[Cartwright00] M.Cartwright, M.Shepperd, "An Empirical Investigation of an Object-Oriented Software System", IEEE Transactions on Software Engineering, vol.26, Issue 8, 786 – 796, Aug. 2000.
[Hitz00] M.Hitz, B. Montazeri, "Measuring Coupling and Cohesion in Object-Oriented Systems", Proc. Int. Symposium on Applied Corporate Computing, Monterrey, Mexico, 1995.
[Lorenz94] M.Lorenz, J.Kidd, "Object-Oriented Software Metrics", Prentice-Hall, 1994.
[Stone74] M.Stone, "Cross-validatory choice and assessment of statistical predictions", J. Royal Stat. Soc., 36, 111-147, 1974.
[Harrison98] R.Harrison, S.J.Counsell, R.V.Nithi, "An Evaluation of MOOD set of Object-Oriented Software Metrics", IEEE Trans. Software Engineering, vol. SE-24, no.6, 491-496, 1998.
[Chidamber94] S.Chidamber and C.Kemerer, "A metrics Suite for Object-Oriented Design", IEEE Trans. Software Engineering, vol. SE-20, no.6, 476-493, 1994.
[Chidamber91] S.Chidamber, C. Kemerer, "Towards a Metrics Suite for Object Oriented design", Proc. Conference on Object-Oriented Programming: Systems, Languages and Applications (OOPSLA'91), Published in SIGPLAN Notices, vol 26 no. 11, 197-211, 1991.
[Chidamber98] S.Chidamber, D. Darcy, C. Kemerer, "Managerial use of Metrics for Object-Oriented Software: An Exploratory Analysis", IEEE Transactions on Software Engineering, vol.24, no.8, 629-639, 1998.
[Gyimothy05] T.Gyimothy, R. Ferenc, I. Siket, "Empirical validation of object-oriented metrics on open source software for fault prediction", IEEE Trans. Software Engineering, vol. 31, Issue 10, 897 – 910, Oct. 2005.
[Basili96] V.Basili, L.Briand, W.Melo, "A Validation of Object-Oriented Design Metrics as Quality Indicators", IEEE Transactions on Software Engineering, vol. 22 no.10, 751-761, 1996.
[Li93] W.Li, S.Henry, "Object-Oriented Metrics that Predict Maintainability", Journal of Systems and Software, vol. 23, no.2, 111-122, 1993.
[Lee95] Y.Lee, B.Liang, S.Wu, F.Wang, "Measuring the Coupling and Cohesion of an Object-Oriented program based on Information flow", International Conference on Software Quality, Maribor, Slovenia 1995.
[Ping02] Yu Ping, Ma Xiaoxing, Lu Jian "Predicting Fault-Proneness using OO Metrics: An Industrial Case Study", CSMR 2002, Budapest, Hungary, 99-107.
Cite this article as follows: K.K Aggarwal, Yogesh Singh, Arvinder Kaur, Ruchika Malhotra, "Investigating effect of Design Metrics on Fault Proneness in Object-Oriented Systems", in Journal of Object Technology, vol. 6, no. 10, November-December 2007, pp. 127-141 http://www.jot.fm/issues/issue_2007_10/article5/