On the Contribution of UML Diagrams to Software System Comprehension

Irit Hadar and Orit Hazzan, Technion – Israel Institute of Technology, Israel

REFEREED
ARTICLE

Abstract

Program comprehension has been researched extensively ever since software systems became complex and longer than a few hundreds code lines. At the same time, the way in which people comprehend visual models of software systems has received much less attention. This paper focuses on the comprehension of UML diagrams. During the research presented in this paper, data was gathered from the work of two groups. Group 1 consisted of 13 senior computer science students who worked in five teams. The students were asked to trace and analyze the process by which they retrieved information from UML diagrams of a given system. Group 2 consisted of 42 senior computer science students who were requested to complete a questionnaire in which they were asked to rank different types of UML diagrams according to their importance. The section on data analysis discusses strategies adopted by the novices in their attempt to reveal the meaning of a set of UML diagrams, as well as their attitudes towards the different diagrams. One of the interesting observations is that although each team had its own preferences with respect to the usefulness of each specific type of diagrams, the overall use of each diagram type is very similar across the teams.

1 INTRODUCTION

UML has attained the status of a de facto modeling language standard ([Booch, 1999], [Kobryn, 1999], [OMG, 1999]). According to our literature review, however, only few studies have examined how people write or comprehend object-oriented visual models in general, and UML diagrams in particular. At the same time, the writing and comprehension of computer programs have been studied extensively over the last few decades. Research in these areas looks at how novices and experts cope with these two activities. Research about the writing of computer programs examines, among other topics, how programmers battle the complexities involved in software development; while research on program comprehension looks mainly at strategies adopted by programmers while attempting to comprehend a computer program. Specifically, research about program comprehension deals mainly with topics such as mental models and cognitive processes ([Brooks, 1983], [Littman, Pinto, Letovsky and Soloway, 1987], [Letovsky, 1987]); program reading techniques and strategies ([Francel and Rugaber, 2001]); influence of computer programs presentation on program comprehension ([Bentley, 1986], [Knuth, 1984]); and the comprehension of computer programs by novices vs. experts ([Soloway and Ehrlich, 1984], [Pennington, 1987]). In general, these papers acknowledge the complexities involved in program comprehension. Moreover, it is repeatedly stated in the aforementioned studies that no unique theory for describing program comprehension has been proposed so far and that such a theory should take into consideration that “programming behavior can be understood only with reference to the interactions between multiple knowledge sources” ([Davies, 1993], p. 265).

The objective of our research was to identify and describe strategies applied in the process of comprehending visual models of software systems. We view this research as a new branch of the extensive research on program comprehension. In analogy to research about the comprehension of computer programs, UML diagrams have been selected to be the focus of the study described in this paper. Visual models are the medium by which the specification and design of software systems are expressed, as computer programs are the medium by which objects and algorithms are expressed. In this analogy, the UML corresponds to a specific programming language.

The study described in this paper was conducted with the participation of senior computer science students. The added value of this research which investigates the process of understanding a software system documented by the UML, is two-folded: First, the research outlines the manner in which students use and integrate information they retrieve from several types of UML diagrams; Second, it addresses the question of the relative importance of the different types of diagrams during that specific process.

Section 2 of the paper describes the research setting. Section 3 presents conclusions derived from the data analysis. In Section 4 the research results are discussed and directions for future research are presented.

2 RESEARCH SETTING

Methodology

The data collection and data analysis in this research were based both on quantitative and qualitative research approaches. The qualitative research does not intend to prove a quantitative theory or to develop a solution to a specific problem, but rather to add knowledge and insights regarding a phenomena or problem identified by the researcher ([Bassey, 1999]). This approach is appropriate for this research since there is no intention of statistically proving a certain hypothesis, but rather of investigating the approach taken by students while trying to understand a software system, as well as the ways in which they use the different UML diagrams to fulfill this objective. The quantitative data gathered and analyzed were integrated into the qualitative analysis.

The research tools used in this study included observations, students’ learning materials and position questionnaires.

Population

Two groups of senior computer science major students participated in our research. The first group worked on a comprehension task based on UML diagrams; the second group completed a questionnaire that addressed the relative importance of different types of UML diagrams.

Group 1:

The participants in this group were 13 students, majoring in either Software Engineering or Information Systems, who were taking the course entitled Human Aspects of Software Engineering, taught by the second author at the Department of Computer Science at the Technion – Israel Institute of Technology.

The above course was taught parallel to a course entitled Methods in Software Engineering. The emphasis of the later course was on the entire life cycle of software, from the initial requirements, through analysis, design, implementation, integration and testing. In addition, the course provided complementary material on support activities, such as software maintenance and quality assurance. The formal requirement of the course was a team project, carried out by the students throughout the semester according to the progress of class lectures. At the beginning of the semester, the students received a “client document” (RFP) containing the requirements for a software system. The project was submitted in six stages: requirement document, analysis and specifications (UML), design (UML), test plan, code (in Java) and documentation of test results.

The Human Aspects of Software Engineering course was based, in part, on the Methods in Software Engineering course and some of the activities carried out during the course were associated with the project developed by the students in the Methods in Software Engineering course. The examination of this software development process was, however, carried out from a different angle. More specifically, while the focus in the Methods in Software Engineering course was on the software development process, the emphasis in the Human Aspects of Software Engineering course was placed on the people developing the project and mental and social processes were examined.

In general, the Human Aspects of Software Engineering course encourages a reflective mode of thinking in the spirit of [Schön, 1983]. Most of the students’ tasks were based on the students’ examination of their own way of thinking and working. Thus, one of the purposes of the task that provided the main data for our research (Cf. Table 1) was to encourage student reflection.

In addition, the work of two pairs of students on the task was used for its validation, as described in the next sub-section. These four students took the Methods of Software Engineering course, but did not participate in the Human Aspects of Software Engineering course.

Group 2:

The 42 participants in this group participated in the capstone course entitled ‘Software Engineering Project’ offered by the Department of Computer Science at the Technion. The course aims to train students in the entire process of software system development, including requirement definition, conceptual and detailed design, unit implementation, integration and testing.

The Task

Table 1 presents the task that constitutes the basis for the first stage of our research. Students were given three weeks to complete this homework assignment. The 27 diagrams in this task were taken from [Paltor and Lilius, 1999]. Prior to handing out the task, one lesson of two hours was devoted to the topic of program comprehension in which several program comprehension theories were presented ([Brooks, 1983], [Littman et al., 1987], [Fjeldstad and Hamlen, 1983], [Vans et al., 1999]). The objective of the said lesson was to introduce students to the cognitive complexity of program comprehension as well as to the variety of heuristics available for program comprehension.

Table 1. The Task.

Homework Assignment # 3 (in groups of two or three students)

Software documentation may be lost occasionally due to maintenance problems. In this homework assignment, a collection of UML diagrams is presented. The documentation of the computer system has been lost. Your task is to re-create the system description based on diagram analysis. In addition, you are asked to document and analyze the process you went through (a detailed description is presented below).

This task focuses on program comprehension, which is a central topic in the human aspect of software engineering. The two main objectives of the task are the improvement of computer program comprehension and the enhancement of the understanding of mental processes.

More specifically:

You are given a collection of UML diagrams that describe a Digital Audio Recorder. The specification documentation has been lost. Your task is to analyze the diagrams, write a general system description, and specify particular use cases.
During your work, you are requested to trace and document the analysis process and the way in which information is retrieved. In order to support the documentation process, it is recommended to mark the diagrams using some notation.
After completing Stage b, please go back and analyze the process you went through and characterize your work process.

You are asked to submit the following:

A description of the Digital Audio Recorder (general description as well as use cases).
Follow-up documentation:
1. The order of diagrams processed in your work.
2. The way in which you set to work on the task.
3. Miscellaneous documentation (such as sketches on the UML diagram pages, etc).
An analysis of the process of diagram comprehension and information retrieval:
1. Process characterization and the rationale behind it.
2. A graphical model representing the process.
3. Suggestions for the process improvement.

Additional comments:

It is recommended to dedicate at least 3-4 consecutive hours to diagram comprehension.
The diagrams are given in random order and are not stapled together. We recommended that you start off by checking which diagrams were given.
At the diagram analysis stage, one of the team members should focus on diagram comprehension and the other(s) on process documentation.

The task was performed by students of Group 1 (See ‘Population’) as a homework assignment, and thus it was highly important to validate not only the task itself, but the process of its execution as well. This was performed in three stages, using three teams that solved the task in our presence, as described below:

Prior to Group 1 performing the task, a pilot study was conducted with one pair of students, who were not course participants. In this pilot, we observed the way in which the students confronted and interpreted the task, noted which parts of it attracted their attention, and how they faced the documentation requirement (Section b in Table 1). Following this session, we refined the task formulation and further clarified some of the instructions.
During the time in which Group 1 performed the task, one of the five teams participating in the course solved the task in our presence. The aim of this observation was to ensure students worked on the task and understood it as intended.
After Group 1 completed its work on the task, we asked another pair of students, who did not participate in the course, to solve the task while observing their work. The aim of this observation was to check the general consistency in the performance of students, who are not course participants, with the work performed by the students in the course. Although no generalization can be reached based on this one additional team, it is interesting to note that their strategy and work process were consistent with our findings from the analysis of the work of Group 1 students.

The Questionnaire

The questionnaire was developed during the second stage of the research. Its aim was to examine the results obtained from the first stage (the task) from a different perspective. Forty-two Group 2 students completed the questionnaire, which is presented in Table 2.

Table 2. The Questionnaire.

Questionnaire UML

Name: ________________________

Occupation (in addition to computer science studies): ____________________

Experience in object-oriented development: ___________________________

The following table presents 9 types of UML diagrams. Please rank the contribution of the various diagrams to the software development process according to their importance (1 – highest contribution, 9 – lowest contribution, 0 – not familiar).

Type of Diagram	Rank
Use Case
Activity
Class
Sequence
Collaboration
State Chart
Object
Package
Deployment

Please re-rank the diagrams. This time refer to the contribution of the different types to the comprehension of a software system in whose development you did not take part, that is, to the comprehension of an unfamiliar software system.

Type of Diagram	Rank
Use Case
Activity
Class
Sequence
Collaboration
State Chart
Object
Package
Deployment

What are the reasons for the difference between the two rankings (if there is a difference)?

Thank you for your cooperation!

If you are willing to further contribute to this research, please specify the following details:

Tel.: ___________________________ e-mail: _________________________

3 DATA ANALYSIS

Based on an analysis of the students’ work and responses, we present three findings that refer to different stages of the comprehension process. One idea is common to these observations: Despite the small number of teams, relatively many different strategies were observed for each of the findings. In other words, no unique strategy can be identified in the different stages of the execution of the task. This finding clues about the nature of visual model comprehension.

Diagram sorting:

Although sorting of the diagrams was not part of the task, it is interesting to note that all teams sorted the diagrams according to some criterion before starting the actual examination of the diagrams. Specifically, five kinds of sorting were identified:
- By title (referred to by students as “the development phases”): This sorting process resulted in grouping all class diagrams together, all sequence diagrams together, and so on.
- By static vs. dynamic information: The students sorted the diagrams according to the kind of information they provide: static information (e.g., class diagrams) vs. dynamic information (e.g., sequence diagrams).
- By use cases: In this case, the diagrams were grouped according to the use cases presented in the use case diagram (cf. Appendix). Thus, all diagrams that contribute to the description of a specific use case were grouped together (e.g., all diagrams that might contribute to the description of the use case “delete a message” were grouped together).
- By objects: Students first identified the main entities of the system. Then, according to these entities they grouped the diagrams (e.g., all diagrams describing the “Message” object were grouped together).
- Informative vs. less-informative diagrams: Some diagrams were valued by students as having a major contribution to system comprehension (such as class diagrams), while other types of diagrams (notably collaboration diagrams), have, according to students’ opinions, only a minor contribution to system comprehension.
Looking at the second, third and fourth methods of categorization, it can be noted that the systems were examined from dynamic and/or static perspectives: The second categorization examines the diagrams from both perspectives; the third categorization examines the diagrams from a dynamic perspective; and the fourth examines the diagrams from a static perspective. As mentioned above, this shows that the different teams preferred different perspectives.
Pivotal diagrams:

This finding presents three diagrams that the students kept returning to in the process of reviewing the diagrams. In some way or another, these three diagrams present the system in an abstract manner. Thus, these diagrams provided the students with a global view of the system at times when they felt overwhelmed with details. Specifically, the three diagrams that served as pivotal diagrams were (as presented in the Appendix):
- Use case diagram
- Subsystems in sound recorder (a package diagram)
- MenuUserMode statechart
Once again, it can be observed that the students used both a static perspective (the package diagram) and a dynamic perspective (the use-case diagram and the statechart).
UML as a multifaceted perspective of a system:

This observation focuses on the diagrams reviewed by the students in order to retrieve relevant information. Table 3 presents the number of "visits" made by each team in each type of diagram. This table does not include the general diagrams (such as use case and deployment diagrams) and thus refers to only 23 of the 27 research diagrams.

Table 3. The number of "visits" to each type of diagram per team.

Diag. Type	Team #1	Team #2	Team #3	Team #4	Team #5	Total "visits"	No. of given diagrams
Class	5	10	8	7	18	48	8
Collaboration	2	2	10	3	3	20	3
Sequence	4	2	7	3	10	26	4
State	10	8	9	9	16	52	8
Total	21	22	34	22	47	146	23

Since there were a different number of diagrams in each diagram type (right-hand column), comparison of the absolute number of "visits" to each type of diagram would not be appropriate. Furthermore, since the total number of "visits" made by each team was different, a comparison of the number of "visits" made by the different teams to each type of diagram is not appropriate either. In order to be able to make these comparisons and reach reasonable conclusions, the data must first be normalized. Normalization was performed in the following manner (Formula 1): first, the number of "visits" per team per diagram type (each cell in the table) was divided by the total number of "visits" made by the team; second, the resulting fraction was divided by the ratio between the number of diagrams of the specific type and the total number of diagrams.

Formula 1: Data normalization.

For example, the normalization of the upper left-hand cell in Table 3, the number of "visits" made by Team #1 to the Class diagrams, is performed as follows: (5/21) / (8/23) = 0.68. Table 4 presents the outcome of this normalization.

Table 4. Normalized data per team.

Diag. Type	Team #1	Team #2	Team #3	Team #4	Team #5
Class	0.68	1.30	0.67	0.91	1.10
Collaboration	0.73	0.69	2.25	1.04	0.49
Sequence	1.09	0.52	1.18	0.78	1.22
State	1.36	1.04	0.76	1.18	0.97

Table 4 shows that, in fact, there is no one dominant type of diagram that all teams relate to more than others. The conclusion from this finding is that different students prefer different perspectives in the process of UML diagram comprehension. Interestingly, although different students preferred different types of diagrams while revealing the information, no significant differences were observed in the students’ descriptions of the system. This could imply that consistent information was reflected in the various diagrams and was revealed independent of the specific strategy adopted by the students (at least with respect to the specific set of diagrams presented to the students in this study).

Another interesting outcome, particularly with regard to the above, is the normalized total number of "visits" to each diagram (Table 5).

Table 5. Total no. of "visits" by all teams – normalized data.

Diag. Type	Total no. of "visits"	No. of diagrams	"Visits" per diagram (=Total "visits"/No. of diagrams)	"Visits" per diagram per team
Class	48	8	6.00	1.20
Collaboration	20	3	6.66	1.33
Sequence	26	4	6.50	1.30
State	52	8	6.50	1.30

From Table 5 we may conclude that, although each team focused on, and found different types of diagrams to be more useful than others, the total number of "visits" made by the five teams to the different types of diagrams is similar. In other words, although different personal preferences surfaced during the process of UML diagram comprehension, in an overall perspective, all of the diagrams were in fact equally important and useful.

These observations are supported by results obtained from the questionnaire (Table 6). As can be observed, the four diagram types were ranked in the same order in both parts of the questionnaire. More specifically, the order in which the four diagram types were ranked (from high to low importance) was: class, sequence, collaboration and finally, state. The differences, however, in the level of importance were smaller in the comprehension section of the questionnaire compared to the development section. Based on the answers to the open question at the end of the questionnaire about the perceived reasons for the differences between the two sections, our impression is that this reduction can be explained by the fact that the success of the comprehension phase is highly dependant on the integration of information retrieved from the different diagrams. At the same time, in the development phase, especially when performed by a single developer, there is less need to integrate information from different perspectives, hence the greater differences. It can thus be concluded that the multifaceted description provided by the UML diagrams not only contributes to software development, but that its contribution is increased in comprehension tasks.

Table 6. Questionnaire results – relative importance of the four diagrams (from high to low).

Diagram Type	Development phase	Comprehension phase
Class	2.76	3.19
Sequence	3.17	3.51
Collaboration	4.46	4.12
State	5.32	4.92

4 DISCUSSION

Our observations show that UML was utilized by the students as a multifaceted expression tool. The way in which the different teams sorted the diagrams in preparation for the comprehension process, the different pivotal diagrams that they leaned on, and the number of "visits" made to each of the different diagram types, all indicate that the process of comprehension and information extraction from UML diagrams varies between different people. It was also found that, when taken together, no one diagram type was globally less or more important than the others for the performance of the comprehension task. In other words, the differences in preference between the various teams canceled out each other.

The above conclusions are based on the work of senior computer science students. In the future, we intend to conduct a similar study on senior computer professionals. In parallel, research is being conducted by the authors on the construction of UML diagrams.

ACKNOWLEDGEMENTS

We would like to thank Professor Uri Leron from the Department of Education in Technology and Science for his comments on earlier versions of this paper and to Dr. Yossi Gil from the Department of Computer Science for his cooperation and good will.

REFERENCES

[Bass1999] Bassey, M., “Case study research in educational settings. Chapter 7: Methods of enquiry and the conduct of case study research”, UK: Open University Press, 1999.

[Bent1986] Bentley, J., “Programming pearls”, Communications of the ACM. 29(6) (1986), 26-28.

[Booc1999] Booch, G., “UML in Action”, Communications of the ACM. 42(10) (1999), 26-28.

[Broo1983] Brooks, R., “Towards a Theory of the Comprehension of Computer Programs”, International Journal of Man-Machine Studies. 18 (1983), 543-554.

[Davi1993] Davies, S. P., “Models and Theories of Programming Theory”, International Journal of Man-Machine Studies. 39 (1993), 237-267.

[Fjel1983] Fjeldstad, R. K. and Hamlen, W. T., “Application Program Maintenance Study – Reports to Our Respondents”, In Parikh, G. and Zvegintzov, N. (eds.), “Tutorial of Software Maintenance”, Silver Spring, MD: IEEE Computer Society Press, 1983.

[Fran2001] Francel, M. A. and Rugaber, S., “The Value of Slicing while Debugging”, Science of Computer Programming. 40 (2001), 151-169.

[Kobr1999] Kobryn, C., “A Standardization Odyssey”, Communications of the ACM. 42(10) (1999), 29-37.

[Knut1984] Knuth, D. E., “Literate Programming”, The Computer Journal. 27(2) (1984), 97-111.

[Leto1987] Letovsky, S., “Cognitive Processes in Program Comprehension”, The Journal of Systems and Software. 7 (1987), 325-339.

[Litt1987] Littman, D. C., Pinto, J., Letovsky, S. and Soloway, E., “Mental Models and Software Maintenance”, The Journal of Systems and Software. 7 (1987), 341-355.

[OMG1999] OMG Object Management Group, “UML Notation Guide”, Version 1.3, 1999.

[Palt1999] Paltor, I. P. and Lilius, J., “Digital Sound Recorder: A Case Study on Designing Embedded Systems Using the UML Notation”. (1999), URL: http://users.evitech.fi/~tk/rt_design/uml_sound_rec.pdf.

[Penn1987] Pennington, N., “Comprehension Strategies in Programming”. In Olson, G., Sheppard, S.B., Soloway, E. (eds.), Empirical Studies of Programmers, Albex, Norwood, N. J., 1987, 100-113.

[Schö1983] Schön, D. A., The Reflective Practitioner, BasicBooks, 1983.

[Solo1984] Soloway, E. and Ehrlich, K., “Empirical Studies of Programming Knowledge”, IEEE Trans. Software Engineering. 10(5) (1984), 595-609.

[Vans1999] Vans, A. M., von Mayrhauser, A. and Somlo, G., “Program Understanding Behavior During Corrective Maintenance of Large-Scale Software”, Int. Journal Human-Computer Studies. 51 (1999), 31-70.

About the authors

		Irit Hadar is a doctorate student in the Department of Education in Technology and Science of the Technion – Israel Institute of Technology. Both her Master and Bachelor degrees are from the Faculty of Industrial Engineering and Management of the Technion. Her doctorate thesis examines how software development experts understand concepts and principles in Object Oriented Design. UML is used in her research as an expression tool. She can be reached at hadari@techunix.technion.ac.il.
		Orit Hazzan is a senior lecturer in the Department of Education in Technology and Science of the Technion - Israel Institute of Technology. Her research focuses on teaching human aspects of software engineering. Specifically, she examines the application of the studio teaching method into software engineering education and the teaching of software development methodologies in undergraduate curricula. She can be reached at oritha@techunix.technion.ac.il.

Appendix: Three Pivotal Diagrams

Source: Paltor and Lilius, 1999.

Fig 1. Use Case Diagram used in the task

Fig 2. Package Diagram used in the task

Fig 3. Statechart describing the user menu used in the task

Cite this article as follows: Irit Hadar, Orit Hazzan: “On the Contribution of UML Diagrams to Software System Comprehension”, in Journal of Object Technology, vol. 3, no. 1, January-February 2004, pp. 143-156. http://www.jot.fm/issues/issue_2004_01/article3