XCorpus – An executable Corpus of Java Programs

By: Jens Dietrich, Henrik Schole, Li Sui, Ewan Tempero

Abstract

Empirical studies on code require standardised data sets of significant size, extracted from real-world programs, in order to be reproducible and generalisable. We argue that there is a need for such data sets that are also executable, so that they can be used for experiments with both static and dynamic analysis. The harness for such a data set should achieve high coverage in order to facilitate the construction of comprehensive models of program execution. We present XCorpus, a set of 76 executable, real-world Java programs, including a subset of 70 programs from the Qualitas Corpus. XCorpus uses a harness that combines built-in and generated test cases, resulting in branch coverage that is significantly better than that available from DaCapo.

Keywords

data set, benchmark, Java, empirical study, program analysis, test case generation, test coverage, dynamic program analysis

Cite as:

Jens Dietrich, Henrik Schole, Li Sui, Ewan Tempero, “XCorpus – An executable Corpus of Java Programs”, Journal of Object Technology, Volume 16, no. 4 (August 2017), pp. 1:1-24, doi:10.5381/jot.2017.16.4.a1.


The JOT Journal | ISSN 1660-1769 | DOI 10.5381/jot | AITO | Open Access