XCorpus – An executable Corpus of Java Programs

By: Jens Dietrich, Henrik Schole, Li Sui, Ewan Tempero


Empirical studies on code require standardized data sets of significant size, extracted from real-world programs, in order to be reproducible and generalisable. We argue that there is a need for such data sets that are also executable, so that they can be used for experiments involving both static and dynamic analysis. A harness for such a data set should have high coverage in order to facilitate the construction of comprehensive models of program execution. We present XCorpus, a set of 76 executable, real-world Java programs, including a subset of 70 programs from the Qualitas Corpus. XCorpus uses a harness that combines built-in and generated test cases, resulting in a branch coverage that is significantly better than what is available from DaCapo.


Keywords: data set, benchmark, Java, empirical study, program analysis, test case generation, test coverage, dynamic program analysis

Cite as:

Jens Dietrich, Henrik Schole, Li Sui, Ewan Tempero, “XCorpus – An executable Corpus of Java Programs”, Journal of Object Technology, Volume 16, no. 4 (August 2017), pp. 1:1-24, doi:10.5381/jot.2017.16.4.a1.


The JOT Journal | ISSN 1660-1769 | DOI 10.5381/jot | AITO | Open Access